Stripping Out Empty XmlElements in a Performant Way and the Bus Factor

Author: DD906114329 | Source: Internet | 2023-06-02 12:05

We have a system that uses templates to create XML. Something like:

<root>
    {CUSTOMTEMPLATETHING1}
    {CUSTOMTEMPLATETHING2}
</root>

And the result might be:

<root>
    text content
</root>

Notice that the element filled from {CUSTOMTEMPLATETHING2} has "" as its content. For a string that's OK, but for a DateTime or Decimal, not so much. In those cases (and arguably for strings too, when String.IsNullOrEmpty is your primary semantic need) it would be preferable for the XmlSerializer and any other consumers to have those elements stripped out.
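To make the DateTime case concrete, here is a minimal sketch. The Order class and its Note and Shipped members are hypothetical names made up for illustration; the only thing taken from the text above is that the consumer is an XmlSerializer. It compares a document that carries an empty DateTime element with the same document after that element has been stripped.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type, for illustration only.
public class Order
{
    public string Note;
    public DateTime Shipped;
}

public static class EmptyElementDemo
{
    // Returns true when XmlSerializer accepts the document, false when it rejects it.
    public static bool CanDeserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Order));
        try
        {
            using (var reader = new StringReader(xml))
            {
                serializer.Deserialize(reader);
            }
            return true;
        }
        catch (InvalidOperationException)
        {
            // XmlSerializer reports the unparsable empty DateTime value
            // as an InvalidOperationException.
            return false;
        }
    }
}
```

Calling EmptyElementDemo.CanDeserialize with "<Order><Note></Note><Shipped></Shipped></Order>" should be rejected, while "<Order><Note></Note></Order>" should go through: the empty string element is harmless, and the missing DateTime element simply leaves the field at its default value.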

So, we created what we called the Rectifier. You can feel free to ponder the root or roots of the word. The early versions of the Rectifier used an uber-regular expression to strip out these tags from the source string. This system returns a full XML Document string, not an XmlReader or IXPathNavigable.

I heard a cool quote yesterday at the Portland NerdDinner while we were planning the CodeCamp.

"So you've got a problem, and you've decided to solve it with Regular Expressions. Now you've got two problems."

Since the documents we passed through this system were between 10k and 100k in size, the performance of the RegEx, especially when compiled and cached, was fine. We didn't give it a thought for years. It worked and it worked well. It looked like this:

private static Regex regex = new Regex(@"\<[\w-_.: ]*\>\<\!\[CDATA\[\]\]\>\</[\w-_.: ]*\>|\<[\w-_.: ]*\>\</[\w-_.: ]*\>|<[\w-_.: ]*/\>|\<[\w-_.: ]*[/]+\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\</[\w-_.: ]*\>|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""[\s]*/\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\<\!\[CDATA\[\]\]\>\</[\w-_.: ]*\>",RegexOptions.Compiled);

Stuff like this has what I call a "High Bus Factor." That means if the developer who wrote it is hit by a bus, you're screwed. It's nice to create a solution that anyone can sit down and start working on, and this isn't one of them.

Then, lately, some folks started pushing larger amounts of data through this system, in excess of 1.5 megs, and this Regular Expression started to take 4, 8, even 12 seconds to finish on these giant XML strings. We'd hit the other side of the knee of the exponential performance curve that you see with string processing like this.

So, Patrick had the idea to use XmlReaders and create an XmlRectifyingReader or XmlPeekingReader. Basically a fake reader that wraps a real reader internally and "peeks" ahead to see if we should skip empty elements. It's a complicated problem when you consider nesting, CDATA sections, attributes, namespaces, etc. And because XmlReaders are forward-only, you have to hold a lot of state as you move forward, since there's no way to back up. We gave up on this idea, since we wanted to fix this in a day, but it remains, in our opinion, a cool idea we'd like to try. We wanted to do something like: xs.Deserialize(new XmlRectifyingReader(new StringReader(inputString))). But the real issue was performance over elegance.

Then we figured we'd do an XmlReader/XmlWriter thing like:

// Requires: using System.Collections; using System.IO; using System.Xml;
using (StringWriter strw = new StringWriter())
{
    XmlWriter writer = new XmlTextWriter(strw);
    XmlReader reader = new XmlTextReader(new StringReader(input));
    reader.Read();                       // position the reader on the first node
    RectifyXmlInternal(reader, writer);  // This is US
    reader.Close();
    writer.Close();
    return strw.ToString();
}

// Holds one attribute of the current element so it can be written out later.
private class Attribute
{
    public Attribute(string l, string n, string v, string p)
    {
        LocalName = l;
        Namespace = n;
        Value = v;
        Prefix = p;
    }

    public string LocalName = string.Empty;
    public string Namespace = string.Empty;
    public string Value = string.Empty;
    public string Prefix = string.Empty;
}

internal static void RectifyXmlInternal(XmlReader reader, XmlWriter writer)
{
    int depth = reader.Depth;
    while (!reader.EOF)
    {
        // Copy non-element nodes straight through to the writer.
        switch (reader.NodeType)
        {
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.EntityReference:
                writer.WriteEntityRef(reader.Name);
                break;
            case XmlNodeType.XmlDeclaration:
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            case XmlNodeType.DocumentType:
                writer.WriteDocType(reader.Name,
                    reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"),
                    reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.EndElement:
                // We have climbed above the level this call started at.
                if (depth > reader.Depth)
                    return;
                break;
        }

        if (reader.IsEmptyElement || reader.EOF) return;
        else if (reader.IsStartElement())
        {
            // Remember the start tag; nothing is written until we know
            // the element is not empty.
            string name = reader.Name;
            string localName = reader.LocalName;
            string prefix = reader.Prefix;
            string uri = reader.NamespaceURI;

            ArrayList attributes = null;
            if (reader.HasAttributes)
            {
                attributes = new ArrayList();
                while (reader.MoveToNextAttribute())
                    attributes.Add(new Attribute(reader.LocalName, reader.NamespaceURI, reader.Value, reader.Prefix));
            }

            bool CData = false;
            reader.Read();
            if (reader.NodeType == XmlNodeType.CDATA)
            {
                CData = true;
            }
            // Skip over an empty CDATA section.
            if (reader.NodeType == XmlNodeType.CDATA && reader.Value.Length == 0)
            {
                reader.Read();
            }
            // Start tag immediately followed by its end tag: drop the element.
            if (reader.NodeType == XmlNodeType.EndElement && reader.Name.Equals(name))
            {
                reader.Read();
                if (reader.Depth < depth)
                    return;
                else
                    continue;
            }

            writer.WriteStartElement(prefix, localName, uri);
            if (attributes != null)
            {
                foreach (Attribute a in attributes)
                    writer.WriteAttributeString(a.Prefix, a.LocalName, a.Namespace, a.Value);
            }

            if (reader.IsStartElement())
            {
                if (reader.Depth > depth)
                    RectifyXmlInternal(reader, writer);
                else
                    continue;
            }
            else
            {
                if (CData)
                    writer.WriteCData(reader.Value);
                else
                    writer.WriteString(reader.Value);
                reader.Read();
            }

            writer.WriteFullEndElement();
            reader.Read();
        }
        else
        {
            // Assumption: the listing in the source is cut off after the branch
            // above. Advancing here keeps the loop moving on any other node.
            reader.Read();
        }
    }
}

The resulting "rectified" or empty-element-stripped XML is byte for byte identical to the XML created by the original Regular Expression, so we succeeded in keeping compatibility. On small XML strings of less than 100 bytes, performance is about 2x slower because of all the overhead. However, as the size of the XML approaches the middle part of the bell curve that represents the typical size (10k to 100k), this technique overtakes Regular Expressions in a big way. Initial tests are between 7x and 10x faster in our typical scenario. When the XML gets to 1.5 megs, this technique can process it in sub-second times. So, the Regular Expression behaves in an O(c^n) way, and this technique (scary as it is) behaves more like O(n log(n)).

This lesson taught me that manipulating XML as if it were a string is often easy and quick to develop, but manipulating the infoset with really lightweight APIs like the XmlReader will almost always make life easier.

I'd be interested in hearing Oleg or Kzu's opinions on how to make this more elegant and performant, and if it's even worth the hassle. Our dream of an XmlPeekingReader or XmlRectifyingReader to do this all in one pass remains...
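For comparison, one possibly more readable take on the same job is LINQ to XML (System.Xml.Linq, which shipped with .NET 3.5). This is a different technique from the reader/writer approach described here, not the author's method: it loads the whole document into memory instead of streaming it, and its performance on large documents has not been measured here. The class and method names are hypothetical. It also assumes that an element carrying any attribute, including a namespace declaration, should be kept, which differs from the original regular expression.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class LinqRectifier
{
    // Removes elements that have no attributes, no child elements and no text.
    // Repeats until nothing is left to remove, so a parent emptied by an
    // earlier pass is stripped as well. The root element is never removed.
    public static string StripEmptyElements(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        while (true)
        {
            var empties = doc.Root
                .Descendants()
                .Where(e => !e.HasAttributes && !e.HasElements && e.Value.Length == 0)
                .ToList();
            if (empties.Count == 0) break;
            foreach (XElement e in empties) e.Remove();
        }
        // DisableFormatting avoids adding indentation that was not in the input.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

With an input such as "<root><a><b></b></a><c>x</c></root>", the first pass removes b, the second removes the now-empty a, and only c is left under root. Whether this is fast enough for megabyte-sized documents would need the same kind of measurement the post describes.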

Translated from: https://www.hanselman.com/blog/stripping-out-empty-xmlelements-in-a-performant-way-and-the-bus-factor
