Which are the HTML, and XML, special characters?

Author: 蔡晓楠 | Source: Internet | 2023-05-18 00:21

What are the special reserved character entities in HTML and in XML?

The information that i have says:

HTML:

& (replace with &amp;)
< (replace with &lt;)
> (replace with &gt;)
" (replace with &quot;)
' (replace with &apos;)

XML:

< (replace with &lt;)
> (replace with &gt;)
& (replace with &amp;)
' (replace with &apos;)
" (replace with &quot;)

But i cannot find documentation on either of these.

The W3C does mention, in Extensible Markup Language (XML) 1.0 (Fifth Edition), certain predefined entity references. But it says that these entities are predefined (in the same way that &copy; is predefined); not that they must be escaped:

4.6 Predefined Entities

[Definition: Entity and character references may both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.]

What characters must be escaped into entity references in HTML?
What characters must be escaped into entity references in XML?

Update:

From Extensible Markup Language (XML) 1.0 (Fifth Edition):

2.4 Character Data and Markup

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.
If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.

The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

i read the former as saying that:

< must be escaped (as &lt;)
& must be escaped (as &amp;)
> may be escaped (as &gt;), and must be when it appears as ]]>

And that ' and " don't have to be escaped at all; unless you want to have quotes inside quoted attributes.
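That reading of XML 1.0 section 2.4 can be sketched in Python as follows. This is only an illustration: the names `escape_xml_text` and `escape_xml_attr` are made up here and belong to no library, and the sketch escapes every `>` rather than only the one in `]]>`, which is stricter than the spec requires but always permitted.

```python
def escape_xml_text(s: str) -> str:
    """Escape character data for use between XML tags (hypothetical helper)."""
    # '&' must go first, otherwise the '&' in '&lt;' would be escaped again.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    # Only required inside "]]>", but escaping every '>' is always allowed.
    s = s.replace(">", "&gt;")
    return s


def escape_xml_attr(s: str, quote: str = '"') -> str:
    """Escape a value that will be wrapped in the given quote character."""
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    s = escape_xml_text(s)
    # Only the quote character used as the delimiter has to be escaped.
    if quote == '"':
        return s.replace('"', "&quot;")
    return s.replace("'", "&apos;")


print(escape_xml_text('a < b && "c" ]]>'))
print(escape_xml_attr('say "hi" & it\'s fine'))
```

In text content the quotes pass through untouched; in an attribute value only the delimiter in use is replaced.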

From HTML 4.01 Specification, HTML Document Representation:

5.3.2 Character entity references

Authors wishing to put the "<" character in text should use "&lt;" (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter).

Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of ">" to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.

Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.

Some authors use the character entity reference "&quot;" to encode instances of the double quote mark (") since that character may be used to delimit attribute values.

HTML is much more wishy-washy on the rules, but it sounds like i should:

< should be replaced with &lt;
> should be replaced with &gt;
& should be replaced with &amp;
" should be replaced with &quot;

and if " can be an entity reference, i should also replace ' with &apos;.

Update Two

From HTML5 - A vocabulary and associated APIs for HTML and XHTML:

8.3 Serializing HTML fragments

Escaping a string (for the purposes of the algorithm above) consists of running the following steps:

Replace any occurrence of the "&" character by the string "&amp;".

Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;".

If the algorithm was invoked in the attribute mode, replace any occurrences of the """ character by the string "&quot;".

If the algorithm was not invoked in the attribute mode, replace any occurrences of the "<" character by the string "&lt;", and any occurrences of the ">" character by the string "&gt;".

Which i read as HTML:

& by &amp; always
U+00A0 (no-break space) by &nbsp; always
" by &quot; if it's inside an attribute
< by &lt; if it's not in an attribute (i.e. attributes can contain <)
> by &gt; if it's not in an attribute (i.e. attributes can contain >)

1 Answer

#1

First, you're comparing an HTML 4.01 specification with an HTML 5 one. HTML5 ties more closely in with XML than HTML 4.01 ever does (that's why we have XHTML), so this answer will stick to HTML 5 and XML.

Your quoted references are all consistent on the following points:

< should always be represented with &lt; when not indicating a processing instruction
> should always be represented with &gt; when not indicating a processing instruction
& should always be represented with &amp;
except when within <![CDATA[ ]]> (which only applies to XML)

I agree 100% with this. You never want the parser to mistake literals for instructions, so it's a solid idea to always encode these characters (the space is a separate case; see below). Good parsers know that anything contained within <![CDATA[ ]]> is not an instruction, so the encoding is not necessary there.
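As a small illustration of the CDATA point, the snippet below uses Python's standard `xml.etree.ElementTree` parser; the `<note>` element is just a made-up example. It shows that escaped character data and a CDATA section produce the same text, while the same characters left unescaped outside CDATA are rejected.

```python
import xml.etree.ElementTree as ET

# Escaped character data and a CDATA section yield the same text.
escaped = ET.fromstring("<note>a &lt; b &amp; c</note>")
cdata = ET.fromstring("<note><![CDATA[a < b & c]]></note>")
print(escaped.text)
print(cdata.text)

# The same characters, unescaped and outside CDATA, are not well-formed.
try:
    ET.fromstring("<note>a < b & c</note>")
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```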

In practice, I never encode ' or " unless

it appears within the value of an attribute (XML or HTML)
it appears within the text of XML tags. (<tag>&quot;Yoinks!&quot;, he said.</tag>)

Both specifications also agree with this.
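For what it's worth, Python's standard library follows the same split between text and attribute escaping. The sketch below uses `xml.sax.saxutils.escape`, `xml.sax.saxutils.quoteattr`, and `html.escape`; the sample string is invented for the demonstration.

```python
import html
from xml.sax.saxutils import escape, quoteattr

text = 'Tom & "Jerry" <cat>'

xml_text = escape(text)                     # XML text: quotes left alone
xml_attr = quoteattr(text)                  # XML attribute, delimiters included
html_text = html.escape(text, quote=False)  # HTML text: quotes left alone
html_attr = html.escape(text)               # HTML attribute value: quotes escaped

for value in (xml_text, xml_attr, html_text, html_attr):
    print(value)
```

`quoteattr` picks whichever quote character avoids escaping where it can, so a value containing only double quotes comes back wrapped in single quotes.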

So, the only point of contention is &nbsp; (the no-break space). The only mention of it in either specification is when serialization is attempted. When not serializing, you should always use a literal space character. Unless you are writing your own parser, I don't see the need to be doing any kind of serialization, so this is beside the point.
