Hyphen in verbose regex comment causes error

Author: 所谓-旧 | Source: Internet | 2023-05-17 13:43

What's wrong with the following code - I pinpointed it to the hyphen in the comment, but why should that cause an error?

import re

valid = re.compile(r'''[^
\uFFFE\uFFFF   # non-characters
]''', re.VERBOSE)


Traceback (most recent call last):
  File "valid.py", line 5, in <module>
    ]''', re.VERBOSE)
  File "/usr/local/lib/python3.3/re.py", line 214, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python3.3/re.py", line 281, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/local/lib/python3.3/sre_compile.py", line 494, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/lib/python3.3/sre_parse.py", line 748, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/local/lib/python3.3/sre_parse.py", line 360, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/local/lib/python3.3/sre_parse.py", line 506, in _parse
    raise error("bad character range")
sre_constants.error: bad character range

This next segment without the hyphen is error free:

import re

valid = re.compile(r'''[^
\uFFFE\uFFFF   # non characters !! no errors
]''', re.VERBOSE)

Edit:

Adding to the answer of @nhahtdh, string concatenation seems another reasonable way to comment character classes in a verbose style:

valid = re.compile( r'[^'
r'\u0000-\u0008'    # C0 block first segment
r'\u000B\u000C'     # allow TAB U+0009, LF U+000A, and CR U+000D
r'\u000E-\u001F'    # rest of C0
r'\u007F'           # disallow DEL U+007F
r'\u0080-\u009F'    # All C1 block
r']'                # don't forget this!
r'''
| [0-9]    # normal verbose style
| [a-z]    # another term +++
''', re.VERBOSE)
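
One way to sanity-check the concatenation approach is to test a few characters against the negated class on its own. This is only a sketch: the name `no_controls` is hypothetical, and it assumes the second segment is meant to exclude U+000B and U+000C and nothing else.

```python
import re

# Adjacent string literals are joined by Python before re sees them,
# so these '#' comments are Python comments and never reach the regex.
no_controls = re.compile(
    r'[^'
    r'\u0000-\u0008'    # C0 block, first segment
    r'\u000B\u000C'     # VT and FF (TAB, LF and CR stay allowed)
    r'\u000E-\u001F'    # rest of C0
    r'\u007F'           # DEL
    r'\u0080-\u009F'    # C1 block
    r']'
)

# Print whether each sample character is accepted by the class.
for ch in ('\t', '\n', '\r', 'u', '\x0b', '\x7f', '\x85'):
    print(repr(ch), bool(no_controls.fullmatch(ch)))
```

No re.VERBOSE flag is needed for this part, because the pattern that reaches the regex engine contains no whitespace or comments at all.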

2 Answers

#1

According to the documentation (emphasis mine):

re.X
re.VERBOSE

This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash, and, when a line contains a '#' neither in a character class or preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.

Basically, you cannot have a comment inside a character class, and whitespace inside a character class is significant.

Since the # is inside a character class, it does not start a comment, and everything inside the class is parsed as part of the class without exception (even the newline character). The error is raised because n-c, taken from the text "non-characters", is an invalid character range: n (U+006E) sorts after c (U+0063), so the range is reversed.
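
The behaviour can be reproduced with a short sketch that uses only the standard `re` module. The helper `compiles` and the pattern `in_class` are hypothetical names introduced for illustration.

```python
import re

def compiles(pattern, flags=0):
    """Return True if `pattern` compiles, False if re raises an error."""
    try:
        re.compile(pattern, flags)
        return True
    except re.error:
        return False

# Inside a character class, '#' and spaces are literal members,
# even when re.VERBOSE is set.
in_class = re.compile(r'[a # b]', re.VERBOSE)
print(bool(in_class.fullmatch('#')))
print(bool(in_class.fullmatch(' ')))

# With the "comment" inside the class, "n-c" is read as a reversed range.
print(compiles(r'[^\uFFFE\uFFFF # non-characters]', re.VERBOSE))

# With the comment outside the class, it really is a comment.
print(compiles(r'[^\uFFFE\uFFFF] # non-characters', re.VERBOSE))
```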

A valid way to write the expression would be:

valid = re.compile(r'[^\uFFFE\uFFFF]   # non-characters', re.VERBOSE)

Here is one suggestion on how to comment when you want to explain a lengthy character class:

r'''
# LOTS is for foo
# _ is a special fiz
# OF-LITERAL is for bar
[^LOTS_OF-LITERAL]
'''
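
As a runnable illustration of the same idea, here is a minimal sketch with a made-up character class. The name `token` and the class contents are assumptions chosen for the example, not part of the original answer.

```python
import re

# In verbose mode, comments placed before the class are ignored by re,
# while everything between '[' and ']' is taken literally.
token = re.compile(r'''
    # a-z and 0-9 are the usual identifier characters
    # _ is allowed as a separator
    # - goes last so it is a literal hyphen, not a range
    [a-z0-9_-]+
''', re.VERBOSE)

print(token.fullmatch('non-characters') is not None)
print(token.fullmatch('non characters') is not None)
```

The comments may contain hyphens freely here, because they sit outside the brackets and are discarded before the class is parsed.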

#2

-1

Comments don't always play nice in regular expressions, and it looks like your regex engine is parsing the hyphen as part of the regular expression. You can't rely on comments not getting parsed here. This is a good thing to find out before implementing this code.
