Author: 小蚊子 | Source: Internet | 2023-01-31 20:33
With regex, how can I match everything in a string that isn't something? This may not make sense, but read on.
So take the word "baby", for instance. To match everything that isn't a b, you would do something like [^b], and this would match a and y. Simple enough! But in the string "Ben sits on a bench", how can I match everything that isn't "ben", so that I would be matching " sits on a ch"?
Better yet, how can I match everything that isn't a pattern? E.g. in 1a2be3, match everything that isn't number, letter, number, so that it would match every combination in the string except 1a2?
6 Solutions
1
Short answer: You can't do what you're asking. Technically, the first part has an ugly answer, but the second part (as I understand it) has no answer.
For your first part, I have a pretty impractical (yet pure regex) answer; anything better would require code (like @rednaw's much cleaner answer above). I added to the test to make it more comprehensive. (For simplicity, I'm using grep -Pio for PCRE, case insensitive, printing one match per line.)
$ echo "Ben sits on a bench better end" \
|grep -Pio '(?=b(?!en)|(?
I'm basically making a special case for any letter in "ben" so I can include only iterations that are not themselves part of the string "ben." As I said, not really practical, even if I am technically answering your question. I've also saved a blow-by-blow explanation of this regex if you want further detail.
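For comparison, the code-based route mentioned above could look roughly like the following. This is only a minimal Python sketch, not @rednaw's actual answer (which is not reproduced here); the function name text_outside is made up for illustration. It splits the string on every case-insensitive occurrence of the word and keeps what is left.

```python
import re


def text_outside(word, text):
    """Return the pieces of text that are not part of any occurrence of word.

    text_outside is a hypothetical helper name, not something from the thread.
    """
    # re.escape keeps the word literal even if it contains regex metacharacters.
    pieces = re.split(re.escape(word), text, flags=re.IGNORECASE)
    # Splitting at the very start or end of the text leaves empty strings; drop them.
    return [piece for piece in pieces if piece]


print(text_outside("ben", "Ben sits on a bench"))
# prints [' sits on a ', 'ch']
```

Joining the pieces gives " sits on a ch", which is what the question asks for.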
If you're forced into using a pure regex rather than code, your best bet for items like this is to write code to generate the regex. That way you can keep a clean copy of it.
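As an illustration of generating the regex from code, here is a minimal Python sketch. It is not the (truncated) regex shown above but a different construction that I am assuming for the example: for each position inside the word it emits one negative lookahead, with a lookbehind for the letters that would have to precede that position. The name build_not_word_regex is hypothetical.

```python
import re


def build_not_word_regex(word):
    """Build a pattern matching runs of characters that lie outside every
    occurrence of word. Hypothetical helper, one guard per letter position."""
    guards = []
    for offset in range(len(word)):
        before = re.escape(word[:offset])  # letters that would precede this position
        after = re.escape(word[offset:])   # letters from this position onward
        if before:
            guards.append(f"(?!(?<={before}){after})")
        else:
            guards.append(f"(?!{after})")
    # A character is accepted only if no guard places it inside the word.
    return "(?:" + "".join(guards) + ".)+"


pattern = build_not_word_regex("ben")
print(pattern)
# prints (?:(?!ben)(?!(?<=b)en)(?!(?<=be)n).)+
print(re.findall(pattern, "Ben sits on a bench better end", flags=re.IGNORECASE))
# prints [' sits on a ', 'ch better end']
```

The generated pattern is still hard to read, but the generator stays short and can be rerun for any other word.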
I'm not sure what you're asking for the remainder of your challenge; a regex is either greedy or lazy [1] [2], and I don't know of any implementations that can find "every combination" rather than merely the first combination by either method. If there were such a thing, it would be very very slow in real life (rather than quick examples); the slow speed of regex engines would be intolerable if they were forced to examine every possibility, which would basically be a ReDoS.
Examples:
# greedy evaluation (default)
$ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+'
a2be3
# lazy evaluation
$ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+?'
a
2
b
e
3
I assume you are looking for:
1
1a
a
a2
a2b
a2be
a2be3
2
2b
2be
2be3
b
be
be3
e
e3
3
but I don't think you can get that with a pure regex. You'd need some code to generate every substring and then you could use a regex to filter out the forbidden pattern (again, this is all about greedy vs lazy vs ReDoS).
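The generate-then-filter idea could be sketched like this in Python. It is only one possible reading of "every combination" (every contiguous substring), and the names FORBIDDEN and allowed_substrings are made up for the example.

```python
import re

# The forbidden shape from the question: number, letter, number.
FORBIDDEN = re.compile(r"\d[a-z]\d", re.IGNORECASE)


def allowed_substrings(text):
    """List every contiguous substring of text that does not contain the
    forbidden pattern, in order of start position, without repeats."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            # The regex is only used as a filter; the loops do the enumeration.
            if FORBIDDEN.search(piece) is None and piece not in found:
                found.append(piece)
    return found


print(allowed_substrings("1a2be3"))
```

For 1a2be3 this yields the 17 substrings listed above. The nested loops visit every substring, so the work grows quadratically with the length of the input, which is fine for short strings but not for large texts.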