Question

Why does 'hallo\nworld' match both \n and \\n in R?

Posted by D3LJ664D5LJ81111 on 2023-02-12 12:24

grep

Why does grep treat \n and \\n the same way?

For example, both patterns match hallo\nworld.

grep("hallo\nworld", pattern="\n")
[1] 1
grep("hallo\nworld", pattern="\\n")
[1] 1

I can see that hallo\nworld is parsed as

hallo  
world

That is, hallo is on one line and world is on the next.

So in grep("hallo\nworld", pattern="\n"), is pattern="\n" a newline character or a literal \n?

Also note that other escapes behave the same way: \a \f \n \t \r and \\a \\f \\n \\t \\r all get the same treatment. But \d \w \s cannot be used! Why not?

I tested with several different strings and found that the answer lies in how regular expressions are interpreted.

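There are two levels of escaping. One is escaping inside a string literal, which is easy to understand. The other is escaping inside a regular expression pattern. In R, within a pattern such as grep(x, pattern=" some string here "), \\n = \n = newline. In an ordinary string, however, \\n != \n: the former is a literal backslash followed by n, and the latter is a newline. We can demonstrate this: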

cat("\n")

cat("\\n")
\n>
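To complement the cat() output, here is a minimal sketch that inspects the two strings directly and compares regex matching with literal matching. The variable names (nl, esc, as_regex, as_literal) are made up for illustration. With fixed = TRUE no regex engine is involved, so the two patterns stop behaving alike.

```r
nl  <- "\n"    # string escape: one character, a real newline
esc <- "\\n"   # string escape: two characters, a backslash and the letter n

print(nchar(nl))       # 1
print(nchar(esc))      # 2
print(charToRaw(nl))   # 0a
print(charToRaw(esc))  # 5c 6e

x <- "hallo\nworld"

# As regular expressions, both patterns end up meaning "newline"
as_regex <- c(grepl(nl, x), grepl(esc, x))

# As literal text (fixed = TRUE), only the real newline is found in x
as_literal <- c(grepl(nl, x, fixed = TRUE), grepl(esc, x, fixed = TRUE))

print(as_regex)    # TRUE TRUE
print(as_literal)  # TRUE FALSE
```

This suggests that pattern="\n" reaches grep as a real newline, while pattern="\\n" reaches grep as backslash plus n and is turned into a newline by the regex engine.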

How can this be checked? I will try other characters, not just \n, to see whether they match in the same way.

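# Each escape as one real control character (resolved by R's string parser)
special1 <- c( "\a", "\f", "\n", "\t", "\r")
# The same escapes as backslash + letter (left for the regex engine to resolve)
special2 <- c("\\a","\\f","\\n","\\t","\\r")
target <- paste("hallo", special1, "world", sep="")
for (i in seq_along(target)) {
    cat("i=", i, "\n")
    # grepl() returns FALSE on no match; grep() would return integer(0) and break if()
    if (grepl(special1[i], target[i]))
        print(paste(target[i], "match", special1[i], "succeed"))
    if (grepl(special2[i], target[i]))
        print(paste(target[i], "match", special2[i], "succeed"))
}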

Output:

i= 1
[1] "hallo\aworld match \a succeed"
[1] "hallo\aworld match \\a succeed"
i= 2
[1] "hallo\fworld match \f succeed"
[1] "hallo\fworld match \\f succeed"
i= 3
[1] "hallo\nworld match \n succeed"
[1] "hallo\nworld match \\n succeed"
i= 4
[1] "hallo\tworld match \t succeed"
[1] "hallo\tworld match \\t succeed"
i= 5
[1] "hallo\rworld match \r succeed"
[1] "hallo\rworld match \\r succeed"

Note that \a \f \n \t \r and \\a \\f \\n \\t \\r are handled in exactly the same way inside an R regular expression pattern string!
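The two forms are equivalent only as patterns for a newline. A short sketch, using a made-up vector x, shows what changes when the text itself contains a literal backslash followed by n. Matching that text needs a doubled backslash at the regex level, which is four backslashes in the R string literal.

```r
# First element contains a real newline; second contains backslash + n as text
x <- c("hallo\nworld", "hallo\\nworld")

newline_char <- grepl("\n", x)      # pattern is a real newline character
newline_esc  <- grepl("\\n", x)     # pattern is the regex escape for newline
literal_bs_n <- grepl("\\\\n", x)   # pattern is an escaped backslash, then n

print(newline_char)  # TRUE FALSE
print(newline_esc)   # TRUE FALSE
print(literal_bs_n)  # FALSE TRUE
```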

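Not only that, you cannot write \d \w \s with a single backslash in an R regular expression pattern!
You can write any of these:

pattern="\a"  pattern="\f"  pattern="\n"  pattern="\t"  pattern="\r"

But you cannot write any of these in grep:

pattern="\d"  pattern="\w"  pattern="\s"

I think this is a bug too, because \d \w \s are not treated the same way as \a \f \n \t \r.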
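One way to see where the difference comes from is to check which stage rejects "\d". The sketch below rests on an assumption worth stating: the failure happens in R's string parser, before grep is ever called, because \d is not one of the escapes R recognises in string literals. The names parse_result, digits, spaces, words and raw_digits are made up for illustration, and the raw string syntax r"(...)" needs R 4.0.0 or later.

```r
# Hand the four characters "\d" (quotes included) to R's parser.
# The parser should fail with an "unrecognized escape" error.
parse_result <- tryCatch(
    parse(text = '"\\d"'),
    error = function(e) "parse error"
)
print(parse_result)

# With the backslash doubled, the string parses and the regex engine sees \d, \s, \w
digits <- grepl("\\d", c("abc123", "abc"))
spaces <- grepl("\\s", c("a b", "ab"))
words  <- grepl("\\w", c("abc", "!!!"))

# Raw strings (R >= 4.0.0) skip string-level escaping altogether
raw_digits <- grepl(r"(\d)", c("abc123", "abc"))

print(digits)      # TRUE FALSE
print(spaces)      # TRUE FALSE
print(words)       # TRUE FALSE
print(raw_digits)  # TRUE FALSE
```

Under that assumption, \a \f \n \t \r work with a single backslash only because they are valid string escapes as well as regex escapes, while \d \w \s exist only at the regex level and so always need "\\d", "\\w", "\\s".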
