Author: 惠玲琦扬2 | Source: Internet | 2023-10-10 20:51
Google Maps results are often displayed thus:
'\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'
Another variation:
'Clayton Village Shopping Center, 14856 Clayton Rd\nChesterfield, MO, United States\n(636) 227-2844'
And another:
'Wildwood, MO\nUnited States\n(636) 458-7707'
Notice the variation in the placement of the \n characters.
I'm looking to extract the first X lines as the address, and the last line as the phone number. A regex such as (.*\n.*)\n(.*) would suffice for the first example, but falls short for the other two. The only thing I can rely on is that the phone number will be in the form (ddd) ddd-dddd.
I think a regex that allows for every possible variation will be hard to come by. Is it possible to use split(), but keep the character we split by? So in this example, split by "(" to separate the address and phone number, but retain this character in the phone number? I could concatenate the "(" back onto split("(")[1], but is there a neater way?
2 solutions
1
If I understand you correctly, you want to "extract the first X lines as address". Assuming that all the addresses you need are in the US, this regex should work for you. In any case, it works on the 3 examples you provided:
import re

x = 'Wildwood, MO\nUnited States\n(636) 458-7707'
# Match one line, the line break(s), then the next line up to "States".
print(re.findall(r'.*\n+.*States', x))
The output is:
['Wildwood, MO\nUnited States']
If you want to print it later without the literal \n showing, take the first match as a string:
x = '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'
y = re.findall(r'.*\n+.*States', x)
y = y[0].rstrip()  # first match as a string, trailing whitespace removed
When you print y, the output is:
113 W 5th St
Eureka, MO, United States
And if you want to extract the phone number separately, you can do this:
tel = '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'
num = re.findall(r'.*\d+\-\d+', tel)
num = num[0].rstrip()
When you print num, the output is:
(636) 938-9310
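Both steps can also be done in one pass by anchoring on the phone number, since that is the only part the question says is reliable. The following is only a sketch, assuming the phone number always has the form (ddd) ddd-dddd and that everything before it is the address; PHONE_RE and split_listing are names made up for this example.

```python
import re

# Assumed phone format, taken from the question: (ddd) ddd-dddd
PHONE_RE = re.compile(r'\(\d{3}\) \d{3}-\d{4}')

def split_listing(text):
    """Return (address, phone) for one listing; phone is None if not found."""
    match = PHONE_RE.search(text)
    if match is None:
        return text.strip(), None
    # Everything before the phone number is treated as the address.
    address = text[:match.start()].strip()
    return address, match.group()

samples = [
    '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n',
    'Clayton Village Shopping Center, 14856 Clayton Rd\nChesterfield, MO, United States\n(636) 227-2844',
    'Wildwood, MO\nUnited States\n(636) 458-7707',
]

for s in samples:
    address, phone = split_listing(s)
    print(repr(address), phone)
```

This does not depend on the country name or on how many \n characters precede the phone number. On the split() part of the question: str.partition("(") returns the separator as its own element, so the "(" can be joined back without hardcoding it.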