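I'm new to Python and have recently been trying to scrape some data. Some values in a JSON string contain colons, and I need to remove those colons. My code is below.
Both a and b are strings whose values contain colons: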
import re

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
result = re.sub(r'^(?:Title|cmp|cmpesc):.+(\:)', '', a)
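After running it, all that's left is ` Customer Experience + Innovation (CX+I) Intern Brands'`; everything before it gets deleted. What I want is to delete only the colon after Intern (the colon after Title should stay).
How should I change it?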
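Why everything before the colon disappeared: `re.sub` replaces the whole match, and the match runs from `Title:` (anchored by `^`) through the greedy `.+` up to the last colon. One way around this is to capture the part you want to keep and write it back with a backreference. A minimal sketch, assuming only the last colon in the string should be removed:

```python
import re

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"

# Group 1 captures the key, its colon, and everything up to the last colon
# (".+" is greedy); the replacement r"\1" writes that prefix back, so only
# the final colon is dropped instead of the whole matched prefix.
result = re.sub(r"^((?:Title|cmp|cmpesc):.+):", r"\1", a)
print(result)  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
```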
You don't need to remove the colons at all; just turn the string into a dict:
>>> a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
>>> b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
>>> dict([s.split(':', 1) for s in a.split(',')])
{'Title': "'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"}
>>> dict([s.split(':', 1) for s in b.split(',')])
{'cmpesc': "'Adecco: USA'", 'cmp': "'Adecco: USA'"}
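Splitting on ',' breaks as soon as a value itself contains a comma, which job titles often do. A sketch that instead matches each key:'value' pair with a regex, assuming every value is wrapped in single quotes and contains no escaped quotes; `parse_pairs` and the string `c` are hypothetical:

```python
import re

b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
c = "Title:'Sales, Marketing: Intern'"  # hypothetical value containing a comma

def parse_pairs(s):
    # Match key:'value' pairs; the value is everything between the single
    # quotes, so colons and commas inside it are left alone.
    return dict(re.findall(r"(\w+):'([^']*)'", s))

print(parse_pairs(b))  # {'cmp': 'Adecco: USA', 'cmpesc': 'Adecco: USA'}
print(parse_pairs(c))  # {'Title': 'Sales, Marketing: Intern'}
```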
Or wrapped in a function:
a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"

def fn(x):
    return dict(s.split(':', 1) for s in x.replace("'", "").split(','))

print(fn(a))
print(fn(b))
# {'Title': 'Intern: Customer Experience + Innovation (CX+I) Intern Brands'}
# {'cmp': 'Adecco: USA', 'cmpesc': 'Adecco: USA'}
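If the goal is still the original string with colons removed from the values only, the dict returned by `fn` can be turned back into key:'value' pairs. A minimal sketch; `strip_value_colons` is a hypothetical name, and it assumes Python 3.7+ (so the dict keeps the input order) and values without commas or quotes, since `fn` strips quotes and splits on ',':

```python
a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"

def fn(x):
    # Strip quotes, split into pairs, split each pair on its first colon only
    return dict(s.split(':', 1) for s in x.replace("'", "").split(','))

def strip_value_colons(x):
    # Hypothetical helper: rebuild key:'value' pairs, removing colons from values only
    return ','.join(f"{k}:'{v.replace(':', '')}'" for k, v in fn(x).items())

print(strip_value_colons(a))  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
print(strip_value_colons(b))  # cmp:'Adecco USA',cmpesc:'Adecco USA'
```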
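Turns out I misread the question...
In that case:
''.join(re.split(r'(?<!Title)(?<!cmp)(?<!cmpesc):', a))
does the job. Writing [Title|cmp|cmpesc] in brackets would make a character class that matches any single one of those letters rather than one of the keys, and lookbehinds in Python's re must be fixed-width, so each key gets its own (?<!...).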
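Instead of describing which colons to keep with lookbehinds, you can describe which ones to remove: the ones inside single quotes. A minimal sketch that passes a function as the replacement to `re.sub`, assuming values are always single-quoted and contain no escaped quotes; `strip_colons_in_quotes` is a hypothetical name:

```python
import re

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"

def strip_colons_in_quotes(s):
    # Find each single-quoted value and delete the colons inside it;
    # the key colons sit outside the quotes, so they are never part of a match.
    return re.sub(r"'[^']*'", lambda m: m.group(0).replace(':', ''), s)

print(strip_colons_in_quotes(a))  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
print(strip_colons_in_quotes(b))  # cmp:'Adecco USA',cmpesc:'Adecco USA'
```

This approach does not depend on which letter happens to precede a colon, so a value like 'Sales: Intern' is handled the same way.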
import re

result = re.sub(r'^(Title|cmp|cmpesc)(:.+):(.*)',
                r'\1\2\3',
                "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'")
print(result)  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
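The `.+` in that pattern is greedy, so it removes only the last colon on the line; with b, for example, only the colon in the second 'Adecco: USA' would go. When one value holds several colons, the lazy `.+?` removes the first colon after the key instead. A short comparison on a hypothetical string:

```python
import re

s = "Title:'Intern: Customer: Brands'"  # hypothetical value with two colons

# Greedy: (:.+) extends as far as possible, so the LAST colon is removed
greedy = re.sub(r"^(Title|cmp|cmpesc)(:.+):(.*)", r"\1\2\3", s)
# Lazy: (:.+?) stops as early as possible, so the FIRST colon after the key is removed
lazy = re.sub(r"^(Title|cmp|cmpesc)(:.+?):(.*)", r"\1\2\3", s)

print(greedy)  # Title:'Intern: Customer Brands'
print(lazy)    # Title:'Intern Customer: Brands'
```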