Author: 想丶风吹叶落 | Source: Internet | 2023-05-17 22:21
I want to escape the unescaped data inside an XML string, e.g.
string = "I want to escape these >, "
to
"I want to escape these >, "
- I can't use an XML parsing library such as xml.dom.minidom or xml.etree, because the data is unescaped and the parser will raise an error.
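To make the parser problem concrete, here is a minimal sketch. The sample string and the helper name `parses` are hypothetical and not part of the original question; it only shows that `xml.etree.ElementTree` rejects text containing a bare `&` or `<`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the bare "&" and "<" make this ill-formed XML.
raw = "<label>a & b < c</label>"
# The same data with the text already escaped.
escaped = "<label>a &amp; b &lt; c</label>"

def parses(text):
    """Return True if ElementTree can parse text, False on a ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(raw))
print(parses(escaped))
```

The parser only accepts the string once the element text has been escaped, which is why the escaping has to happen before any XML library is involved.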
With a regex, I figured out a way to match the data substring and get its start and end positions:
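import re
from xml.sax import saxutils

def escape_label(label):  # wrapper name is hypothetical; the original def line is not shown
    # Pattern assumed to end in "</", which is what the "- 2" offset below implies
    exp = re.search(">.+?</", label)
    # Get position of the data between tags
    start = exp.start() + 1
    end = exp.end() - 2
    return label[:start] + saxutils.escape(label[start:end]) + label[end:]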
But with re.search, I can't match the exact XML format.
- If I use re.findall, I can't get the positions of the substrings it finds.
- I could always look up the position of each found substring with index, but that wouldn't be efficient. I want a simple but efficient solution.
- BeautifulSoup solutions are welcome, but I wish there were a more beautiful way to do it with Python's standard library.
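One way to address the points above with only the standard library is sketched here. It is an illustration under stated assumptions, not the accepted answer: the names `ELEMENT`, `data_spans`, `escape_element_text` and the sample input are hypothetical, and the pattern assumes flat, attribute-free elements of the form `<tag>text</tag>`. `re.finditer` yields match objects, so positions are available without a second search, and `re.sub` with a callback does the replacement in one pass.

```python
import re
from xml.sax.saxutils import escape

# Assumption: flat, attribute-free elements of the form <tag>text</tag>.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def data_spans(label):
    """Return (start, end) positions of the text inside each element."""
    return [m.span(2) for m in ELEMENT.finditer(label)]

def escape_element_text(label):
    """Escape &, < and > in element text while leaving the tags alone."""
    def repl(m):
        return "<{0}>{1}</{0}>".format(m.group(1), escape(m.group(2)))
    return ELEMENT.sub(repl, label)

sample = "<a>x & y</a><b>1 < 2</b>"  # hypothetical input
print(data_spans(sample))
print(escape_element_text(sample))
```

Note that `escape` does not recognize existing entities, so text that is already escaped (for example `&amp;`) would be escaped a second time. Nested elements or tags with attributes would need a different pattern.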
1 solution