I'm stuck trying to unescape HTML special characters.
The problematic text is
Rudimental &amp; Emeli Sandé
which should be converted to Rudimental & Emeli Sandé
The text is downloaded with wget (outside of Python).
To test this, save this line in an ANSI-encoded file and import it.
    import HTMLParser

    trackentry = open('import.txt', 'r').readlines()
    print(trackentry)
    track = trackentry[0]
    html_parser = HTMLParser.HTMLParser()

    track = html_parser.unescape(track)

    print(track)
When a line contains é, I get this error:
    pi@raspberrypi ~/scripting $ python unparse.py
    ['Rudimental &amp; Emeli Sand\xe9\n']
    Traceback (most recent call last):
      File "unparse.py", line 9, in <module>
        track = html_parser.unescape(track)
      File "/usr/lib/python2.7/HTMLParser.py", line 472, in unescape
        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
      File "/usr/lib/python2.7/re.py", line 151, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 11: ordinal not in range(128)
The same code works fine under Windows. I only have the problem on the Raspberry Pi, which runs Python 2.7.3.
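One direction that might be worth trying, sketched under assumptions: the traceback suggests that `unescape` ends up mixing a unicode replacement for the entity with a byte string that still contains the raw byte 0xe9, which Python 2 then tries to decode as ASCII. Decoding the line to unicode before unescaping may avoid that. The encoding `cp1252` is a guess at what "ANSI" means for this file, and the helper name `unescape_bytes` is made up for illustration; neither comes from the original script.

```python
# -*- coding: utf-8 -*-
# Sketch only: decode the raw bytes first, then unescape the HTML entities.

try:
    from html import unescape          # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7
    unescape = HTMLParser().unescape


def unescape_bytes(raw, encoding="cp1252"):
    """Hypothetical helper: turn raw bytes into unescaped unicode text.

    The default encoding is an assumption about the "ANSI" file; adjust it
    if the file was saved differently.
    """
    text = raw.decode(encoding)  # bytes -> unicode, so no implicit ASCII decode
    return unescape(text)        # &amp; -> &


# The same bytes that the script printed for the first line of import.txt.
raw_line = b"Rudimental &amp; Emeli Sand\xe9\n"
track = unescape_bytes(raw_line).strip()
```

With the real file, the lines would be read as bytes (for example `open('import.txt', 'rb')`) and each one passed through the helper. Printing the result on the Pi additionally depends on the terminal's encoding, which this sketch does not address.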