I'm having trouble reading a unicode CSV string into python-unicodecsv:
    >>> import unicodecsv, StringIO
    >>> f = StringIO.StringIO(u'é,é')
    >>> r = unicodecsv.reader(f, encoding='utf-8')
    >>> row = r.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
        row = self.reader.next()
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
I'm guessing the problem is in how I turn my unicode string into a StringIO file? The example on the python-unicodecsv GitHub page works fine:
    >>> import unicodecsv
    >>> from cStringIO import StringIO
    >>> f = StringIO()
    >>> w = unicodecsv.writer(f, encoding='utf-8')
    >>> w.writerow((u'é', u'ñ'))
    >>> f.seek(0)
    >>> r = unicodecsv.reader(f, encoding='utf-8')
    >>> row = r.next()
    >>> print row[0], row[1]
    é ñ
Trying my code with cStringIO fails too, because cStringIO can't accept unicode (so why the example works, I don't know!):
    >>> from cStringIO import StringIO
    >>> f = StringIO(u'é')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
I need to accept UTF-8 CSV-formatted input from a web textarea form field, so I can't just read it in from a file.
Any ideas?
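The unicodecsv module reads and decodes byte strings for you. You are passing it unicode strings instead. On output, the writer encodes your unicode values to byte strings using the configured codec, which is why the GitHub example works: only encoded bytes ever reach the cStringIO object.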
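To make the unicode versus byte string distinction concrete, here is a small sketch that uses only built-in string methods (no unicodecsv). The variable names are just for illustration, and the character is written as an escape so the snippet does not depend on the source file encoding.

```python
text = u'\xe9,\xe9'            # unicode string: e-acute, comma, e-acute
data = text.encode('utf-8')    # byte string: each e-acute becomes two bytes

print(repr(data))
print(len(text), len(data))    # 3 characters versus 5 bytes

# Decoding with the same codec gives back the original unicode value.
roundtrip = data.decode('utf-8')
```

The reader expects the `data` form as input and hands you the `text` form back.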
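In addition, cStringIO.StringIO can only handle encoded byte strings, while the pure-Python StringIO.StringIO class happily accepts unicode values and treats them as if they were byte strings. That is why your first example only fails later, when the reader tries to use the value.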
The solution is to encode your unicode values before putting them into the StringIO object:
    >>> import unicodecsv, StringIO, cStringIO
    >>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
    >>> r = unicodecsv.reader(f, encoding='utf-8')
    >>> next(r)
    [u'\xe9', u'\xe9']
    >>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
    >>> r = unicodecsv.reader(f, encoding='utf-8')
    >>> next(r)
    [u'\xe9', u'\xe9']
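As an aside, if you are on Python 3 rather than the 2.7 shown in the question (that is an assumption on my part), the standard library csv module works on text directly, so no encoding step is needed. A minimal sketch follows; `parse_textarea_csv` is a hypothetical helper name, not part of any library.

```python
import csv
import io


def parse_textarea_csv(text):
    """Parse CSV text (a str) into a list of rows."""
    # newline='' leaves line endings untouched so the csv module
    # can handle them itself, as its documentation recommends.
    buf = io.StringIO(text, newline='')
    return list(csv.reader(buf))


rows = parse_textarea_csv('é,é\r\nñ,ü')
print(rows)
```

On Python 2.7 the encode-then-wrap approach above is still the one to use with unicodecsv.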