How do I avoid a UnicodeEncodeError?

Posted 2024-10-01 15:34:33


I am trying to write to a file and get the following error:

Traceback (most recent call last):
  File "/private/var/folders/jv/9_sy0bn10mbdft1bk9t14qz40000gn/T/Cleanup At Startup/merge-395780681.888.py", line 151, in <module>
    gc_all_d.writerow(row)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0329' in position 5: ordinal not in range(128)

The error occurs after I try to write a row from a database of counselors to the file that aggregates their names:

^{pr2}$

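I'm in unfamiliar waters here. I can't see an argument to the writerow() method that would extend the encoding range to cover the character '\u0329'.

I think the error may be related to my use of the nameparser module to put all the counselors' names into the same format. The HumanName function imported from nameparser may write out the counselors' names with a leading "u" to denote unicode, so the output is u"Sam The Man" rather than "Sam The Man", which then is not recognized.

Thanks for your help!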


Error after revising the code per the answers:

  File "/private/var/folders/jv/9_sy0bn10mbdft1bk9t14qz40000gn/T/Cleanup At Startup/merge-395782963.700.py", line 153, in <module>
    row['name'] = row['name'].encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 11: ordinal not in range(128)

Code that makes all the name entries consistent:

# nbcc
with(open('/Users/samuelfinegold/Documents/noodle/gc/nbcc/nbcc_output.txt', 'rU')) as nbcc:
    nbcc_d = csv.DictReader(nbcc, delimiter = '\t')
    nbcc_l = []
    for row in nbcc_d:
#         name = HumanName(row['name'])
#         row['name'] = name.title + ' ' + name.first + ' ' + name.middle + ' ' + name.last + ' ' + name.suffix       
        row['phone'] = row['phone'].translate(None, whitespace + punctuation)
        nbcc_l.append(row)

Revised code:

# compile master spreadsheet
with(open('gc_all.txt_3','w')) as gc_all:
    gc_all_d = csv.DictWriter(gc_all,  fieldnames = fieldnames, extrasaction='ignore', delimiter = '\t') 
    gc_all_d.writeheader()
    for row in nbcc_l:
        row['name'] = row['name'].encode('utf-8')
        gc_all_d.writerow(row)

Error:

Traceback (most recent call last):
  File "/private/var/folders/jv/9_sy0bn10mbdft1bk9t14qz40000gn/T/Cleanup At Startup/merge-395784700.086.py", line 153, in <module>
    row['name'] = row['name'].encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 11: ordinal not in range(128)
logout

2 Answers

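What you have is an output stream (your gc_all.txt_3 file, opened on the with line, with the stream instance in the variable gc_all) that Python believes can only contain ASCII. You have asked it to write a Unicode string containing the Unicode character '\u0329'. For example: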

>>> s = u"foo\u0329bar"
>>> with open('/tmp/unicode.txt', 'w') as stream: stream.write(s)
...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0329' in position 3:
ordinal not in range(128)

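You have plenty of options, including doing an explicit .encode on each string. Alternatively, assuming you can open the file with an explicit encoding, the stream can do the encoding for you: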

^{pr2}$
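As a rough illustration of letting the stream do the encoding, here is a minimal sketch that uses `io.open` with an explicit `encoding` argument. The choice of `io.open` and the temporary scratch file are assumptions made for illustration, and are not necessarily what this answer originally showed.

```python
import io
import os
import tempfile

s = u"foo\u0329bar"  # same sample string as in the failing example above

# Hypothetical scratch file; any writable path would do.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# The stream encodes text to UTF-8 as it is written.
with io.open(path, "w", encoding="utf-8") as stream:
    stream.write(s)

# Read the raw bytes back to see the UTF-8 form of U+0329.
with io.open(path, "rb") as stream:
    raw = stream.read()

os.remove(path)
```

One caveat, as far as I can tell: on Python 2 the csv module hands byte strings to the file object, so a text-mode stream like this one is unlikely to pair well with csv.DictWriter there. Encoding each value explicitly before calling writerow is probably the safer route.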

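Edited to add: per @Peter degloper's answer, an explicit encode is probably safer. UTF-8 has no NULs in its encoding, so assuming you want UTF-8, and generally one does, this is probably OK.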
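The point about NULs can be checked with a small sketch. The sample string is the one used earlier in this answer; the comparison with UTF-16 is added here only for contrast.

```python
s = u"foo\u0329bar"

utf8_bytes = s.encode("utf-8")
utf16_bytes = s.encode("utf-16-le")

# UTF-8 only produces a zero byte for the NUL character itself.
has_nul_utf8 = b"\x00" in utf8_bytes

# UTF-16 pads ASCII letters with a zero high byte, so NULs do appear.
has_nul_utf16 = b"\x00" in utf16_bytes
```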

From the csv module docs:

This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.

You need to encode your data before writing it, for example:

for row in aicep_1:
    print row['name']
    for key, value in row.iteritems():
        row[key] = value.encode('utf-8')
    gc_all_d.writerow(row)

Or, since you're on 2.7, you can use a dictionary comprehension:

^{pr2}$
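A minimal sketch of what such a dictionary comprehension might look like, not this answer's original code. It assumes a row may mix text values with values that are already encoded byte strings. That mix is one plausible cause of the UnicodeDecodeError in the question: on Python 2, calling .encode on a byte string first decodes it with the ASCII codec, and 0xcc is the first byte of U+0329 in UTF-8. The names `text_type`, `encode_row`, and the sample row are hypothetical.

```python
# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def encode_row(row, encoding="utf-8"):
    """Return a copy of row with text values encoded; other values are left as-is."""
    return {
        key: value.encode(encoding) if isinstance(value, text_type) else value
        for key, value in row.items()
    }


# Hypothetical row: one text value and one value that is already bytes.
row = {"name": u"Sam\u0329", "phone": b"5551234"}
encoded = encode_row(row)
```

On Python 2, each encoded row could then be passed to gc_all_d.writerow(...) in place of the original row.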

Or use some of the more elaborate patterns from the examples page in the docs.
