Answers to the question: Python codecs package cannot decode

1 Answer

Anonymous, 1 day ago

<code>codecs.open()</code> encodes for you. Don't hand it data that is already encoded, because Python will then first try to decode that data again so it can re-encode it to UTF-8. That implicit decoding uses the ASCII codec, and since your encoded byte string contains non-ASCII data, it fails:

<pre><code>>>> u'Dâ€™Iberville'.encode('utf8')
'D\xc3\xa2\xe2\x82\xac\xe2\x84\xa2Iberville'
>>> u'Dâ€™Iberville'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
</code></pre>

The solution is to not encode manually: pass Unicode values to the file object returned by <code>codecs.open()</code> and let it do the encoding.

Note that <code>codecs.open()</code> is not the most efficient implementation of file streams. In Python 2.7, I'd use <a href="https://docs.python.org/2/library/io.html#io.open" rel="nofollow"><code>io.open()</code> instead</a>; it offers the same functionality but is implemented more robustly. The <code>io</code> module is the default I/O implementation in Python 3, and is also available in Python 2 for forward compatibility.

However, you appear to be reinventing CSV handling; Python has an excellent <a href="https://docs.python.org/2/library/csv.html" rel="nofollow"><code>csv</code> module</a> that can produce CSV files for you. In Python 2 it cannot handle Unicode, though, so there you do need to encode manually:

<pre><code>import csv

# ...

year = foo.text
name = foo1.text
city = foo3.text.strip()
state = foo4.text

row = [year, name, city, state]

# Python 2: open the file in binary mode and encode each Unicode value by hand
with open('Outfile.csv', 'wb') as outf:
    writer = csv.writer(outf)
    writer.writerow(['Year', 'Name', 'City', 'State'])
    writer.writerow([c.encode('utf8') for c in row])
</code></pre>

Last but not least, if your HTML page produced the text <code>Dâ€™Iberville</code>, then you have produced a <a href="http://en.wikipedia.org/wiki/Mojibake" rel="nofollow">Mojibake</a>: UTF-8 data that was misinterpreted as CP-1252:

<pre><code>>>> u'Dâ€™Iberville'.encode('cp1252').decode('utf8')
u'D\u2019Iberville'
>>> print u'Dâ€™Iberville'.encode('cp1252').decode('utf8')
D’Iberville
</code></pre>

This is usually caused by bypassing BeautifulSoup's encoding detection (pass it byte strings, not Unicode).

You could try to "fix" these after the fact with:

<pre><code>try:
    City = City.encode('cp1252').decode('utf8')
except UnicodeError:
    # Not a value that could be de-mojibaked, so probably
    # not a Mojibake in the first place.
    pass
</code></pre>
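To make the "don't encode manually" advice concrete, here is a minimal sketch that writes Unicode values through <code>io.open()</code> and lets the file object do the UTF-8 encoding, together with the Mojibake repair wrapped in a function. The names <code>write_row</code> and <code>fix_mojibake</code>, the temporary path, and the sample row values are my own assumptions for illustration, not something from the question. It is written for Python 3; it avoids Python 3-only syntax, so it should also run on Python 2.7.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_row(path, row):
    """Append one comma-separated line of Unicode strings to path.

    The file object opened by io.open() encodes to UTF-8 on write,
    so the values are deliberately NOT encoded here.
    (Hypothetical helper; no CSV quoting is done.)
    """
    with io.open(path, 'a', encoding='utf8') as outf:
        outf.write(u','.join(row) + u'\n')


def fix_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as CP-1252, if possible.

    (Hypothetical helper.) Returns the input unchanged when the round
    trip fails, since it was then probably not a Mojibake to begin with.
    """
    try:
        return text.encode('cp1252').decode('utf8')
    except UnicodeError:
        return text


if __name__ == '__main__':
    # Made-up sample values; the path is a throwaway temp file.
    path = os.path.join(tempfile.mkdtemp(), 'Outfile.csv')
    city = fix_mojibake(u'D\xe2\u20ac\u2122Iberville')  # u'Dâ€™Iberville'
    write_row(path, [u'2014', u'Example', city, u'MS'])
```

Values that are already correct pass through <code>fix_mojibake</code> untouched, because either the CP-1252 encode or the UTF-8 decode raises a <code>UnicodeError</code> for them. For real CSV output with quoting, the <code>csv</code> module is still the better choice.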
