<p>Using the raw data from your answer, what you have is mojibake from a double encoding. You need to decode twice to recover the intended text.</p>
<pre><code>>>> s = b'# ::snt That\xc2\x92s what we\xc2\x92re with\xc2\x85You\xc2\x92re not sittin\xc2\x92 there in a back alley and sayin\xc2\x92 hey what do you say, five bucks?\n'
>>> s.decode('utf8').encode('latin1').decode('cp1252')
'# ::snt That’s what we’re with…You’re not sittin’ there in a back alley and sayin’ hey what do you say, five bucks?\n'
</code></pre>
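<p>The data really is UTF-8, but once it is decoded to Unicode, the wrong code points (such as U+0092 and U+0085) have the same values as bytes of the <code>Windows-1252</code> code page. <code>.encode('latin1')</code> converts those Unicode code points 1:1 back to bytes, because the <code>latin1</code> encoding maps directly onto the first 256 Unicode code points. The resulting bytes can then be decoded correctly as Windows-1252.</p>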
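<p>If you need to apply this to more than one string, the same round trip can be wrapped in a small helper. This is only a sketch: the name <code>fix_double_encoded</code> is hypothetical, and the fallback to the plain UTF-8 decoding is an assumption about what you would want for input that was never double-encoded.</p>

```python
def fix_double_encoded(raw: bytes) -> str:
    """Undo UTF-8-over-Windows-1252 double encoding (hypothetical helper)."""
    # First decode: the bytes really are valid UTF-8.
    text = raw.decode('utf8')
    try:
        # latin1 maps code points 0-255 straight back to single bytes,
        # which can then be read as Windows-1252.
        return text.encode('latin1').decode('cp1252')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: text with code points above U+00FF, or with bytes that
        # cp1252 leaves undefined, was not double-encoded, so return it as-is.
        return text


sample = b'That\xc2\x92s what we\xc2\x92re with\xc2\x85'
fixed = fix_double_encoded(sample)
print(fixed)
```

<p>Note that the fallback is all-or-nothing: a string that mixes double-encoded characters with genuine non-Latin-1 text is returned unchanged rather than partially repaired.</p>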