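<p>Requests <a href="http://docs.python-requests.org/en/master/user/advanced/#encodings" rel="nofollow noreferrer">relies on</a> the HTTP <code>Content-Type</code> response header and <code>chardet</code>. For the common case of <code>text/html</code> without a charset parameter, it assumes a default of <code>ISO-8859-1</code>. The problem is that Requests knows nothing about HTML meta tags, which can declare a different text encoding, e.g. <code><meta charset="utf-8"></code> or <code><meta http-equiv="content-type" content="text/html; charset=UTF-8"></code>.</p>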
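<p>To make the failure mode concrete, here is a minimal stdlib-only sketch. It assumes a simplified version of the header rule described above: use the header's charset parameter if there is one, otherwise fall back to ISO-8859-1 for <code>text/*</code>. The helper <code>header_charset</code> and the sample page body are hypothetical. This is not Requests' own code. The sketch shows what happens when a UTF-8 page is served as plain <code>text/html</code>.</p>

```python
from email.message import Message


def header_charset(content_type):
    """Hypothetical, simplified header-only rule (not Requests' source):
    charset parameter if present, else ISO-8859-1 for text/* types."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased charset param, or None
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"
    return None


# Hypothetical page: header says only "text/html", but the markup declares UTF-8
body = '<meta charset="utf-8"><p>Contáctenos</p>'.encode("utf-8")

guess = header_charset("text/html")
wrong = body.decode(guess)    # decoded with the header-based guess
right = body.decode("utf-8")  # decoded with the encoding the meta tag declares

print(guess)                   # ISO-8859-1
print("ContÃ¡ctenos" in wrong)  # True: the two UTF-8 bytes of "á" became two characters
```

<p>The header-only guess never looks inside the body, so the declaration in the markup is ignored and every non-ASCII character is mis-decoded.</p>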
<p>A good solution is to use BeautifulSoup's “<a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/#unicode-dammit" rel="nofollow noreferrer">Unicode, Dammit</a>” feature, like this:</p>
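<pre><code>from bs4 import UnicodeDammit
import requests

url = 'http://www.reynamining.com/nuevositio/contacto.html'
r = requests.get(url)
# is_html=True lets Unicode, Dammit also use encodings declared in HTML meta tags,
# not just byte-order marks, XML declarations and statistical guessing
dammit = UnicodeDammit(r.content, is_html=True)
# Tell Requests which encoding to use when it decodes r.content into r.text
r.encoding = dammit.original_encoding
print(r.text)
</code></pre>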
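<p>If BeautifulSoup is not available, a rough stdlib-only approximation of the meta-tag part of this idea is to search the start of the raw bytes with a regular expression. This swaps in a much simpler technique than Unicode, Dammit. It does no statistical guessing and no byte-order-mark handling, and a regex is not a real HTML parser. The helper <code>sniff_meta_charset</code> and its 1024-byte search limit are hypothetical choices made for this sketch.</p>

```python
import codecs
import re

# Hypothetical pattern: matches both the short charset form and the
# http-equiv form, e.g. content="text/html; charset=UTF-8"
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([-\w.:]+)', re.IGNORECASE)


def sniff_meta_charset(raw, limit=1024):
    """Return a normalised codec name declared in a meta tag near the
    start of raw HTML bytes, or None if there is none Python recognises."""
    match = _META_CHARSET.search(raw[:limit])
    if not match:
        return None
    name = match.group(1).decode("ascii")  # the pattern only matches ASCII bytes
    try:
        return codecs.lookup(name).name  # e.g. 'UTF-8' -> 'utf-8'
    except LookupError:
        return None  # declared charset that Python does not know


page = b'<html><head><meta charset="utf-8"></head><body></body></html>'
print(sniff_meta_charset(page))  # utf-8
```

<p>With Requests, it could be used as <code>r.encoding = sniff_meta_charset(r.content) or r.encoding</code>, which keeps the header-based guess whenever no usable declaration is found.</p>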