Should I use .text or .content when parsing a Requests response?

Posted 2024-05-12 09:18:40


I occasionally use either res.content or res.text to parse a response from Requests. In the use cases I have had so far, it did not seem to matter which one I used.

What is the main difference between parsing HTML with .content and with .text? For example:

import requests 
from lxml import html
res = requests.get(...)
node = html.fromstring(res.content)

In the case above, should I use res.content or res.text? What is a good rule of thumb for when to use each?


1 answer
User
#1 · posted 2024-05-12 09:18:40

From the Requests documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property:

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests will use the new value of r.encoding whenever you call r.text. You might want to do this in any situation where you can apply special logic to work out what the encoding of the content will be. For example, HTML and XML have the ability to specify their encoding in their body. In situations like this, you should use r.content to find the encoding, and then set r.encoding. This will let you use r.text with the correct encoding.
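The "find the encoding in r.content, then set r.encoding" step could look roughly like the sketch below. The helper names sniff_meta_charset and decode_response are hypothetical and not part of Requests, and the regular expression is a deliberately simple assumption that only handles the common `<meta charset=...>` and `<meta http-equiv=... content="...; charset=...">` forms.

```python
import re

# Hypothetical helper, not part of Requests: find a charset declared in an
# HTML <meta> tag, e.g. <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def sniff_meta_charset(body, default=None):
    """Return the charset named in the first 2048 bytes of body, or default."""
    match = _META_CHARSET.search(body[:2048])
    return match.group(1).decode("ascii") if match else default


def decode_response(r):
    """Set r.encoding from the body's meta tag (if any), then return r.text.

    r is assumed to behave like a requests.Response: it exposes .content
    (bytes), a writable .encoding, and .text (str decoded using .encoding).
    """
    declared = sniff_meta_charset(r.content)
    if declared:
        r.encoding = declared
    return r.text
```

With a real response this would be called as decode_response(res), where res is the object returned by requests.get. If no meta tag is found, r.encoding is left as Requests guessed it.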

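So use r.content when the server returns binary data or a bogus encoding header, and look for the correct encoding in the document's meta tags yourself.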
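To see why a bogus encoding header matters, here is a small standard-library illustration. It does not call Requests at all: raw stands in for what r.content would hold, and the two decode calls stand in, roughly, for what r.text would give under a wrong and a right r.encoding. The sample string and the choice of ISO-8859-1 as the wrong encoding are assumptions made for the example.

```python
# Bytes as they might arrive over the wire; r.content would hold bytes like these.
raw = "<p>café</p>".encode("utf-8")

# If the header wrongly implies ISO-8859-1, the text is decoded like this
# and the accented character turns into two wrong characters.
mis_decoded = raw.decode("ISO-8859-1")

# Decoding with the encoding the page actually uses gives the intended text.
correct = raw.decode("utf-8")

print(mis_decoded)
print(correct)
```

The raw bytes are identical in both cases. Only the decoding step differs, which is why handing bytes to a parser that can read the document's own encoding declaration avoids the problem.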
