WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER. Using requests and BeautifulSoup

Posted 2024-09-26 22:49:23

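This web-scraping code was working a few minutes ago, but now I get this warning and garbled encoding. Because the request no longer returns HTML, BeautifulSoup returns None when I search for a tag's contents. What is going on here? I tried googling this encoding problem but could not find a clear answer.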

import requests
from bs4 import BeautifulSoup


url = 'http://finance.yahoo.com/q?s=aapl&fr=uh3_finance_web&uhb=uhb2'

data = requests.get(url)
soup = BeautifulSoup(data.content).text
print(data)

The result is as follows:

0.0 seconds
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]> 
{}

Process finished with exit code 0

2 answers
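from urllib.request import urlopen  # assuming Python 3
from bs4 import BeautifulSoup

response = urlopen(notiurl)  # notiurl: the URL to fetch, defined elsewhere
html = response.read().decode(encoding="iso-8859-1")
soup = BeautifulSoup(html, 'html.parser')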

Check the encoding: print(soup.original_encoding)

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings
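
To illustrate what the warning is reporting, here is a minimal standard-library sketch. It assumes the bytes are really ISO-8859-1 (Latin-1) and are being decoded as UTF-8, which is one common cause but may not be what happened in the question. The sample text is made up for the demonstration.

```python
# Bytes as a Latin-1 (ISO-8859-1) source might produce them; sample text is hypothetical.
raw = "café".encode("iso-8859-1")

# Decoding with a codec that does not match: the invalid byte is replaced
# with U+FFFD, the REPLACEMENT CHARACTER mentioned in the warning.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the codec that matches the bytes recovers the original text.
right = raw.decode("iso-8859-1")

print(repr(wrong))
print(repr(right))
```

If the page really is Latin-1, decoding it explicitly as in the answer above avoids the replacement. Note that when BeautifulSoup is given an already decoded string, soup.original_encoding is expected to be None, because there was nothing left for it to detect.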

The following BeautifulSoup constructor call worked for me:

soup = BeautifulSoup(open(html_path, 'r'),"html.parser",from_encoding="iso-8859-1")
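
A possible variant of the call above, as a sketch only: as far as I understand, from_encoding applies to byte input, so opening the file in binary mode lets BeautifulSoup do the decoding itself. The temporary file and its contents are assumptions made up for the example; html_path is reused from the answer.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Write a small Latin-1 encoded HTML file (sample content, purely illustrative).
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<html><body><p>café</p></body></html>".encode("iso-8859-1"))
    html_path = tmp.name

# Open in binary mode so BeautifulSoup receives bytes and decodes them
# using the encoding passed in from_encoding.
with open(html_path, "rb") as fh:
    soup = BeautifulSoup(fh, "html.parser", from_encoding="iso-8859-1")

# Clean up the temporary file.
os.remove(html_path)

text = soup.p.get_text()
print(text)
print(soup.original_encoding)
```

Using a with block also closes the file handle, which the one-line version leaves open.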
