HTTP response is automatically modified by read() when using BeautifulSoup

from urllib.request import urlopen
from bs4 import BeautifulSoup

def getLinks(pageUrl):
    html1 = urlopen(pageUrl)
    html2 = urlopen(pageUrl)
    html3 = html1
    body1 = html1.read()
    bsObj1 = BeautifulSoup(html1)
    bsObj2 = BeautifulSoup(html2)
    bsObj3 = BeautifulSoup(html3)
    print("bsObj1's length is "+str(len(bsObj1.text)))
    print("bsObj2's length is "+str(len(bsObj2.text)))
    print("bsObj3's length is "+str(len(bsObj3.text)))

if __name__ == '__main__':
    getLinks("https://en.wikipedia.org/wiki/Main_Page")

1 Answer

Anonymous user

#1 · Posted 2024-09-30 16:32:41

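I believe the problem is in your code. The line body1 = html1.read() has already consumed the whole response body of html1, so when you then pass html1 to BeautifulSoup there is nothing left to read and it parses an empty document. html3 is not a copy: html3 = html1 makes both names refer to the same response object, so html3 is exhausted as well. Only html2, which comes from a separate urlopen call and has not been read, still holds its content.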

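So the following code works as expected, because it parses the bytes already stored in body1 instead of the exhausted response objects: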

body1 = html1.read()

bsObj1 = BeautifulSoup(body1)
bsObj2 = BeautifulSoup(html2)
bsObj3 = BeautifulSoup(body1)

Sample output

bsObj1's length is 16028
bsObj2's length is 16028
bsObj3's length is 16028
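
To see the read-once behaviour without touching the network, here is a minimal sketch. It assumes io.BytesIO as a stand-in for the object returned by urlopen (both are file-like streams whose read() consumes the data); the HTML string is made up for illustration and is not from the original question.

```python
import io

# Stand-in for the response returned by urlopen(): a file-like stream
# whose read() consumes the data. The HTML content is a made-up example.
html1 = io.BytesIO(b"<html><body><p>Hello</p></body></html>")
html3 = html1              # no copy is made; html3 names the same object

body1 = html1.read()       # the first read returns the whole body
leftover = html3.read()    # the stream is at EOF now, so this is empty

print(html3 is html1)      # True: one object, two names
print(len(body1) > 0)      # True: the first read got the content
print(len(leftover))       # 0: nothing is left for a second reader
```

Anything that reads from html1 or html3 after this point, including BeautifulSoup when it is handed the stream itself, gets an empty body. That is why keeping the bytes in body1 and parsing those is the safer pattern.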

Hope this helps.
