Python and BeautifulSoup

Posted 2024-09-30 22:21:17


Hi, I'm trying to parse the HTML from this website.

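However, BeautifulSoup takes forever to load the whole page (printing it to the terminal takes about 17 seconds). I realize this is probably just the website itself (other directories on the site seem to load instantly), but here is my code just in case: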

# Python 2 code: urllib2 and the print statement do not exist in Python 3
import urllib2
from bs4 import BeautifulSoup

url1 = 'http://www.ukpets.co.uk/ukp/?sf=1716769780&rtn=temp87_224_76_126_at_1456&display_profile=&section=Commercial&sub=Search_&rws=&method=search&tb=comdir1_8&class=comdir1_8&search_form=on&rf=coname&st=Food'
# BeautifulSoup reads the response object itself, then parses it with lxml
soup = BeautifulSoup(urllib2.urlopen(url1), 'lxml')
print soup
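In the code above, BeautifulSoup reads the response object itself, so download time and parse time are lumped together. One way to check whether the site (not the parser) is the bottleneck is to time the two steps separately. This is a minimal Python 3 sketch using only the standard library; the helper names `timed` and `fetch` and the 30-second timeout are hypothetical, and the commented usage lines assume `bs4` and `lxml` are installed.

```python
import time
from urllib.request import urlopen


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the elapsed wall-clock time, return the result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


def fetch(url, timeout=30):
    """Download the raw page bytes so network time is measured on its own."""
    # The with-block closes the connection once the body has been read.
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Hypothetical usage (needs network access plus bs4 and lxml):
# from bs4 import BeautifulSoup
# html = timed("download", fetch, url1)
# soup = timed("parse", BeautifulSoup, html, "lxml")
```

If the "download" line accounts for nearly all of the ~17 seconds, switching parsers will not help much.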

So my question is: is there another parser that can do this faster, or is there something in bs I could use?

P.S. I also tried Selenium.
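On the "something in bs" part: BeautifulSoup accepts a `parse_only` argument that takes a `SoupStrainer`, so only matching elements are kept. According to the BeautifulSoup documentation this mainly saves memory and makes later searches faster rather than cutting much parse time; for raw speed the docs recommend the lxml parser. The sketch below assumes you only need the links (`<a>` tags); the function name `parse_links_only` and the sample markup are hypothetical. It uses the built-in `"html.parser"` so it runs without lxml installed; `"lxml"` also supports `parse_only`, while `html5lib` does not.

```python
from bs4 import BeautifulSoup, SoupStrainer


def parse_links_only(markup, parser="html.parser"):
    """Build a soup that keeps only <a> tags and discards everything else."""
    # SoupStrainer("a") matches <a> tags wherever they appear in the document.
    only_links = SoupStrainer("a")
    return BeautifulSoup(markup, parser, parse_only=only_links)


html = "<html><body><p>Intro</p><a href='/a'>A</a><div><a href='/b'>B</a></div></body></html>"
soup = parse_links_only(html)
print([a["href"] for a in soup.find_all("a")])
```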


1 answer

Reply #1, posted 2024-09-30 22:21:17

I'm not sure what problem you're running into, but this sequence of statements runs in the blink of an eye on my old computer. Try this:

>>> from bs4 import BeautifulSoup
>>> from urllib.request import urlopen
>>> URL = 'http://www.ukpets.co.uk/ukp/?sf=1716769780&rtn=temp87_224_76_126_at_1456&display_profile=&section=Commercial&sub=Search_&rws=&method=search&tb=comdir1_8&class=comdir1_8&search_form=on&rf=coname&st=Food'
>>> HTML = urlopen(URL)
>>> soup = BeautifulSoup(HTML)
C:\Python34\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))
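Following the warning's advice, here is a sketch that names the parser explicitly and closes the connection once the page is read. `load_soup` is a hypothetical helper and the 30-second timeout is an arbitrary assumption; the default `"lxml"` requires lxml to be installed, so pass `parser="html.parser"` otherwise.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def load_soup(url, parser="lxml", timeout=30):
    """Download url and parse it with an explicitly named parser.

    Naming the parser avoids the "No parser was explicitly specified"
    warning and keeps behaviour the same across machines.
    """
    # The with-block closes the response as soon as the bytes are read.
    with urlopen(url, timeout=timeout) as response:
        markup = response.read()
    return BeautifulSoup(markup, parser)
```

Calling `load_soup(URL)` then does the same work as the session above, without the warning.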
