Scraping with Beautiful Soup: I'm confused - Q&A - Python中文网

Posted 2024-06-25 06:30:39


I'm trying to scrape this site, and I want to inspect all of its anchor tags.

I have imported beautifulsoup 4.3.2, and here is my code:

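from urllib.request import urlopen  # Python 3; on Python 2 this lives in urllib2
from bs4 import BeautifulSoup

url = """http://www.civicinfo.bc.ca/bids?pn=1"""
Html = urlopen(url).read()
Soup = BeautifulSoup(Html, 'html.parser')
Content = Soup.find_all('a')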

My problem is that Content is always empty (i.e. Content = []). Does anyone have any ideas?

1 answer

User

#1 · Posted 2024-06-25 06:30:39

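According to the documentation, html.parser is not very lenient in older versions of Python (before 2.7.3 or 3.2.2), so you are probably running into some malformed HTML that it fails to parse.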
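One way to check whether the parser, rather than the download, is at fault is to count anchor tags in the raw bytes without parsing them. The following is only a rough sketch: the helper name `count_raw_anchors` and the sample markup are made up for illustration, and a regular expression is not a substitute for a real HTML parser.

```python
import re

# Hypothetical helper: counts opening <a ...> tags in raw markup without
# building a parse tree, so the result does not depend on any HTML parser.
ANCHOR_OPEN = re.compile(rb"<a[\s>]", re.IGNORECASE)


def count_raw_anchors(raw_html: bytes) -> int:
    """Return the number of opening anchor tags found in the raw bytes."""
    return len(ANCHOR_OPEN.findall(raw_html))


# Made-up, deliberately malformed sample (unclosed tags, a stray "<").
sample = b'<html><body><p>bids <a href="/bids?pn=2">next</a><A HREF="/x">x<div <a>broken'
print(count_raw_anchors(sample))
```

If this count is non-zero for the downloaded Html while Soup.find_all('a') still returns [], the parser is the likely culprit. If the count is zero, the response itself probably contains no anchors, and switching parsers will not help.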

What you are trying to do works if you use lxml instead of html.parser.

From the documentation:

That said, there are things you can do to speed up Beautiful Soup. If you’re not using lxml as the underlying parser, my advice is to start. Beautiful Soup parses documents significantly faster using lxml than using html.parser or html5lib.

So the relevant code would be:

Soup = BeautifulSoup(Html, 'lxml')
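Passing 'lxml' only works when the lxml package is installed; otherwise Beautiful Soup raises FeatureNotFound. A minimal sketch of the parser swap with a fallback might look like the following. The helper name `extract_links` and the sample markup are hypothetical, and the fallback order is an assumption rather than anything the answer prescribes.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def extract_links(html, parsers=("lxml", "html.parser")):
    """Return (parser_used, hrefs) using the first parser that is installed."""
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
        hrefs = [a.get("href") for a in soup.find_all("a")]
        return name, hrefs
    raise RuntimeError("none of the requested parsers is available")


# Made-up sample markup standing in for the downloaded page.
sample = '<html><body><a href="/bids?pn=1">one</a><a href="/bids?pn=2">two</a></body></html>'
parser_used, links = extract_links(sample)
print(parser_used, links)
```

With the question's code, the same helper could be called as extract_links(Html) in place of building Soup directly.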
