BeautifulSoup won't parse a specific site



I'm trying to parse this site, but for reasons I can't understand, nothing happens:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.zap.com.br/imoveis/rio-de-janeiro+rio-de-janeiro/apartamento-padrao/venda/'
response = urllib2.urlopen(url).read()
doc = BeautifulSoup(response)
divs = doc.findAll('div')
print len(divs)  # prints 0.

The site lists real estate ads for Rio de Janeiro, Brazil. I can't find anything in the HTML source that would stop BeautifulSoup from working. Could it be the page size?

I'm using Enthought Canopy Python 2.7.6, IPython Notebook 2.0, and BeautifulSoup 4.3.2.


2 Answers

It's an issue with your environment; here's the output I get:

>>> url = 'http://www.zap.com.br/imoveis/rio-de-janeiro+rio-de-janeiro/apartamento-padrao/venda/'
>>> from bs4 import BeautifulSoup
>>> import urllib2
>>> response = urllib2.urlopen(url).read()
>>> doc = BeautifulSoup(response)
>>> divs = doc.findAll('div')
>>> print len(divs) # prints 0.
558

This happens because you're letting BeautifulSoup pick the most suitable parser for you, and which one it picks depends on what modules are installed in your Python environment.
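As a quick way to see which parser was auto-selected, the soup object keeps a reference to the tree builder it chose. Note that builder and its NAME attribute are bs4 internals rather than documented API, so treat this as a diagnostic sketch; in an environment where html5lib wins the auto-selection (which would explain the asker's 0), you would see:

>>> doc = BeautifulSoup(response)
>>> doc.builder.NAME
'html5lib'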

According to the documentation:

The first argument to the BeautifulSoup constructor is a string or an open filehandle–the markup you want parsed. The second argument is how you’d like the markup parsed.

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser.

So, different parsers give different results:

>>> from bs4 import BeautifulSoup
>>> url = 'http://www.zap.com.br/imoveis/rio-de-janeiro+rio-de-janeiro/apartamento-padrao/venda/'
>>> import urllib2
>>> response = urllib2.urlopen(url).read()
>>> len(BeautifulSoup(response, 'lxml').find_all('div'))
558
>>> len(BeautifulSoup(response, 'html.parser').find_all('div'))
558
>>> len(BeautifulSoup(response, 'html5lib').find_all('div'))
0

The solution is to explicitly specify the parser to use for this particular page; you may need to install lxml or html5lib first.
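Applied to the original snippet, the fix is a one-argument change; the sketch below assumes you go with lxml (installed via pip install lxml), since that parser returned 558 divs in the comparison above:

>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = 'http://www.zap.com.br/imoveis/rio-de-janeiro+rio-de-janeiro/apartamento-padrao/venda/'
>>> response = urllib2.urlopen(url).read()
>>> doc = BeautifulSoup(response, 'lxml')  # explicit parser choice
>>> print len(doc.find_all('div'))
558

Pinning the parser name this way makes the result reproducible, independent of whichever parsers happen to be installed in a given environment.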

See also: Differences between parsers.
