findAll() in BeautifulSoup skips multiple IDs

<img id="webfast-uhyubv" alt="" data-type="image" id="comp-jefxldtzbalatamediacontentimage" src="http://webfast.co/images/webfast-logo.png" />

soup = bs4.BeautifulSoup(webpage, "html.parser")
images = soup.findAll('img')
for image in images:
    print(image)

1 Answer

User

#1 · Posted 2024-05-20 21:00:21

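BeautifulSoup stores the attributes of a tag in a dictionary. Because a dictionary cannot have duplicate keys, one id attribute overwrites the other. You can inspect the attribute dictionary with tag.attrs.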

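>>> from bs4 import BeautifulSoup
>>> tag = '<img id="webfast-uhyubv" alt="" data-type="image" id="comp-jefxldtzbalatamediacontentimage" src="http://webfast.co/images/webfast-logo.png" />'
>>> soup = BeautifulSoup(tag, 'html.parser')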
>>> soup.img.attrs
{'id': 'comp-jefxldtzbalatamediacontentimage', 'alt': '', 'data-type': 'image', 'src': 'http://webfast.co/images/webfast-logo.png'}

>>> soup = BeautifulSoup(tag, 'lxml')
>>> soup.img.attrs
{'id': 'webfast-uhyubv', 'alt': '', 'data-type': 'image', 'src': 'http://webfast.co/images/webfast-logo.png'}

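As you can see, different parsers give us different values for id: html.parser keeps the last id in the tag, while lxml keeps the first. This happens because different parsers work differently.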
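To make the overwrite concrete, here is a minimal sketch in plain Python. It is not the actual code of BeautifulSoup or of either parser; the helper names `last_wins` and `first_wins` are hypothetical, and the two policies only mimic the two results shown above.

```python
# Attribute pairs in the order they appear in the <img> tag.
PAIRS = [
    ("id", "webfast-uhyubv"),
    ("alt", ""),
    ("data-type", "image"),
    ("id", "comp-jefxldtzbalatamediacontentimage"),
    ("src", "http://webfast.co/images/webfast-logo.png"),
]


def last_wins(pairs):
    """Hypothetical helper: a later duplicate replaces the earlier value."""
    attrs = {}
    for name, value in pairs:
        attrs[name] = value
    return attrs


def first_wins(pairs):
    """Hypothetical helper: the first value is kept, later duplicates are dropped."""
    attrs = {}
    for name, value in pairs:
        attrs.setdefault(name, value)
    return attrs


print(last_wins(PAIRS)["id"])   # same id as the html.parser result above
print(first_wins(PAIRS)["id"])  # same id as the lxml result above
```

Either way the dictionary ends up with a single id key, so one of the two values is always lost.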

Out of the box, BeautifulSoup will not give you both id values at once. You can get them with a regular expression. However, use it carefully and as a last resort!

>>> import re
>>> tag = '<img id="webfast-uhyubv" alt="" data-type="image" id="comp-jefxldtzbalatamediacontentimage" src="http://webfast.co/images/webfast-logo.png" />'
>>> ids = re.findall('id="(.*?)"', tag)
>>> ids
['webfast-uhyubv', 'comp-jefxldtzbalatamediacontentimage']
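If you would rather not rely on a regular expression, one possible alternative is the standard library's `html.parser.HTMLParser`, which passes attributes to `handle_starttag` as a list of (name, value) pairs, so repeated names are not collapsed. The sketch below assumes that approach; `AttrCollector` and `collect_attr` are hypothetical names chosen for illustration, not part of BeautifulSoup.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect every value of one attribute name, duplicates included."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so repeated names survive.
        # Self-closing tags such as <img ... /> also end up here by default.
        self.values.extend(
            value for name, value in attrs if name == self.attr_name
        )


def collect_attr(markup, attr_name):
    """Return all values of attr_name found on any start tag in markup."""
    parser = AttrCollector(attr_name)
    parser.feed(markup)
    parser.close()
    return parser.values


tag = '<img id="webfast-uhyubv" alt="" data-type="image" id="comp-jefxldtzbalatamediacontentimage" src="http://webfast.co/images/webfast-logo.png" />'
ids = collect_attr(tag, "id")
print(ids)
```

Unlike the pattern `id="(.*?)"`, this only looks at real attributes named id, so it does not pick up text such as data-id="..." or matches inside ordinary page content.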
