How does .hyperlinks work in BeautifulSoup?

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
The Dormouse's story
Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
this is a link: http://www.link.nl/
<a href="http://www.link.nl" title="link title" target="link target" class="link class">link label</a>
...
"""

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.hyperlinks)  # prints None

1 answer

Anonymous user, posted 2024-04-19 15:44:59

BeautifulSoup objects have no .hyperlinks attribute; there never was such a thing.

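Instead, any attribute access that BeautifulSoup does not recognize is turned into a call to .find(). soup.hyperlinks is interpreted as soup.find('hyperlinks'), which searches for the first <hyperlinks> HTML element. Because there is no such tag, it returns None.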
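The dispatch described above can be modeled in a few lines of plain Python. This is a hypothetical, heavily simplified sketch (the Node class and its find method are invented for illustration and are not BeautifulSoup's source): __getattr__ only runs when normal attribute lookup fails, and here it forwards the unknown name to a tag search, so a missing tag yields None instead of raising AttributeError.

```python
class Node:
    """Hypothetical, heavily simplified stand-in for a parsed tag tree."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Depth-first, document-order search for the first descendant with this tag name
        for child in self.children:
            if child.name == name:
                return child
            match = child.find(name)
            if match is not None:
                return match
        return None

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails: treat the unknown
        # attribute name as a tag name and search for it. Dunder names are
        # excluded so that copy/pickle protocol probes still raise.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


doc = Node("[document]", [
    Node("html", [Node("head", [Node("title")]), Node("body", [Node("a")])]),
])

print(doc.title.name)   # prints: title  (same as doc.find('title'))
print(doc.hyperlinks)   # prints: None   (same as doc.find('hyperlinks'))
```

The key design point is that __getattr__ is a fallback, so real attributes such as name and children are never shadowed by a tag search.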

To find all hyperlinks in an HTML document, search for all a tags, restricted to those that have an href attribute:

print(soup.find_all('a', href=True))

Demo:

>>> soup.find_all('a', href=True)
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>, <a class="link class" href="http://www.link.nl" target="link target" title="link title">link label</a>]

You can also get all the href attributes:

>>> [l['href'] for l in soup.find_all('a', href=True)]
[u'http://example.com/elsie', u'http://example.com/lacie', u'http://example.com/tillie', u'http://www.link.nl']
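For comparison, the same filter (only a tags that actually carry an href) can be sketched without BeautifulSoup, using the standard library's html.parser module instead. This is a swapped-in technique for illustration only; LinkCollector is a hypothetical name, and the sample HTML is a trimmed variant of the question's document with one extra a tag that has no href, to show it being skipped.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags that have one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            attributes = dict(attrs)
            # Mirrors the idea of href=True: the attribute must be present
            if "href" in attributes:
                self.hrefs.append(attributes["href"])


html_doc = """
<p>Names: <a href="http://example.com/elsie" id="link1">Elsie</a>,
<a name="anchor-only">no link here</a>,
<a href="http://www.link.nl" title="link title">link label</a></p>
"""

collector = LinkCollector()
collector.feed(html_doc)
collector.close()
print(collector.hrefs)
# ['http://example.com/elsie', 'http://www.link.nl']
```

When bs4 is installed, find_all('a', href=True) remains the more convenient choice; the parser subclass just makes the "has an href attribute" test explicit.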
