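<p>You can build a more general scraper that looks at every tag and collects every link it finds. Once you have the full list of links, you can use a regular expression (or something similar) to pick out the ones that match the structure you need.</p>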
<pre><code>import re

import requests
from bs4 import BeautifulSoup

response = requests.get('http://www.businessinsider.com')
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.content, 'html.parser')

# collect the href of every tag that carries a non-empty href attribute;
# a single pass over the document avoids counting nested tags repeatedly
links = [tag['href'] for tag in soup.find_all(href=True) if tag['href']]

# example: keep only careerbuilder links
# (a list comprehension, because map/filter are lazy in Python 3)
careerbuilder_links = [
    link for link in links
    if re.search(r'www\.careerbuilder\.com', link)
]
print(careerbuilder_links)
</code></pre>
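<p>If you want to try the collect-then-filter idea without hitting a live site, here is a minimal sketch that uses only the standard library (<code>html.parser</code> rather than BeautifulSoup). The <code>HrefCollector</code> class, the <code>filter_links</code> helper, the sample HTML and the <code>http://www.example.com</code> base URL are all made up for illustration. It also resolves relative links with <code>urljoin</code>, which is an extra step you may or may not want.</p>

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href attribute of every tag, whatever the tag name."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # skip missing or empty href values
            if name == 'href' and value:
                # resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, value))


def filter_links(links, pattern):
    """Return the links that match the given regular expression."""
    regex = re.compile(pattern)
    return [link for link in links if regex.search(link)]


# hypothetical sample page, used here instead of a live request
SAMPLE_HTML = """
<html><head><link href="/static/site.css" rel="stylesheet"></head>
<body>
  <a href="http://www.careerbuilder.com/jobs/123">Job</a>
  <div><a href="/about">About</a></div>
  <a href="">Empty</a>
  <a name="no-href">Anchor</a>
</body></html>
"""

collector = HrefCollector('http://www.example.com')
collector.feed(SAMPLE_HTML)
all_links = collector.links
career_links = filter_links(all_links, r'www\.careerbuilder\.com')
print(all_links)
print(career_links)
```

<p>Keeping the filter in its own function means you can swap the pattern for whatever link structure you are after, without touching the collection step.</p>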