Getting links from multiple <ul> elements with the same class

# import libraries
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

"""Getting Started Example for Python 2.7+/3.3+"""

chapter = 1
chapterlist = 1
links = []
name = ""
reallink = ""

while chapter < 31:
    quote_page = Request('http://website.com/page.html?page=' + str(chapter) + '&per-page=50', headers={'User-Agent': 'Mosezilla/5.0'})
    page = urlopen(quote_page).read()
    soup = BeautifulSoup(page, "html.parser")
    name_box = soup.find("ul", attrs={"class": "list-chapter"})
    links += name_box.find_all("a")
    reallink += str([a['href'] for a in links])
    chapter += 1

f = open("links.txt", "w+")
i = 1
f.write(reallink)
f.close()

1 answer

User

#1 · Posted 2024-09-30 18:20:22

The find you are using returns only the first match, whereas find_all returns a list of all matches.
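To see the difference, here is a small sketch run against a hypothetical HTML snippet with two <ul class="list-chapter"> lists (the markup and hrefs are invented for illustration, not taken from the real site). find stops at the first list, while looping over find_all visits both.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two <ul> elements that share the class "list-chapter"
html = """
<ul class="list-chapter">
  <li><a href="/chapter-1">Chapter 1</a></li>
  <li><a href="/chapter-2">Chapter 2</a></li>
</ul>
<ul class="list-chapter">
  <li><a href="/chapter-3">Chapter 3</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching <ul>, so the second list is never seen
first_only = [a["href"] for a in soup.find("ul", attrs={"class": "list-chapter"}).find_all("a")]

# find_all() returns every matching <ul>; iterate over each one to collect its links
all_lists = [
    a["href"]
    for ul in soup.find_all("ul", attrs={"class": "list-chapter"})
    for a in ul.find_all("a")
]

print(first_only)  # ['/chapter-1', '/chapter-2']
print(all_lists)   # ['/chapter-1', '/chapter-2', '/chapter-3']
```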

Assuming your ul class is correct, I would use select instead and collect the child a tags of those elements:

Replace these lines:

name_box = soup.find("ul", attrs={"class": "list-chapter"})
links += name_box.find_all("a")
reallink += str([a['href'] for a in links])

with:

reallinks += ['http://www.example.com' + item['href'] for item in soup.select('ul.list-chapter a')]  # assumes each href already starts with a leading /; set reallinks = [] before the loop
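Inside the question's while loop, the links from each page need to be accumulated rather than overwritten, and writing str() of a list to the file produces one long line. A minimal sketch of the whole script under those assumptions follows. BASE_URL is the placeholder from the question, the function names extract_links, scrape_all and save_links are hypothetical, and urllib.parse.urljoin is swapped in for the string concatenation above so that relative hrefs resolve against the page URL with or without a leading /.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Placeholder values taken from the question; replace with the real site
BASE_URL = "http://website.com/page.html"
USER_AGENT = "Mozilla/5.0"


def extract_links(html, base_url):
    """Return the absolute href of every <a href> inside any <ul class="list-chapter">."""
    soup = BeautifulSoup(html, "html.parser")
    # select() matches links in ALL lists with that class, not just the first one;
    # urljoin resolves relative hrefs against the page URL
    return [urljoin(base_url, a["href"]) for a in soup.select("ul.list-chapter a[href]")]


def scrape_all(first_page=1, last_page=30, per_page=50):
    """Fetch each listing page in turn and accumulate the links from all of them."""
    links = []
    for page in range(first_page, last_page + 1):
        url = f"{BASE_URL}?page={page}&per-page={per_page}"
        request = Request(url, headers={"User-Agent": USER_AGENT})
        with urlopen(request) as response:
            html = response.read()
        # extend (not reassign) so links from earlier pages are kept
        links.extend(extract_links(html, url))
    return links


def save_links(links, path="links.txt"):
    """Write one link per line instead of the str() of a Python list."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(links) + "\n")
```

Calling save_links(scrape_all()) would fetch pages 1 to 30, as the question's loop does, and write one link per line to links.txt. Nothing is fetched when the functions are defined, so extract_links can be tried on a saved HTML string before hitting the site.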
