<p>For each page you scrape, you can find the next url to crawl and add it to the list.</p>
<p>Here is how I would do it, without changing your code much. I added some comments so you can follow what is going on, but leave me a comment if you need any further explanation:</p>
<pre><code>import requests
import pandas as pd
from bs4 import BeautifulSoup

base_url = 'https://nh.craigslist.org/d/computer-parts/search/syp'
base_search_url = 'https://nh.craigslist.org'

urls = [base_url]  # queue of pages left to crawl
dates = []
titles = []
prices = []
hoods = []

while len(urls) > 0:  # while we still have pages to crawl
    print(urls)
    url = urls.pop(0)  # take the first url off the queue
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")

    next_url = soup.find('a', class_="button next")  # link to the next page of results
    if next_url and next_url.get('href'):  # on the last page the button exists but its href is empty
        urls.append(base_search_url + next_url['href'])  # queue the next page for crawling

    listings = soup.find_all('li', class_="result-row")  # all listings on the current page
    # this is your code, unchanged apart from the narrower except clauses
    for listing in listings:
        datar = listing.find('time', {'class': ["result-date"]}).text
        dates.append(datar)
        title = listing.find('a', {'class': ["result-title"]}).text
        titles.append(title)
        try:
            price = listing.find('span', {'class': "result-price"}).text
            prices.append(price)
        except AttributeError:  # listing has no price
            prices.append('missing')
        try:
            hood = listing.find('span', {'class': "result-hood"}).text
            hoods.append(hood)
        except AttributeError:  # listing has no neighborhood
            hoods.append('missing')

# write the lists to a dataframe
listings_df = pd.DataFrame({'Date': dates, 'Titles': titles, 'Price': prices, 'Location': hoods})

# write to a file
listings_df.to_csv("craigslist_listings.csv")
</code></pre>
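<p>As a quick sanity check (a minimal sketch, assuming the script above has already written <code>craigslist_listings.csv</code>), you can read the file back with pandas and inspect a few rows:</p>
<pre><code>import pandas as pd

# reload the csv written by the scraper to verify its contents
check_df = pd.read_csv("craigslist_listings.csv")
print(check_df.head())                   # first few scraped rows
print(check_df['Price'].value_counts())  # how often each price (or 'missing') appears
</code></pre>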
<p><strong>Edit:</strong> you also forgot to import <code>BeautifulSoup</code> in your code; I added it in my answer.<br/>
<strong>Edit2:</strong> you only need to find the first instance of the "next" button, since a page can (and in this case does) contain more than one "next" button.<br/>
<strong>Edit3:</strong> to crawl this for computer parts, <code>base_url</code> should be changed to the one in this code.</p>
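<p>To illustrate the point in Edit2, here is a minimal sketch (using a hypothetical page fragment, since Craigslist repeats the "next" button at the top and bottom of a results page): <code>find</code> returns only the first matching tag, whereas <code>find_all</code> would return both buttons:</p>
<pre><code>from bs4 import BeautifulSoup

# hypothetical fragment with the "next" button repeated, as on a real results page
html = '''
&lt;a class="button next" href="/d/computer-parts/search/syp?s=120"&gt;next&lt;/a&gt;
&lt;a class="button next" href="/d/computer-parts/search/syp?s=120"&gt;next&lt;/a&gt;
'''
soup = BeautifulSoup(html, "lxml")

print(len(soup.find_all('a', class_="button next")))  # 2 -- both buttons match
print(soup.find('a', class_="button next")['href'])   # find returns only the first one
</code></pre>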