How to collect the full source of a web page (the source only shows the first 10 of X)

    # Imports
    import requests
    from bs4 import BeautifulSoup

    # Start of code
    r = requests.get('http://www.tumblr.com/tagged/skateboard')
    page = r.content
    soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly

    arrayDiv = []
    for anchor in soup.find_all("div", {"class": "post_info"}):
        # Strip the wrapping markup by hand, leaving the text of the first link
        anchor = str(anchor)
        tempString = anchor.replace('</a>:', '')
        tempString = tempString.replace('<div class="post_info">', '')
        tempString = tempString.replace('</div>', '')
        tempString = tempString.split('>')
        newString = tempString[1].strip()
        arrayDiv.append(newString)

    print(arrayDiv)

1 Answer

User

#1 · Posted 2024-10-02 22:38:21

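I solved a similar problem with BeautifulSoup. What I did was loop through the pages. Use BeautifulSoup to check whether the page has a "continue" element. Here (on the Tumblr page), for example, that is an element with the id "next_page_link". If there is one, I run the photo-scraping code again, changing the URL that requests fetches each time. Of course, you will need to wrap all of the code in a function.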
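The loop described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function names `fetch_html` and `collect_post_info`, the `max_pages` safety limit, and the idea that the "next_page_link" element carries an `href` attribute are not from the original post, and Tumblr's current markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download one page. requests is imported here so the parsing
    logic below can be exercised without network access."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def collect_post_info(start_url, fetch=fetch_html, max_pages=50):
    """Follow "next page" links from start_url and collect the link text
    of every <div class="post_info"> found along the way.

    max_pages is a hypothetical safety limit, not part of the original answer.
    """
    results = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")

        # Same data the question extracts by string replacement:
        # the text of the first link inside each post_info div.
        for div in soup.find_all("div", class_="post_info"):
            link = div.find("a")
            if link is not None:
                results.append(link.get_text(strip=True))

        # Assumption: the element with id "next_page_link" has an href
        # pointing at the next page; stop when it is missing.
        next_link = soup.find(id="next_page_link")
        href = next_link.get("href") if next_link else None
        url = urljoin(url, href) if href else None
    return results
```

Calling `collect_post_info('http://www.tumblr.com/tagged/skateboard')` would then walk the pages until no "next_page_link" element is found. Passing a different `fetch` callable makes it possible to try the loop against saved HTML instead of the live site.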

Good luck.
