<p>This fetches every page and yields a BeautifulSoup object for each one. The link to the next page is in an anchor tag with the class <em>forward</em>:</p>
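<pre><code>import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin


def get_pages(base, url):
    # Fetch the first page and yield its parsed soup.
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    yield soup
    # The link to the next page is an anchor tag with class "forward".
    next_page = soup.select_one("a.forward")
    # iter(callable, sentinel) stops once next_page is None (no more pages).
    for page in iter(lambda: next_page, None):
        soup = BeautifulSoup(requests.get(urljoin(base, page["href"])).content, "html.parser")
        yield soup
        next_page = soup.select_one("a.forward")


for soup in get_pages("https://www.xrel.to/", "https://www.xrel.to/games-release-list.html?archive=2016-01"):
    print(soup)
</code></pre>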
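<p>The least obvious part of the loop is the two-argument form of <code>iter</code>, so here is a minimal offline sketch of just that idiom. The <code>PAGES</code> dict and the <code>find_next</code> and <code>walk</code> names are hypothetical stand-ins (not part of the answer's code): the dict plays the role of the site, and <code>find_next</code> plays the role of <code>soup.select_one("a.forward")</code>.</p>

```python
# Hypothetical in-memory "site": each page maps to the key of its next page,
# or None on the last page. This stands in for the "a.forward" link lookup.
PAGES = {
    "page1": "page2",
    "page2": "page3",
    "page3": None,
}


def find_next(page):
    """Return the next page's key, or None when there is no forward link."""
    return PAGES[page]


def walk(start):
    """Yield every page key, following forward links until none is left."""
    yield start
    next_page = find_next(start)
    # iter(callable, sentinel) calls the lambda before each iteration and
    # stops as soon as the lambda returns the sentinel (None here).
    for page in iter(lambda: next_page, None):
        yield page
        next_page = find_next(page)


visited = list(walk("page1"))
print(visited)
```

<p>A plain <code>while next_page is not None:</code> loop would behave the same way; the <code>iter</code> form just folds the stop condition into the <code>for</code> statement.</p>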