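Maximum number of pages when scraping a website with Python

2 answers

User

#1 · edited 2024-09-30 16:38:36

For loops are great, but they are not always the right tool. In this case you do not need to know the page count in advance: I would simply keep following the link in the "next page" button until no such button exists. Something like this: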

url = <first page>
while True:
    # extract data
    if <there is a next page button>:
        url = <href of the button>
    else:
        break
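
The pseudocode above can be made concrete as follows. This is only a minimal sketch under stated assumptions: the next-page button is assumed to be an `<a>` tag with the class `forward` (as in the second answer), `NextLinkFinder`, `crawl`, and `FAKE_SITE` are hypothetical names, and the `fetch` argument stands in for a real HTTP call so the example runs offline using only the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> tag whose class list contains 'forward'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "forward" in classes and self.next_href is None:
            self.next_href = attrs.get("href")


def crawl(start_url, fetch):
    """Follow 'next page' links from start_url; return the URLs visited, in order."""
    visited = []
    url = start_url
    while True:
        html = fetch(url)  # extract data from `html` here
        visited.append(url)
        finder = NextLinkFinder()
        finder.feed(html)
        if finder.next_href is None:  # no next-page button: this was the last page
            break
        url = urljoin(url, finder.next_href)
    return visited


# Hypothetical three-page site held in memory (an assumption for illustration).
FAKE_SITE = {
    "https://example.com/list?page=1": '<a class="forward" href="/list?page=2">next</a>',
    "https://example.com/list?page=2": '<a class="forward" href="/list?page=3">next</a>',
    "https://example.com/list?page=3": "<p>last page</p>",
}

pages = crawl("https://example.com/list?page=1", FAKE_SITE.__getitem__)
print(len(pages))
```

Against a real site, `fetch` could be something like `lambda u: requests.get(u).text`; the number of pages then falls out of the loop rather than having to be known up front.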

User

#2 · edited 2024-09-30 16:38:36

This fetches every page and yields a BeautifulSoup object for each one. The link to the next page is in an anchor tag with the class forward:

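import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin  # Python 3; on Python 2 use: from urlparse import urljoin


def get_pages(base, url):
    # Yield the first page, then keep following the "a.forward" link.
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    yield soup
    next_page = soup.select_one("a.forward")
    # iter(callable, sentinel) calls the lambda until it returns None,
    # i.e. until a page has no "a.forward" link.
    for page in iter(lambda: next_page, None):
        soup = BeautifulSoup(requests.get(urljoin(base, page["href"])).content, "html.parser")
        yield soup
        next_page = soup.select_one("a.forward")


for soup in get_pages("https://www.xrel.to/", "https://www.xrel.to/games-release-list.html?archive=2016-01"):
    print(soup)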
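
Because get_pages is a generator, pages are fetched lazily, so the caller can stop early instead of walking the whole archive. The sketch below illustrates this with `itertools.islice`; `fake_pages` is a hypothetical stand-in for get_pages so the example runs without network access, and `MAX_PAGES` is an assumed limit chosen only for illustration.

```python
from itertools import islice


def fake_pages():
    """Hypothetical stand-in for get_pages: yields page numbers forever."""
    n = 1
    while True:
        yield n
        n += 1


MAX_PAGES = 5  # assumed cap, not from the original answer

# islice stops pulling from the generator after MAX_PAGES items,
# so no further pages would be requested.
first_pages = list(islice(fake_pages(), MAX_PAGES))
print(first_pages)
```

With the real generator the same pattern would read `islice(get_pages(base, url), MAX_PAGES)`.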
