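<p>The pagination URL pattern is consistent across this site, so you don't need to make extra requests just to discover the page URLs. Instead, you can parse the text of the button that reads "Page 1 of 10" ("Seite 1 von 10" on this site) and, once you know the last page number, build the page URLs yourself.</p>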
<pre><code>import re
import requests
from bs4 import BeautifulSoup

thread_url = "http://forum.pcgames.de/stellt-euch-vor/9331721-update-im-out-bitches.html"
r = requests.get(thread_url)
soup = BeautifulSoup(r.content, 'lxml')

# The pagination button reads "Seite 1 von 10" (German for "Page 1 of 10");
# the capture group holds the last page number.
pattern = re.compile(r'Seite\s\d+\svon\s(\d+)', re.I)
pages = soup.find('a', string=pattern).text.strip()
pages = int(pattern.match(pages).group(1))

# Drop the trailing ".html" and append "-<page>.html" for every page.
page_urls = [f"{thread_url[:-5]}-{p}.html" for p in range(1, pages + 1)]
for url in page_urls:
    print(url)
</code></pre>
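<p>If you want to check the URL-building step without hitting the forum, the same idea can be sketched as a small pure function. This is only a sketch under assumptions: <code>build_page_urls</code> and <code>PAGE_PATTERN</code> are names made up for illustration, the example.com URL is a placeholder, and falling back to the original thread URL when no pagination text is found is my guess at sensible behavior for single-page threads, not something confirmed for this site.</p>

```python
import re

# Same "Seite X von Y" pattern as above; the group captures the last page number.
PAGE_PATTERN = re.compile(r"Seite\s+\d+\s+von\s+(\d+)", re.I)


def build_page_urls(thread_url, button_text):
    """Build one URL per page from the pagination button text (hypothetical helper)."""
    match = PAGE_PATTERN.search(button_text)
    if match is None:
        # Assumption: no pagination button means the thread has a single page.
        return [thread_url]
    last_page = int(match.group(1))
    # Strip the trailing ".html" so the page suffix can be appended.
    suffix = ".html"
    base = thread_url[:-len(suffix)] if thread_url.endswith(suffix) else thread_url
    return [f"{base}-{page}{suffix}" for page in range(1, last_page + 1)]


if __name__ == "__main__":
    for url in build_page_urls("http://example.com/thread.html", "Seite 1 von 3"):
        print(url)
```

<p>Keeping the parsing separate from the request makes it easy to feed in the button text from <code>soup.find(...)</code> in the real script while testing the URL logic on plain strings.</p>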