I'm trying to get all of the next-page links from this website. My script below can parse the next-page links up to 10. However, I can't get past the link that shows as 10 at the bottom of that page.
I've tried:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
base = 'https://www.icab.es'
link = 'https://www.icab.es/?go=eaf9d1a0ec5f1dc58757ad6cffdacedb1a58854a600312cc82c494d2c55856f1e25c06b4b6fcee5ddabebfe2d30057589a86e9750b459e9d60598cc6e5c52a4697030b2b8921f29f'
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    p = 1
    while True:
        r = s.get(link)
        soup = BeautifulSoup(r.text,"lxml")
        """some data I can fetch myself from current pages, so ignore this portion"""
        p += 1
        next_page = soup.select_one(f"a[title='{p}']")
        if next_page:
            link = urljoin(base, next_page.get("href"))
            print("next page:", link)
        else:
            break
How can I get all the next-page links from the site above?

PS: selenium is not an option I'd like to deal with.
When (p-1) % 10 != 0, your code finds the numbered link. Once a block of ten pages is exhausted, though, there is no link titled with the next number; the link shown as "page >>" at the bottom can be treated as page + 1, so fall back to it. I had a problem with SSL, so I changed the default SSL context for this site.
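The fallback described above can be sketched as a small selector helper; the exact link title ("page >>") is taken from the answer text and is an assumption that should be checked against the page's real markup:

```python
# Sketch of the pagination fallback. The "page >>" title text is an
# assumption about the site's markup, not something verified here.

def next_page_selector(p: int) -> str:
    """Return the CSS selector for the link leading to page ``p``.

    Pages are listed in blocks of ten numbered links (1-10, 11-20, ...).
    Inside a block the link is titled with the page number itself; to
    cross into the next block there is no numbered link, only the
    "page >>" arrow, so we fall back to that.
    """
    if (p - 1) % 10 != 0:           # still inside the current block of ten
        return f"a[title='{p}']"
    return "a[title='page >>']"     # block exhausted: follow the arrow link

# Inside the loop this replaces the fixed selector:
#   next_page = soup.select_one(next_page_selector(p))
```

With this change the loop keeps walking: pages 2-10 come from the numbered titles, page 11 from the arrow, then 12-20 from numbers again, and so on.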
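The answer mentions changing the default SSL context for this site; a minimal sketch of how that is commonly done with requests is a custom transport adapter. The SECLEVEL=1 cipher tweak is an assumption about what the handshake failure was (older servers offering weak ciphers), not something stated in the answer:

```python
import ssl

import requests
from requests.adapters import HTTPAdapter


class SSLContextAdapter(HTTPAdapter):
    """Transport adapter that lets a Session use a custom ssl.SSLContext."""

    def __init__(self, ssl_context=None, **kwargs):
        # Must be set before super().__init__, which calls init_poolmanager.
        self._ssl_context = ssl_context
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        kwargs["ssl_context"] = self._ssl_context
        return super().init_poolmanager(*args, **kwargs)


# Relax the default context; SECLEVEL=1 re-enables the weaker ciphers some
# older servers still require (an assumption about the failure seen here).
ctx = ssl.create_default_context()
ctx.set_ciphers("DEFAULT@SECLEVEL=1")

s = requests.Session()
# Only requests to this host go through the custom context.
s.mount("https://www.icab.es", SSLContextAdapter(ssl_context=ctx))
```

Mounting the adapter on the site's prefix keeps the relaxed settings scoped to that one host instead of weakening every HTTPS request the session makes.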