How do I get the full URL from an href that is itself a hyperlink?

from lxml import html
import requests

chapter_req = requests.get('https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02')
chapter_html = html.fromstring(chapter_req.content)
sections = chapter_html.xpath('//ol[@id="ProbList"]/li/a/@href')
print(sections[0])

2 Answers

User

Answer 1 · edited 2024-09-30 19:21:31

You can also do the concatenation directly at the XPath level and rebuild the URL from the relative link:

from lxml import html
import requests

chapter_req = requests.get('https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02')
chapter_html = html.fromstring(chapter_req.content)
sections = chapter_html.xpath('concat("https://www.math.wisc.edu/~mstemper2/Math/Pinter/",//ol[@id="ProbList"]/li/a/@href)')
print(sections)

Output:

https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02A
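
One thing worth noting about the output above: in XPath 1.0, concat() converts a node-set argument to a string using only its first node, so the query returns a single URL rather than one per link. The sketch below illustrates this on a small inline HTML snippet (a hypothetical stand-in for the real page, so it runs without a network request) and shows one way to get every link instead.

```python
from lxml import html

# Hypothetical stand-in for the real page; only the structure matters here.
sample = '''
<html><body>
<ol id="ProbList">
  <li><a href="Chapter02A">A</a></li>
  <li><a href="Chapter02B">B</a></li>
</ol>
</body></html>
'''
doc = html.fromstring(sample)
base = 'https://www.math.wisc.edu/~mstemper2/Math/Pinter/'

# concat() stringifies the node-set, which keeps only the first @href.
first_only = doc.xpath(
    'concat("https://www.math.wisc.edu/~mstemper2/Math/Pinter/",'
    '//ol[@id="ProbList"]/li/a/@href)'
)
print(first_only)

# To get every link, select all @href values and join them in Python.
all_urls = [base + href for href in doc.xpath('//ol[@id="ProbList"]/li/a/@href')]
print(all_urls)
```

If only the first section is needed, the concat() form is enough; for the whole list, the Python-side join is the simpler route.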

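User

Answer 2 · edited 2024-09-30 19:21:31

What you are getting back is correct, because Chapter02a is a "relative" link to the next section. The full URL does not appear because it is not stored that way in the HTML.

To get a full, usable URL, do the following: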

url_base = 'https://www.math.wisc.edu/~mstemper2/Math/Pinter/'
sections = chapter_html.xpath('//ol[@id="ProbList"]/li/a/@href')
section_urls = [url_base + s for s in sections]
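
Plain string concatenation works when every href is a bare file name in the same directory. As a slightly more general sketch, the standard library's urllib.parse.urljoin resolves each href against the page URL the way a browser would, so absolute and root-relative links are also handled. The sample href values below are assumptions for illustration, not scraped from the page, and the sketch assumes the page sets no <base> element and the request was not redirected.

```python
from urllib.parse import urljoin

# URL of the page the hrefs came from (taken from the question).
page_url = 'https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02'

# Hypothetical href values standing in for the XPath result.
sections = ['Chapter02A', '/~mstemper2/index.html', 'https://example.com/x']

# urljoin replaces the last path segment for relative links and
# leaves links that are already absolute untouched.
section_urls = [urljoin(page_url, href) for href in sections]
for url in section_urls:
    print(url)
```

In the original script, chapter_req.url could be passed in place of page_url so that any redirect is taken into account.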
