在python中，使用相对xpath查找给定tex之后的第一个表

from lxml import html import requests ResultsPage = requests.get('https://www.iaaf.org/competitions/iaaf-world-championships/iaaf-world-championships-london-2017-5151/results/men/10000-metres/final/result') ResultsTree = html.fromstring(ResultsPage.content) ResultsTable = ResultsTree.xpath(("""//*[text()[contains(normalize-space(), "SPLIT TIMES")]]""")) print ResultsTable

2条回答

网友

1楼 · 编辑于 2024-06-25 05:25:34

您的代码有问题，因为您尝试删除的网站使用了一个将拒绝请求的保护（如另一个答案中指出的，标头中缺少用户代理）：

The request could not be satisfied. Request blocked. Generated by cloudfront (CloudFront)

我可以通过使用这个库来绕过这个问题：cloudflare-scrape。你知道吗

可以使用pip安装：

pip install cfscrape

下面是一段代码，其中包含一个有效的xpath，用于实现您想要实现的目标，诀窍是使用文档中描述的“following”axe:https://www.w3.org/TR/xpath/#axes。你知道吗

import cfscrape
from lxml import html

scraper = cfscrape.create_scraper()
page = scraper.get('https://www.iaaf.org/competitions/iaaf-world-championships/iaaf-world-championships-london-2017-5151/results/men/10000-metres/final/result')
tree = html.fromstring(page.content)
table = tree.xpath(".//h2[contains(text(), 'Split times')][1]/following::table[1]")

网友

2楼 · 编辑于 2024-06-25 05:25:34

可以通过xpath使用following，如下所示。你知道吗

relative_string = "Split times"

ResultsTable = ResultsTree.xpath("//*[text()[contains(normalize-space(), '"+relative_string+"')]]/following::table")

相关问题更多 >

编程相关推荐

热门问题

热门文章