Getting an XPath value from an HTML document

from lxml import html
import requests

page = requests.get('https://next.newsimpact.com/NewsWidget/Live')
tree = html.fromstring(page.content)
# This will create a list of buyers:
value = tree.xpath('//*[@id="table9521"]/tr[1]/td[4]/text()')
print('Value: ', value)

2 answers

User

Answer 1 · edited 2024-10-01 04:50:00

The data you need is missing from the initial page source; it is loaded by an XHR. You can fetch it like this:

import requests

response = requests.get('https://next.newsimpact.com/NewsWidget/GetNextEvents?offset=-120').json()

first_previous = response['Items'][0]['Previous']  # Current output - "2.632"
second_previous = response['Items'][1]['Previous']  # Currently - "0.2"
first_forecast = response['Items'][0]['Forecast']  # ""
second_forecast = response['Items'][1]['Forecast']  # "0.3"

The response parses into a plain Python dict, from which you can read all the data you need.
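As a rough sketch of walking that dict rather than indexing entries one by one, something like the following might work. The helper names `fetch_events` and `previous_and_forecast` are made up for illustration, and the sample payload only mirrors the four values quoted above; the real response may well contain other fields or a different shape over time.

```python
import json

# Endpoint copied from the answer above, including its offset value.
EVENTS_URL = "https://next.newsimpact.com/NewsWidget/GetNextEvents?offset=-120"


def fetch_events(url=EVENTS_URL):
    """Download the JSON payload (hypothetical helper; needs network access)."""
    import requests  # imported here so the parsing helper also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def previous_and_forecast(payload):
    """Return (Previous, Forecast) pairs for every entry under 'Items'."""
    pairs = []
    for item in payload.get("Items", []):
        # .get() yields None instead of raising if a field is missing.
        pairs.append((item.get("Previous"), item.get("Forecast")))
    return pairs


# Offline sample shaped like the values quoted in the answer (an assumption).
sample = json.loads(
    '{"Items": [{"Previous": "2.632", "Forecast": ""},'
    ' {"Previous": "0.2", "Forecast": "0.3"}]}'
)
pairs = previous_and_forecast(sample)
print(pairs)
```

Against the live endpoint you would call `previous_and_forecast(fetch_events())` instead of using the sample.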

User

Answer 2 · edited 2024-10-01 04:50:00

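Your problem is simple: requests does not execute JavaScript at all, and the value is generated by JS!

If you really need to run this XPath, you will need a module that understands JS, such as spynner.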

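You can check whether JS is required by first fetching the page with curl, or by disabling JS in your browser. In Firefox, enter about:config in the address bar, search for javascript.enabled, and double-click it to toggle between true and false.

In Chrome, open Chrome DevTools; there is an option for it somewhere in there.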
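The curl-style check can also be approximated from Python. This is only a sketch: `fetch_raw_html` and `appears_js_generated` are hypothetical helpers, the two HTML snippets are invented stand-ins, and a plain substring test is a crude heuristic (the id `table9521` is taken from the XPath in the question).

```python
def fetch_raw_html(url):
    """Fetch the page as a non-JS client sees it (needs network access)."""
    import requests  # imported here so the check below also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def appears_js_generated(raw_html, marker):
    """Return True if `marker` is absent from the raw, unrendered HTML."""
    return marker not in raw_html


# Invented examples: one page ships the table, the other builds it in JS.
static_page = '<table id="table9521"><tr><td>2.632</td></tr></table>'
dynamic_page = '<div id="root"></div><script src="app.js"></script>'

print(appears_js_generated(static_page, 'id="table9521"'))   # False
print(appears_js_generated(dynamic_page, 'id="table9521"'))  # True
```

For the page in the question, `appears_js_generated(fetch_raw_html('https://next.newsimpact.com/NewsWidget/Live'), 'id="table9521"')` would be the equivalent call; a `True` result suggests the element is added after load.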

See https://github.com/makinacorpus/spynner

Another (possible) problem: use tree = html.fromstring(page.text) instead of tree = html.fromstring(page.content).
