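I am trying to find a relative XPath (not an absolute one) that lets me extract data from this URL: https://www.sec.gov/Archives/edgar/data/1000228/000100022810000006/the10k_2009.htm
My code is below. SalesB returns a value ('233715'), but SalesA comes back empty. What am I doing wrong?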
from lxml import html
import requests
SEC_pageA = requests.get('https://www.sec.gov/Archives/edgar/data/1000228/000100022810000006/the10k_2009.htm')
SEC_treeA = html.fromstring(SEC_pageA.content)
SalesA = SEC_treeA.xpath('(//p[contains(., "CONSOLIDATED STATEMENTS OF INCOME")]/following::td[contains(.,"Net sales")]/following-sibling::td[@align="right"]//text())[1]')
SEC_pageB = requests.get('https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm')
SEC_treeB = html.fromstring(SEC_pageB.content)
SalesB = SEC_treeB.xpath('(//p[contains(., "CONSOLIDATED STATEMENTS OF OPERATIONS")]/following::td[contains(.,"Net sales")]/following-sibling::td[@align="right"]//text())[1]')
print(SalesA)
print(SalesB)
SalesB returns the expected value, which comes from the page loaded into the SEC_pageB variable (see https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm).
I want SalesA to return the "Net sales" figure (i.e. 6538336), which can be found here: https://www.sec.gov/Archives/edgar/data/1000228/000100022810000006/the10k_2009.htm
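This happens because some of the text you are matching is not on a single line in the page source, so the XPath cannot find what you actually want: contains() compares the raw string, line breaks included.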
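One way to make the expression tolerant of such line breaks is to wrap the node text in the XPath 1.0 function normalize-space(), which collapses runs of whitespace (newlines included) into single spaces before contains() runs. The snippet below is only a sketch: the SAMPLE markup is hypothetical and merely imitates a heading and a label that wrap across lines, so it has not been checked against the actual filing, which may need further adjustments.

```python
from lxml import html

# Hypothetical markup (an assumption, not copied from the filing): the heading
# and the "Net sales" label are each broken across two lines.
SAMPLE = """
<html><body>
<p>CONSOLIDATED STATEMENTS
OF INCOME</p>
<table>
<tr><td>Net
sales</td><td align="right">6538336</td></tr>
</table>
</body></html>
""".strip()

tree = html.fromstring(SAMPLE)

# Original style of expression: contains() sees the raw text, including the
# newline, so "CONSOLIDATED STATEMENTS OF INCOME" never matches.
strict = tree.xpath(
    '(//p[contains(., "CONSOLIDATED STATEMENTS OF INCOME")]'
    '/following::td[contains(., "Net sales")]'
    '/following-sibling::td[@align="right"]//text())[1]'
)

# Same expression with normalize-space(.), which turns the line breaks into
# single spaces before the comparison.
tolerant = tree.xpath(
    '(//p[contains(normalize-space(.), "CONSOLIDATED STATEMENTS OF INCOME")]'
    '/following::td[contains(normalize-space(.), "Net sales")]'
    '/following-sibling::td[@align="right"]//text())[1]'
)

print(strict)
print(tolerant)
```

On this sample the first query returns an empty list and the second returns the cell text. If the real page still yields nothing after this change, printing the text of the heading element found by a looser query is a reasonable next step to see what whitespace or markup it actually contains.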