Problem getting data with XPath and lxml in Python

import lxml.etree
from google.appengine.api import urlfetch

def foo():
    url = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData$filter=month(NEW_DATE)%20eq%201%20and%20year(NEW_DATE)%20eq%202015'
    response = urlfetch.fetch(url)
    tree = lxml.etree.fromstring(response.content)
    nsmap = {'atom': 'http://www.w3.org/2005/Atom',
             'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices'}
    myData = tree.xpath("//atom:entry[last()]/d:BC_1YEAR", namespaces=nsmap)

1 Answer

User

#1 · Posted 2024-04-28 08:09:25

As far as I can tell, the correct URL is

http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015

whereas the URL in your code has no ? after Data:

http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015
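One way to avoid dropping the ? separator is to assemble the query string with the standard library instead of typing the URL by hand. The following is only a sketch: the names BASE_URL, filter_expr and url are illustrative, and the choice to leave $ unescaped is made here solely so the result matches the corrected URL shown above.

```python
from urllib.parse import quote, urlencode

# Illustrative names; only the endpoint and the filter text come from the question.
BASE_URL = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData'
filter_expr = 'month(NEW_DATE) eq 1 and year(NEW_DATE) eq 2015'

# quote_via=quote encodes spaces as %20 (not '+'); safe='$' keeps '$filter' readable.
query = urlencode({'$filter': filter_expr}, quote_via=quote, safe='$')

# The '?' between the resource path and the query string is what the original URL lacked.
url = BASE_URL + '?' + query
print(url)
```

Building the URL this way also takes care of percent-encoding the parentheses and spaces in the filter expression.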

Because of the missing ?, your code currently receives the following XML:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code></code>
  <message xml:lang="en-US">Resource not found for the segment 'DailyTreasuryYieldCurveRateData$filter=month'.</message>
</error>

Naturally, this error document contains no atom:entry, so the XPath expression has nothing to match.
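To see this failure mode in isolation, the error document above can be parsed directly. This is a minimal sketch: the helper service_error is a hypothetical name introduced here, not part of lxml or of the original code, and it assumes the service always reports failures with an error root element like the one shown.

```python
import lxml.etree

# The error document quoted above, as bytes (lxml rejects str input that
# carries an encoding declaration).
ERROR_XML = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code></code>
  <message xml:lang="en-US">Resource not found for the segment 'DailyTreasuryYieldCurveRateData$filter=month'.</message>
</error>"""

NSMAP = {'atom': 'http://www.w3.org/2005/Atom',
         'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
         'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'}

def service_error(root):
    """Hypothetical helper: return the error message, or None if root is not <error>."""
    if lxml.etree.QName(root).localname != 'error':
        return None
    return root.findtext('m:message', namespaces=NSMAP)

tree = lxml.etree.fromstring(ERROR_XML)

# The query runs without raising; it simply matches nothing.
result = tree.xpath("//atom:entry[last()]/d:BC_1YEAR", namespaces=NSMAP)
print(result)
print(service_error(tree))
```

An empty result from xpath() is therefore not proof that the expression is wrong; checking the root element first makes it easier to tell a bad request from a bad query.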

In addition, the XPath expression

//atom:entry[last()]/d:BC_1YEAR

will not retrieve d:BC_1YEAR even with the correct URL, because d:BC_1YEAR is not a direct child of atom:entry. Use

//atom:entry[last()]//d:BC_1YEAR

or, even better, register the m: prefix in your code and use

//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR

import lxml.etree
from google.appengine.api import urlfetch

def foo():
    # Note the '?' separating the resource path from the $filter query option.
    url = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015'
    response = urlfetch.fetch(url)
    tree = lxml.etree.fromstring(response.content)
    # Every prefix used in the XPath expression must be registered here, including m:.
    nsmap = {'atom': 'http://www.w3.org/2005/Atom',
             'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
             'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'}
    # Full path from the last entry down to the value.
    myData = tree.xpath("//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR", namespaces=nsmap)
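The difference between the three expressions can be checked offline. The sketch below assumes a heavily trimmed, hypothetical stand-in for the feed (SAMPLE), with the nesting taken from the full path above; the two values are placeholders, not real Treasury data.

```python
import lxml.etree

# Hypothetical, trimmed stand-in for the feed; the values are placeholders.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties><d:BC_1YEAR>0.10</d:BC_1YEAR></m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties><d:BC_1YEAR>0.20</d:BC_1YEAR></m:properties>
    </content>
  </entry>
</feed>"""

nsmap = {'atom': 'http://www.w3.org/2005/Atom',
         'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
         'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'}

tree = lxml.etree.fromstring(SAMPLE)

# '/' looks only at direct children of entry, so this finds nothing.
child = tree.xpath("//atom:entry[last()]/d:BC_1YEAR", namespaces=nsmap)

# '//' searches all descendants of the last entry.
descendant = tree.xpath("//atom:entry[last()]//d:BC_1YEAR", namespaces=nsmap)

# The explicit path names every intermediate element.
full_path = tree.xpath(
    "//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR",
    namespaces=nsmap)

print(len(child))
print([e.text for e in descendant])
print([e.text for e in full_path])
```

On this sample the descendant and full-path queries select the same element; the full path is stricter about where the element may appear, which is why it is the safer choice if the structure is known.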

Edit: in response to your comment:

I want my code to work 'indefinitely' with as little maintenance as possible. I don't know what the purpose of namespaces really are and I wonder if these particular namespaces are generic and can be expected to stay that way for years to come?

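I have explained the purpose of namespaces in XML elsewhere; please have a look at that answer as well. Namespaces are never generic. In fact, they are the exact opposite of generic: they are supposed to be unique.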

That said, there are ways to ignore namespaces. An expression like

//atom:entry[last()]//d:BC_1YEAR

can be rewritten as

//*[local-name() = 'entry'][last()]//*[local-name() = 'BC_1YEAR']

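to find elements regardless of their namespace. This would be an option if you have reason to believe that the namespace URIs will change in the future.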
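As a rough illustration, the local-name() form can be tried against a document whose namespace URIs differ from the current ones. Everything in SAMPLE_CHANGED is hypothetical: the example.com URIs stand in for some future change, and the values are placeholders.

```python
import lxml.etree

# Hypothetical feed in which the d: and m: namespace URIs have changed.
SAMPLE_CHANGED = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://example.com/changed/dataservices"
      xmlns:m="http://example.com/changed/metadata">
  <entry>
    <content type="application/xml">
      <m:properties><d:BC_1YEAR>0.10</d:BC_1YEAR></m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties><d:BC_1YEAR>0.20</d:BC_1YEAR></m:properties>
    </content>
  </entry>
</feed>"""

tree = lxml.etree.fromstring(SAMPLE_CHANGED)

# The prefixed query, bound to the current URI, no longer matches anything.
nsmap = {'atom': 'http://www.w3.org/2005/Atom',
         'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices'}
prefixed = tree.xpath("//atom:entry[last()]//d:BC_1YEAR", namespaces=nsmap)

# The local-name() query compares only the part of the name after the prefix,
# so no namespace mapping is needed.
ignoring_ns = tree.xpath(
    "//*[local-name() = 'entry'][last()]//*[local-name() = 'BC_1YEAR']")

print(len(prefixed))
print([e.text for e in ignoring_ns])
```

The trade-off is that a local-name() query will also match an element called BC_1YEAR from any unrelated namespace, so it gives up the uniqueness that namespaces are meant to provide.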
