用Python解析Alexa-XML

2024-09-27 00:23:13 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个非常相似的问题: python alexa result parsing with lxml.etree。在

我想知道如何解析第二个DataUrl。这意味着我要得到DataUrl变量,它在TrafficData下,而不是{}下。(得到people.com而不是google.com

我还使用了lxml,数据与他描述的完全相同。在

代码如下:

<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
  <aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
    <aws:OperationRequest>
      <aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId>
    </aws:OperationRequest>
    <aws:UrlInfoResult>
      <aws:Alexa>
        <aws:ContentData>
          <aws:DataUrl type="canonical">google.com/</aws:DataUrl>
          <aws:SiteData>
        <aws:Title>Google</aws:Title>
            <aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
            <aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
          </aws:SiteData>
          <aws:LinksInCount>3453627</aws:LinksInCount>
        </aws:ContentData>
        <aws:TrafficData>
          <aws:DataUrl type="canonical">people.com/</aws:DataUrl>
          <aws:Rank>1</aws:Rank>
        </aws:TrafficData>
      </aws:Alexa>
    </aws:UrlInfoResult>
    <aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
      <aws:StatusCode>Success</aws:StatusCode>
    </aws:ResponseStatus>
  </aws:Response>
</aws:UrlInfoResponse>

Tags: comawshttpdocresponsegooglepeoplelxml
2条回答

我需要做的是:

namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}
texts = doc.xpath("//aws:TrafficData/aws:DataUrl/text()", namespaces=namespaces)
print texts[0]

我把答案作为一个整体进行了编辑

为了回复您的评论,您只需更改xpath

下面的工作示例(来自链接的问题)返回google.com/

from lxml import etree

xmlstr = """
<?xml version="1.0"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
  <aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
    <aws:OperationRequest>
      <aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId>
    </aws:OperationRequest>
    <aws:UrlInfoResult>
      <aws:Alexa>
        <aws:ContentData>
          <aws:DataUrl type="canonical">google.com/</aws:DataUrl>
          <aws:SiteData>
            <aws:Title>Google</aws:Title>
            <aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
            <aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
          </aws:SiteData>
          <aws:LinksInCount>3453627</aws:LinksInCount>
        </aws:ContentData>
        <aws:TrafficData>
          <aws:DataUrl type="canonical">googly.com/</aws:DataUrl>
          <aws:Rank>1</aws:Rank>
        </aws:TrafficData>
      </aws:Alexa>
    </aws:UrlInfoResult>
    <aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
      <aws:StatusCode>Success</aws:StatusCode>
    </aws:ResponseStatus>
  </aws:Response>
</aws:UrlInfoResponse>
"""

doc = etree.fromstring(xmlstr.strip())


namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}
texts = doc.xpath("//aws:TrafficData/aws:DataUrl/text()", namespaces=namespaces)
print texts[0]

相关问题 更多 >

    热门问题