How do I save a node's position in an XML tree for later use?

Posted 2024-06-26 00:13:41

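I have parsed an XML tree and used the <lastmod> nodes to find the most recently added <url> node. How can I "save" that node's position in the tree and use it to get the other nodes inside the <url> it belongs to?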

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.website.com/</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-10-13T06:03:41Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-09-15T07:11:22Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
</urlset>

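Of the two <url> tags that carry a <lastmod>, the first is the most recent addition to the XML document. However, you have to walk the entire document to find that out. How can I save the "position" of an XML tag so that I can get its <image:title> later? Here is my code: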

tree = get_xml_data(line)
jul_newest = 0.0  # establish a comparison value for the newest addition
for child in tree:
    if child.tag.endswith("url"):
        for c in child:
            if c.tag.endswith("lastmod"):
                xml_date = c.text
                year = float(xml_date[0:4])
                month = float(xml_date[5:7])
                day = float(xml_date[8:10])
                hour = float(xml_date[11:13])
                minute = float(xml_date[14:16])
                second = float(xml_date[17:19])
                # calculate Julian day number of recent addition
                jul_day = julian(year, month, day, hour, minute, second)
                if jul_day > jul_newest:
                    nt.set_year(int(year))
                    nt.set_month(int(month))
                    nt.set_day(int(day))
                    nt.set_hour(int(hour))
                    nt.set_minute(int(minute))
                    nt.set_second(int(second))
                    jul_newest = jul_day
                    nt.set_jul(jul_day)
# find loc of the latest addition
for child in tree:
    if child.tag.endswith("url"):
        for c in child:
            if c.tag.endswith("loc"):
                nt.set_location(c.text)
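
One possible approach, sketched here under a few assumptions, is to keep a reference to the <url> element itself while scanning, rather than trying to record a position. An element object stays valid after the loop, so its other children can be read from it directly. The sketch assumes the tree comes from the standard library's xml.etree.ElementTree (the question does not show what get_xml_data returns) and that every <lastmod> has the form shown in the sample. The names NS, SAMPLE, parse_lastmod, and find_newest_url are hypothetical helpers introduced for illustration, and datetime is swapped in for the question's julian() helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Prefix-to-URI map; "sm" is an arbitrary prefix chosen here for the
# default sitemap namespace. Both URIs are copied from the sample.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Trimmed copy of the sample sitemap, used only for this illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.website.com/</loc>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-10-13T06:03:41Z</lastmod>
    <image:image>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-09-15T07:11:22Z</lastmod>
    <image:image>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
</urlset>"""


def parse_lastmod(text):
    # Assumes timestamps always look like 2016-10-13T06:03:41Z.
    return datetime.strptime(text.strip(), "%Y-%m-%dT%H:%M:%SZ")


def find_newest_url(root):
    """Return the <url> element with the latest <lastmod>, or None."""
    newest_url = None
    newest_time = None
    for url in root.findall("sm:url", NS):
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            continue  # entries without <lastmod> cannot be compared
        stamp = parse_lastmod(lastmod)
        if newest_time is None or stamp > newest_time:
            # Keep the element itself, not just its date.
            newest_url, newest_time = url, stamp
    return newest_url


root = ET.fromstring(SAMPLE)
newest = find_newest_url(root)
if newest is not None:
    # The saved element gives direct access to its other children.
    print(newest.findtext("sm:loc", namespaces=NS))
    print(newest.findtext("image:image/image:title", namespaces=NS))
```

With the element saved this way, the second pass over the tree is no longer needed, and the values could be handed to nt.set_location and the other setters once the loop has finished.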
