Recursive traversal of XML tags does not print the text of (nested) child tags

Posted 2024-10-01 15:28:51


XML document

We have the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<doc id="ENG_DF_000170_20150219_F0010008Z">
<post id="p10" author="Kosh" datetime="2015-02-19T21:33:00">
<quote orig_author="Luddly Neddite">
<quote orig_author="zeke">
<quote orig_author="Luddly Neddite">
<quote orig_author="occupied">
Don't forget the fucking Moonies.
</quote>
The Bushes have middle east oil money behind them. They are owned by such as the bin Laden's and Saudi Prince Alwaleed bin Talal.
That's in addition to the Koch/Adelson openly buying elections.
</quote>
I think the Repubs have a brilliant strategy by running Bush 3. And Clinton 2.
It will allow the hyper partisans on both sides to make the decision as to who will be president.
Because people like me will just say fuck it to voting. If these two represent the very best that America has to offer in the form of leadership, we are royally and truly fucked.
And I am done voting. Not that my vote means much anyway.
</quote>
It's being reported that of the 21 people reportedly advising Jeb Bush, 19 are veterans of the first Bush administration, the second Bush administration, or in a few cases, both. 
Some of the more notable names are Secretary of State (James Baker), his brother’s Deputy Defense Secretary (Paul Wolfowitz), his brother’s National Security Adviser (Stephen Hadley), 
a variety of members from his brother’s cabinet (Tom Ridge and Michael Chertoff).
</quote>
So why does the far left care? None of you far left drones will vote for him anyway, so what difference does it make?
</post>
</doc>

Source code

We want to find the post tag, then recursively traverse the quote tags and print the text between <quote> and </quote>.

We used the Python code below. Calling findall('.//quote') retrieves the quote tags recursively.
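The following minimal Python 3 sketch shows what the `.//quote` path actually matches. It runs on a small hypothetical snippet that copies the document's nesting; the sample XML is invented for illustration. `findall('quote')` only looks at direct children, whereas `findall('.//quote')` searches every descendant:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the nesting of the document above
SAMPLE = """<post>
<quote orig_author="A"><quote orig_author="B"><quote orig_author="C">innermost</quote></quote></quote>
</post>"""

post = ET.fromstring(SAMPLE)

# 'quote' matches only <quote> elements that are direct children of <post>
direct = [q.get("orig_author") for q in post.findall("quote")]

# './/quote' matches <quote> elements at any depth below <post>, in document order
nested = [q.get("orig_author") for q in post.findall(".//quote")]

print(direct)  # ['A']
print(nested)  # ['A', 'B', 'C']
```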

#! /usr/bin/python
# -*- coding: utf-8 -*-
import sys
import xml.etree.ElementTree as ET


def search_for_query(path):
    tree = ET.parse(path)
    root = tree.getroot()
    for child in root:
        # retrieve data from each <post> directly under <doc>
        if child.tag == "post":
            # recursively retrieve every <quote> below the post
            quotes = child.findall('.//quote')
            for quote in quotes:
                print(quote.get("orig_author"))
                # print the quote's .text attribute
                print(quote.text)


if __name__ == "__main__":
    queries_xml = sys.argv[1]
    search_for_query(queries_xml)

Problem

The problem is that it skips all of the text except for one quote (the innermost):

Luddly Neddite


zeke


Luddly Neddite


occupied

Don't forget the fucking Moonies.

I think I misunderstood it. The definition is:

Element.findall() finds only elements with a tag which are direct children of the current element

So yes, I am not looking into the child elements of quote.


1 answer
User
#1 · posted 2024-10-01 15:28:51

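This happens because only the first text node inside each element is stored as that element's text. Any text node that comes after a child element is stored as that child element's tail. The following logic collects all direct child text nodes of a given parent element. It concatenates the first text node with the tails of all child elements (if there are any):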
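The text/tail split described above shows up clearly on a tiny hypothetical snippet (invented for illustration, Python 3). The outer element's own text ends up spread across its `.text` and the `.tail` of each of its children:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: text before, between and after two child elements
q = ET.fromstring('<quote>first<quote>inner</quote>second<b/>third</quote>')
children = list(q)

print(q.text)              # first   (stored on the outer element)
print(children[0].text)    # inner   (belongs to the nested <quote>)
print(children[0].tail)    # second  (stored as the nested <quote>'s tail)
print(children[1].tail)    # third   (stored as <b/>'s tail)

# The outer element's own text: .text plus every direct child's .tail
direct_text = (q.text or '') + ''.join(c.tail or '' for c in children)
print(direct_text)         # firstsecondthird
```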

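def get_text(element):
    # text before the first child (None if there is none) plus the tail after each direct child
    return (element.text or '') + \
        ''.join(c.tail for c in element.findall('*') if c.tail is not None)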
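For comparison, ElementTree's standard `Element.itertext()` yields the text of an element and all of its descendants. On an outer quote it would therefore also repeat the text of every nested quote. The sketch below runs on hypothetical snippets and contrasts the two approaches. Here `get_text` guards `element.text` against `None`, which happens when an element starts directly with a child:

```python
import xml.etree.ElementTree as ET


def get_text(element):
    # Text before the first child plus the tail after each direct child;
    # text inside nested children is skipped.
    return (element.text or '') + ''.join(
        c.tail for c in element.findall('*') if c.tail is not None)


# Hypothetical nested sample
outer = ET.fromstring('<quote>A1 <quote>B</quote> A2</quote>')
print(repr(get_text(outer)))            # 'A1  A2'
print(repr(''.join(outer.itertext())))  # 'A1 B A2'

# Element whose first node is a child, so .text is None
leading_child = ET.fromstring('<quote><quote>B</quote>after</quote>')
print(repr(get_text(leading_child)))    # 'after'
```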

Quick test:

>>> for i in range(0, len(root)):
...     # retrieve data from post
...     if root[i].tag == "post":
...         # recursively retrieve quote
...         quotes = root[i].findall('.//quote')
...         for quote in quotes:
...             print(quote.get("orig_author"))
...             print(get_text(quote))
... 
Luddly Neddite

       It's being reported that of the 21 people reportedly advising Jeb Bush, 19 are veterans of the first Bush administration, the second Bush administration, or in a few cases, both. Some of the more notable names are Secretary of State (James Baker), his brother’s Deputy Defense Secretary (Paul Wolfowitz), his brother’s National Security Adviser (Stephen Hadley), a variety of members from his brother’s cabinet (Tom Ridge and Michael Chertoff).

zeke

         I think the Repubs have a brilliant strategy by running Bush 3. And Clinton 2. It will allow the hyper partisans on both sides to make the decision as to who will be president. Because people like me will just say fuck it to voting. If these two represent the very best that America has to offer in the form of leadership, we are royally and truly fucked. And I am done voting. Not that my vote means much anyway.

Luddly Neddite

           The Bushes have middle east oil money behind them. They are owned by such as the bin Laden's and Saudi Prince Alwaleed bin Talal. That's in addition to the Koch/Adelson openly buying elections.

occupied
Don't forget the fucking Moonies.
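Putting it together, here is a minimal Python 3 sketch of the corrected script. It assumes the XML is available as a string, shortened here with placeholder sentences instead of the real posts. It also strips surrounding whitespace from each collected text and uses a helper named `collect_quotes`, which is hypothetical. These are illustrative choices and not part of the original answer:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the document above (placeholder text)
XML = """<doc id="ENG_DF_000170_20150219_F0010008Z">
<post id="p10" author="Kosh">
<quote orig_author="Luddly Neddite">
<quote orig_author="zeke">
<quote orig_author="Luddly Neddite">
<quote orig_author="occupied">
occupied's own text
</quote>
inner Luddly Neddite's own text
</quote>
zeke's own text
</quote>
outer Luddly Neddite's own text
</quote>
post body text
</post>
</doc>"""


def get_text(element):
    # Text before the first child plus the tail after each direct child
    return (element.text or '') + ''.join(
        c.tail for c in element.findall('*') if c.tail is not None)


def collect_quotes(root):
    """Return (orig_author, own text) pairs for every quote under each post."""
    results = []
    for post in root.findall('post'):            # direct <post> children of <doc>
        for quote in post.findall('.//quote'):   # quotes at any depth
            results.append((quote.get('orig_author'), get_text(quote).strip()))
    return results


root = ET.fromstring(XML)
for author, text in collect_quotes(root):
    print(author)
    print(text)
```

Each quote prints only its own text. The post body stays out of the output because it is stored as the outer quote's tail, which belongs to the post rather than to any quote.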
