Get the text of an XMLNode, including its childnodes (or something similar)

2 Answers

User

Answer 1 · edited 2024-06-28 04:58:20

You can use the minidom parser. For example:

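from xml.dom import minidom

def strip_tags(node):
    """Return the text of node and all of its descendants, in document order."""
    text = ""
    for child in node.childNodes:
        # Use the constant on the node itself instead of relying on the global doc
        if child.nodeType == child.TEXT_NODE:
            text += child.data  # raw text; toxml() would re-escape & and <
        else:
            text += strip_tags(child)
    return text

doc = minidom.parse("<your-xml-file>")

text = strip_tags(doc)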

The recursive strip_tags function walks the XML tree and collects the text in document order.
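To try the recursive approach without a file on disk, here is a minimal sketch that parses a string instead. The name collect_text and the sample string are made up for illustration (the sample is modeled on the BookTitle snippet from the other answer), and treating CDATA sections as text is an assumption about what you want:

```python
from xml.dom import minidom

def collect_text(node):
    """Concatenate the text of a DOM node and all of its descendants."""
    parts = []
    for child in node.childNodes:
        # Text and CDATA nodes carry their content in .data
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        else:
            # Elements are descended into; comments have no children and add nothing
            parts.append(collect_text(child))
    return "".join(parts)

# Hypothetical sample input, not taken from a real file
sample = '<BookTitle>Mtn <Emphasis Type="Italic">Z</Emphasis> = 74 - 210</BookTitle>'
doc = minidom.parseString(sample)
result = collect_text(doc.documentElement)
print(result)  # Mtn Z = 74 - 210
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation on large documents.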

User

Answer 2 · edited 2024-06-28 04:58:20

That is not the text of the BookTitle node, it is the tail of the Emphasis node. So you should do something like this:

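import xml.etree.ElementTree as et

def parse(el):
    # el.text is the text before the first child; it can be None
    head = (el.text or '').strip()
    text = head + ' ' if head else ''
    for child in el:  # getchildren() was removed in Python 3.9
        # child.tail is the text that follows the child's closing tag
        text += '{0} {1}\n'.format((child.text or '').strip(), (child.tail or '').strip())
    return text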

This gives you:

>>> root = et.fromstring('''
    <BookTitle>
    <Emphasis Type="Italic">Z</Emphasis>
     = 63 - 100
    </BookTitle>''')
>>> print(parse(root))
Z = 63 - 100

And:

>>> root = et.fromstring('''
<BookTitle>
Mtn
<Emphasis Type="Italic">Z</Emphasis>
 = 74 - 210
</BookTitle>''')
>>> print(parse(root))
Mtn Z = 74 - 210

This should give you a basic idea of how to do it.

Update: fixed the whitespace...
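If you only need all the text under an element, a shorter alternative is ElementTree's itertext(), which yields each element's text and each child's tail in document order. This is a different technique from the parse function above, not a rewrite of it. The helper name all_text and the whitespace collapsing are assumptions made for this sketch:

```python
import xml.etree.ElementTree as ET

def all_text(el):
    """Join every text and tail fragment under el, collapsing runs of whitespace."""
    return " ".join("".join(el.itertext()).split())

root = ET.fromstring('''
<BookTitle>
Mtn
<Emphasis Type="Italic">Z</Emphasis>
 = 74 - 210
</BookTitle>''')

print(repr(root.text))     # text before the first child, stored on BookTitle
print(repr(root[0].tail))  # text after </Emphasis>, stored on the Emphasis child
print(all_text(root))      # Mtn Z = 74 - 210
```

Unlike the loop over direct children, itertext() also descends into nested elements, so it keeps working when Emphasis itself contains further markup.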
