Parsing the XML text between elements inside a root element

2 answers

User

Answer #1 · edited 2024-10-01 07:11:04

Question: I would like to extract the text so that the result is "a": {"aaaa1", "aaaa2", "aaaa3"}, "b": {"bbbb"}, "c": {"cccc"}.

Note: A tag such as <b> or <c> may occur more than once inside the XML, so the code below stores a list of texts per tag rather than a single string.

import lxml.etree as etree

xml = '<a>aaaa1<b>bbbb</b>aaaa2<c>cccc</c>aaaa3</a>'

# Parse the XML string; fromstring() returns the root Element directly,
# so no getroot() call is needed
root = etree.fromstring(xml)

# Init the result dict with the root tag and the text before its first child
# (root.text is None when there is no such text)
result = {root.tag: [root.text] if root.text else []}

# Loop over the direct children of the root
for element in root:
    # Append this element's own text to the list for its tag
    result.setdefault(element.tag, []).append(element.text)

    # Text following the element's closing tag (.tail) belongs to the root
    if element.tail:
        result[root.tag].append(element.tail)

print("result:{}".format(result))
# Output (key order may differ on Python older than 3.7):
# result:{'a': ['aaaa1', 'aaaa2', 'aaaa3'], 'b': ['bbbb'], 'c': ['cccc']}

Tested with Python 3.5.
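
The same .text/.tail idea also works with the standard library's xml.etree.ElementTree, which exposes both attributes. The following is only a sketch under that substitution, not the answer's own code: the function name collect_text is hypothetical, and dropping empty or whitespace-only fragments is an assumption about what is wanted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_text(xml_string):
    """Map each tag to the list of text fragments that belong to it.

    Hypothetical helper: only the root and its direct children are visited.
    """
    root = ET.fromstring(xml_string)
    result = defaultdict(list)

    # Text before the first child belongs to the root.
    if root.text and root.text.strip():
        result[root.tag].append(root.text.strip())

    for child in root:
        # Text inside the child itself.
        if child.text and child.text.strip():
            result[child.tag].append(child.text.strip())
        # Text after the child's closing tag (.tail) still belongs to the root.
        if child.tail and child.tail.strip():
            result[root.tag].append(child.tail.strip())

    return dict(result)


# A repeated <b> tag ends up as two entries in the same list.
print(collect_text('<a>aaaa1<b>bbbb</b>aaaa2<c>cccc</c>aaaa3<b>bbbb2</b></a>'))
```

Because every tag maps to a list, a repeated tag needs no special condition; its texts are appended in document order.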

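User
Answer #2 · edited 2024-10-01 07:11:04

You can create a function that collects the text nodes that are direct children of a given parent element:
def read_element(e):
    return {e.tag: [t.strip() for t in e.xpath("text()")]}
Then call that function on every element in the XML and print the result in the desired format, for example: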
^{pr2}$
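
The example from the answer is not preserved above. As a rough sketch of what such a call could look like, assuming lxml is installed and the same sample XML as in the first answer: the loop over root.iter() and the merging into a single dict are illustrative choices, not the original author's code.

```python
import lxml.etree as etree


def read_element(e):
    # Text nodes that are direct children of e, stripped of surrounding whitespace.
    return {e.tag: [t.strip() for t in e.xpath("text()")]}


xml = '<a>aaaa1<b>bbbb</b>aaaa2<c>cccc</c>aaaa3</a>'
root = etree.fromstring(xml)

# Assumption: merge the per-element dicts into one result, tag -> list of texts.
result = {}
for element in root.iter():  # the root first, then its descendants in document order
    for tag, texts in read_element(element).items():
        result.setdefault(tag, []).extend(texts)

print(result)
```

In XPath terms, the text that follows a child's closing tag is a text node of the parent, so text() on <a> returns all three fragments without touching .tail.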
