Iterating over multiple (parent, child) nodes with Python ElementTree

1 answer

User
#1 · Posted 2024-10-01 13:28:55

Consider this:
>>> xml = """<Content>
...  <Para>first</Para>
...  <Table><Para>second</Para></Table>
...  <Para>third</Para>
... </Content>"""
>>> import xml.etree.ElementTree as et  # cElementTree is gone from current Python
>>> page = et.fromstring(xml)
>>> for p in page.iter():  # spelled getiterator() in old Python versions
...     print("ppp", p.tag, repr(p.text))
...     for c in p:
...         print("ccc", c.tag, repr(c.text), p.tag)
...
ppp Content '\n '
ccc Para 'first' Content
ccc Table None Content
ccc Para 'third' Content
ppp Para 'first'
ppp Table None
ccc Para 'second' Table
ppp Para 'second'
ppp Para 'third'
Aside: list comprehensions are fantastic, until you need to work out what is actually being iterated over :-)
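getiterator (named iter() in current Python versions) yields the "ppp" elements in document order, as advertised. However, you are picking your elements of interest out of the subsidiary "ccc" elements, and those do not arrive in the order you want.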
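The original question is not reproduced on this page, so the following is only a guess at its shape: a minimal Python 3 sketch, assuming the comprehension looped over every element and then over that element's children. The names `XML`, `nested` and `direct` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XML = """<Content>
 <Para>first</Para>
 <Table><Para>second</Para></Table>
 <Para>third</Para>
</Content>"""

page = ET.fromstring(XML)

# Outer loop: every element in document order ("ppp").
# Inner loop: the direct children of that element ("ccc").
# All children of Content are emitted before Table's child is reached.
nested = [c.text for p in page.iter() for c in p if c.tag == "Para"]

# Filtering the outer iteration itself keeps document order,
# but the parent is no longer at hand.
direct = [p.text for p in page.iter("Para")]

print(nested)  # ['first', 'third', 'second']
print(direct)  # ['first', 'second', 'third']
```

The nested form knows each parent but visits the Para elements out of order; the direct form has the right order but has lost the parent.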
One solution is to do the iteration yourself:
>>> def process(elem, parent):
...     # Visit elem first, then recurse into its children in order.
...     print(elem.tag, repr(elem.text), parent.tag if parent is not None else None)
...     for child in elem:
...         process(child, elem)
...
>>> process(page, None)
Content '\n ' None
Para 'first' Content
Table None Content
Para 'second' Table
Para 'third' Content
Now you can grab each "Para" element, along with a reference to its parent (if any), as it streams by.
This can be wrapped up nicely in a generator:
>>> def iterate_with_parent(elem):
...     stack = []
...     while True:
...         # Push children in reverse so they pop off in document order.
...         for child in reversed(elem):
...             stack.append((child, elem))
...         if not stack:
...             return
...         elem, parent = stack.pop()
...         yield elem, parent
...
>>> showtag = lambda e: e.tag if e is not None else None
>>> showtext = lambda e: repr((e.text or '').rstrip())
>>> for e, p in iterate_with_parent(page):
...     print(e.tag, showtext(e), showtag(p))
...
Para 'first' Content
Table '' Content
Para 'second' Table
Para 'third' Content
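As an alternative that is not part of the answer above, a child-to-parent dictionary can be built once and then consulted while iterating in document order. This is a minimal Python 3 sketch; `XML`, `parent_of` and `paras` are names chosen here for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<Content>
 <Para>first</Para>
 <Table><Para>second</Para></Table>
 <Para>third</Para>
</Content>"""

page = ET.fromstring(XML)

# Map every element to its parent. The root has no entry.
parent_of = {child: parent for parent in page.iter() for child in parent}

# iter("Para") walks in document order; the map supplies each parent.
paras = [(e.text, parent_of[e].tag) for e in page.iter("Para")]

print(paras)
# [('first', 'Content'), ('second', 'Table'), ('third', 'Content')]
```

The dictionary costs one extra pass over the tree and goes stale if the tree is modified afterwards, whereas the generator approach computes parents on the fly.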
