<p>Make sure the input is well-formed XML.</p>
<p>To process a large XML file with limited memory, you can use <a href="http://docs.python.org/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse" rel="nofollow"><code>xml.etree.ElementTree.iterparse()</code></a>:</p>
<pre><code>#!/usr/bin/env python
import xml.etree.ElementTree as etree


def getelements(source, tag):
    """Yield each complete element named `tag`, parsing `source` incrementally."""
    context = iter(etree.iterparse(source, events=('start', 'end')))
    _, root = next(context)  # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear()  # free memory


for elem in getelements('big.xml', 'person'):
    # join the text of the three child tags with '^' as the separator
    print('^'.join(elem.find(tag).text for tag in 'name age address'.split()))
</code></pre>
<p>You can add special handling for multi-line text inside a tag (such as <code>address</code> in your example). For instance, <code>re.sub(r'\s+', ' ', text)</code> normalizes the whitespace.</p>
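<p>A minimal sketch of that whitespace normalization. The helper name <code>normalize_whitespace</code> and the sample address are made up for illustration. Trimming the ends and treating a missing text node as an empty string are extra assumptions, not part of the code above:</p>

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    # elem.text is None for an empty element, so fall back to '' (assumption)
    return re.sub(r'\s+', ' ', text or '').strip()


# hypothetical multi-line address, as it might appear inside <address>
address = "12 Main Street\n    Springfield\n"
print(normalize_whitespace(address))  # prints: 12 Main Street Springfield
```

<p>To apply it, wrap each <code>elem.find(tag).text</code> in the helper before joining the fields.</p>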
<h3><a href="http://ideone.com/h4B6X" rel="nofollow">Output</a></h3>
<p>If the input XML may contain <code>'^'</code> and you want it escaped, you can use the <a href="http://docs.python.org/library/csv#csv.writer" rel="nofollow"><code>csv</code> module</a> to generate the output.</p>
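<p>One way this could look, as a sketch: <code>csv.writer</code> with <code>delimiter='^'</code> quotes any field that contains the delimiter. The helper <code>write_rows</code>, the <code>FIELDS</code> tuple and the in-memory sample document are illustrative assumptions. With a real file you would pass in the elements produced by the streaming parser instead:</p>

```python
import csv
import io
import xml.etree.ElementTree as etree

FIELDS = ('name', 'age', 'address')  # child tags to export (assumed from the example)


def write_rows(elements, out):
    """Write one '^'-separated row per element, quoting fields when needed."""
    writer = csv.writer(out, delimiter='^', quoting=csv.QUOTE_MINIMAL,
                        lineterminator='\n')
    for elem in elements:
        # findtext returns '' instead of failing when a child tag is missing
        writer.writerow([elem.findtext(tag, default='') for tag in FIELDS])


# small hypothetical document whose address contains the delimiter
sample = etree.fromstring(
    '<people>'
    '<person><name>Ann</name><age>30</age><address>1 ^ Street</address></person>'
    '</people>'
)
buf = io.StringIO()
write_rows(sample.iter('person'), buf)
print(buf.getvalue(), end='')  # prints: Ann^30^"1 ^ Street"
```

<p>A reader using the <code>csv</code> module with the same delimiter would get the original field values back, including the embedded <code>'^'</code>.</p>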