<p>An extended solution using <code>lxml.etree</code>, <code>OrderedDict</code> and <code>pandas</code>:</p>
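<p>We first need to fix the malformed XML content: the main idea is to add a <code>root</code> tag that declares the XML <em>namespace</em> (<code>ns1</code>). For demonstration purposes, the input <em>xml</em> (taken as-is from the question) is read in as a string and wrapped in that root element before parsing.</p>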
<pre><code>from collections import OrderedDict

from lxml import etree
import pandas as pd

ns = {'ns1': 'http://base.google.com/ns/1.0'}
ns_prefix = '{{{}}}'.format(ns['ns1'])  # '{http://base.google.com/ns/1.0}'

# Wrap the file content in a root element that declares the ns1 namespace
with open('base.xml') as f:
    xml_content = '<root xmlns:ns1="http://base.google.com/ns/1.0">{}</root>'.format(f.read())
doc = etree.fromstring(xml_content)

records = []
for block in doc.findall('ns1:infoTable', namespaces=ns):
    d = OrderedDict()
    for el in block:  # iterate over the direct children of infoTable
        el_tag = el.tag.replace(ns_prefix, '')
        inner_childs = list(el)
        if inner_childs:  # element has child nodes: flatten them into columns
            prefix = 'va' if el_tag == 'votingAuthority' else ''
            d.update({prefix + child.tag.replace(ns_prefix, ''): child.text
                      for child in inner_childs})
        else:
            d[el_tag] = el.text
    records.append(d)

df = pd.DataFrame(records)
print(df.to_string(index=False, justify='right'))
</code></pre>
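<p>To see why the wrapping <code>root</code> element is needed, here is a minimal sketch. It uses the standard library's <code>xml.etree.ElementTree</code> instead of <code>lxml</code> so that it runs without extra dependencies, and the <code>fragment</code> string is a hypothetical stand-in for the content of <code>base.xml</code>, which is assumed to use the <code>ns1</code> prefix without declaring it.</p>

```python
import xml.etree.ElementTree as ET

NS_URI = 'http://base.google.com/ns/1.0'

# Hypothetical fragment: uses the ns1 prefix but never declares it
fragment = (
    '<ns1:infoTable>'
    '<ns1:nameOfIssuer>COMPANYFOUR</ns1:nameOfIssuer>'
    '<ns1:cusip>00004</ns1:cusip>'
    '</ns1:infoTable>'
)

try:
    ET.fromstring(fragment)
    parsed_bare = True
except ET.ParseError:
    # The parser rejects ns1 because no namespace URI is bound to it
    parsed_bare = False

# Wrapping the fragment in a root element that declares ns1 makes it parseable
wrapped = '<root xmlns:ns1="{}">{}</root>'.format(NS_URI, fragment)
root = ET.fromstring(wrapped)

ns = {'ns1': NS_URI}
blocks = root.findall('ns1:infoTable', ns)
issuer = blocks[0].find('ns1:nameOfIssuer', ns).text

print(parsed_bare, len(blocks), issuer)  # False 1 COMPANYFOUR
```

<p>The same wrapper also gives the document a single top-level element, which a file containing several <code>infoTable</code> blocks side by side would otherwise lack.</p>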
<p>Output:</p>
<pre><code>nameOfIssuer     titleOfClass  cusip  value  sshPrnamt  sshPrnamtType  putCall  investmentDiscretion  otherManager  vaSole  vaShared  vaNone
 COMPANYFOUR              COM  00004     67      36100             SH     Call                  DFND        01, 02   36100         0       0
 COMPANYFIVE  SPONSORED ADS A  00005   2695     339367             SH      NaN                  DFND        01, 02  339367         0       0
</code></pre>
<p>To save the result to a CSV file with the delimiter of your choice, use <code>df.to_csv()</code>:</p>
<pre><code>df.to_csv(path_or_buf='output.csv', sep='\t', index=False)
</code></pre>
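<p>One thing to watch when reading the file back: values such as the <code>cusip</code> codes have leading zeros, and the default type inference of <code>pd.read_csv()</code> will turn them into integers. The sketch below illustrates this with a small hypothetical frame that mirrors a few of the columns above, using an in-memory buffer instead of <code>output.csv</code>.</p>

```python
import io

import pandas as pd

# Hypothetical two-row frame mirroring a few columns of the result
df = pd.DataFrame({
    'nameOfIssuer': ['COMPANYFOUR', 'COMPANYFIVE'],
    'cusip': ['00004', '00005'],
    'value': ['67', '2695'],
})

# Write tab-separated text to an in-memory buffer
buf = io.StringIO()
df.to_csv(buf, sep='\t', index=False)

# Default type inference parses '00004' as the integer 4
inferred = pd.read_csv(io.StringIO(buf.getvalue()), sep='\t')

# dtype=str keeps every column as text, so leading zeros survive
as_text = pd.read_csv(io.StringIO(buf.getvalue()), sep='\t', dtype=str)

print(inferred['cusip'].tolist())  # [4, 5]
print(as_text['cusip'].tolist())   # ['00004', '00005']
```

<p>Passing <code>dtype=str</code> is only one option; a per-column mapping such as <code>dtype={'cusip': str}</code> keeps numeric inference for the other columns.</p>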