回答此问题可获得 20 贡献值,回答如果被采纳可获得 50 分。
<p>我有下面的输入文件,在一个文件中多达10万条记录</p>
<pre><code><pain001><CstmrCdtTrfInitn><GrpHdr><MsgId>ABC/120928/CCT001</MsgId><CreDtTm>2012-09-28T14:07:00</CreDtTm><NbOfTxs>100000</NbOfTxs><CtrlSum>11500000</CtrlSum> <InitgPty><Nm>ABC Corporation</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></InitgPty></GrpHdr><PmtInf><PmtInfId>CARCORP/086</PmtInfId><PmtMtd>TRF</PmtMtd><BtchBookg>false</BtchBookg><ReqdExctnDt>2012-09-29</ReqdExctnDt><Dbtr><Nm>CARCORP INC</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></Dbtr><DbtrAcct><Id><Othr><Id>00125574999</Id></Othr></Id></DbtrAcct><DbtrAgt><FinInstnId><BICFI>BBBBUS33</BICFI></FinInstnId></DbtrAgt><CdtTrfTxInf><PmtId><InstrId>ABC/120928/CCT001/01</InstrId><EndToEndId>ABC/4562/4</EndToEndId></PmtId><Amt><InstdAmt Ccy="JPY">100</InstdAmt></Amt><ChrgBr>SHAR</ChrgBr><CdtrAgt><FinInstnId><BICFI>AAAAGB2L</BICFI></FinInstnId></CdtrAgt><Cdtr><Nm>DEF Electronics</Nm><PstlAdr><AdrLine>Corn Exchange 5th Floor</AdrLine><AdrLine>Mark Lane 55</AdrLine><AdrLine>EC3R7NE London</AdrLine><AdrLine>GB</AdrLine></PstlAdr></Cdtr><CdtrAcct><Id><Othr><Id>23683707994125</Id></Othr></Id></CdtrAcct><Purp><Cd>GDDS</Cd></Purp><RmtInf><Strd><RfrdDocInf><Tp><CdOrPrtry><Cd>CINV</Cd></CdOrPrtry></Tp><Nb>4562</Nb><RltdDt>2012-09-08</RltdDt></RfrdDocInf></Strd></RmtInf></CdtTrfTxInf></PmtInf></CstmrCdtTrfInitn></pain001>
<pain001><CstmrCdtTrfInitn><GrpHdr><MsgId>ABC/120928/CCT001</MsgId><CreDtTm>2012-09-28T14:07:00</CreDtTm><NbOfTxs>100000</NbOfTxs><CtrlSum>11500000</CtrlSum> <InitgPty><Nm>ABC Corporation</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></InitgPty></GrpHdr><PmtInf><PmtInfId>CARCORP/086</PmtInfId><PmtMtd>TRF</PmtMtd><BtchBookg>false</BtchBookg><ReqdExctnDt>2012-09-29</ReqdExctnDt><Dbtr><Nm>CARCORP INC</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></Dbtr><DbtrAcct><Id><Othr><Id>00125574999</Id></Othr></Id></DbtrAcct><DbtrAgt><FinInstnId><BICFI>BBBBUS33</BICFI></FinInstnId></DbtrAgt><CdtTrfTxInf><PmtId><InstrId>ABC/120928/CCT001/01</InstrId><EndToEndId>ABC/4562/4</EndToEndId></PmtId><Amt><InstdAmt Ccy="JPY">100</InstdAmt></Amt><ChrgBr>SHAR</ChrgBr><CdtrAgt><FinInstnId><BICFI>AAAAGB2L</BICFI></FinInstnId></CdtrAgt><Cdtr><Nm>DEF Electronics</Nm><PstlAdr><AdrLine>Corn Exchange 5th Floor</AdrLine><AdrLine>Mark Lane 55</AdrLine><AdrLine>EC3R7NE London</AdrLine><AdrLine>GB</AdrLine></PstlAdr></Cdtr><CdtrAcct><Id><Othr><Id>23683707994125</Id></Othr></Id></CdtrAcct><Purp><Cd>GDDS</Cd></Purp><RmtInf><Strd><RfrdDocInf><Tp><CdOrPrtry><Cd>CINV</Cd></CdOrPrtry></Tp><Nb>4562</Nb><RltdDt>2012-09-08</RltdDt></RfrdDocInf></Strd></RmtInf></CdtTrfTxInf></PmtInf></CstmrCdtTrfInitn></pain001>
</code></pre>
<p>我用Xpath作为逻辑的列表理解来解析值</p>
^{pr2}$
<p>下面是我的错误</p>
<pre><code>Traceback (most recent call last):
File "C:\Python27\parsefunc.py", line 10, in <module>
tree = ET.parse('pain1.xml')
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
tree.parse(source, parser)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 656, in parse
parser.feed(data)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: junk after document element: line 2, column 0
</code></pre>
<p>我需要解析后的输出如下所示</p>
<pre><code>ABC/120928/CCT001,2012-09-28T14:07:00,ABC Corporation,2012-09-28T14:07:00,100000,11500000,Times Square,7,NY 10036,New York,US,CARCORP/086,TRF,false,2012-09-29,CARCORP INC,Times Square,7,NY 10036,New York,US,00125574999,BBBBUS33,ABC/120928/CCT001/01,ABC/4562/1,100,100,SHAR,AAAAGB2L,DEF Electronics,Corn Exchange 5th Floor,Mark Lane 55,EC3R7NE London,GB,CINV,4562,2012-09-08
ABC/120928/CCT001,2012-09-28T14:07:00,ABC Corporation,2012-09-28T14:07:00,100000,11500000,Times Square,7,NY 10036,New York,US,CARCORP/086,TRF,false,2012-09-29,CARCORP INC,Times Square,7,NY 10036,New York,US,00125574999,BBBBUS33,ABC/120928/CCT001/01,ABC/4562/1,100,100,SHAR,AAAAGB2L,DEF Electronics,Corn Exchange 5th Floor,Mark Lane 55,EC3R7NE London,GB,CINV,4562,2012-09-08
</code></pre>
<p>有没有比使用元素树的列表理解更好的方法呢?或者我如何以上述方式解析并获得输出来解析同一文件中的其他xml</p>
<p><strong>更新</strong></p>
<p>我能够用Parfait建议的一种新方法在一行代码中进行解析和生成,但是当我试图为多个xml实现下面的解决方案时,仍然遇到了相同的错误</p>
<p>导入系统
进口lxml.etree作为ET</p>
<pre><code>net = []
tree = ET.parse('pain001.xml')
root = tree.getroot()
line= tree.xpath('//text()')
line = map(lambda line: line.strip(), line)
net = filter(bool, line)
#str_list = filter(None, str_list)
#net = root.xpath('//*')
net = ",".join(net)
</code></pre>