<p>即使这是XML文件,也可以使用BeautifulSoup或<code>text</code>属性的CSS选择器:</p>
<pre><code>data = '''<?xml version="1.0"?><Transfer error="none" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BulkExtract.xsd"><TransferInfo><FileSequenceNumber>1</FileSequenceNumber><RecordCount>714100</RecordCount><ExtractTime>2019-06-19T12:22:15</ExtractTime></TransferInfo>
<ABR recordLastUpdatedDate="20180216" replaced="N"><ABN status="ACT" ABNStatusFromDate="19991101">11000000948</ABN><EntityType><EntityTypeInd>PUB</EntityTypeInd><EntityTypeText>Australian Public Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>QBE INSURANCE (INTERNATIONAL) LTD</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>NSW</State><Postcode>2000</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">000000948</ASICNumber><GST status="ACT" GSTStatusFromDate="20000701" /><OtherEntity><NonIndividualName type="TRD"><NonIndividualNameText>QBE INSURANCE (INTERNATIONAL) LIMITED</NonIndividualNameText></NonIndividualName></OtherEntity></ABR>
<ABR recordLastUpdatedDate="20190531" replaced="N"><ABN status="CAN" ABNStatusFromDate="20190501">11000002568</ABN><EntityType><EntityTypeInd>PRV</EntityTypeInd><EntityTypeText>Australian Private Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>TOOHEYS PTY LIMITED</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>NSW</State><Postcode>2141</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">000002568</ASICNumber></ABR>
</Transfer>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'xml')
z = zip(soup.select('ABR[recordLastUpdatedDate]'),
soup.select('ABR[replaced]'),
soup.select('ABN[status]'),
soup.select('ABN[ABNStatusFromDate]'),
soup.select('ABN'))
for (c1, c2, c3, c4, c5) in z:
print(c1['recordLastUpdatedDate'], c2['replaced'], c3['status'], c4['ABNStatusFromDate'], c5.text.strip())
</code></pre>
<p>印刷品:</p>
<pre><code>20180216 N ACT 19991101 11000000948
20190531 N CAN 20190501 11000002568
</code></pre>