<p>The variable <code>data</code> holds the string from the question (<a href="https://pastebin.com/qdVNWQHJ" rel="nofollow noreferrer">link</a>; too long to paste here):</p>
<pre><code>import csv
from bs4 import BeautifulSoup

# `data` holds the XML string from the question at this point
soup = BeautifulSoup(data, 'lxml')
cols = ['nameOfIssuer', 'titleOfClass', 'cusip', 'value', 'sshPrnamt', 'sshPrnamtType', 'putCall', 'investmentDiscretion', 'otherManager', 'Sole', 'Shared', 'None']
data = []  # reused: now collects one row (list of strings) per info table

# The 'lxml' HTML parser lowercases tag names, so search for the lowercase
# form both with and without the 'ns1:' prefix
for info_table in soup.find_all(['ns1:infotable', 'infotable']):
    row = []
    for col in cols:
        d = info_table.find([col.lower(), 'ns1:' + col.lower()])
        # Missing elements are written as the string 'NaN'
        row.append(d.text.strip() if d else 'NaN')
    data.append(row)

headers = ['NameofIssuer', 'TitleofClass', 'cusip', 'value', 'shrsPrnamt', 'shrsPrnamtType', 'putcall', 'investmentDescrestion', 'othermanager', 'vaSole', 'vaShared', 'vaNone']
with open('data.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',
                           quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(headers)
    csvwriter.writerows(data)
</code></pre>
<p>This writes <code>data.csv</code>:</p>
<pre><code>NameofIssuer,TitleofClass,cusip,value,shrsPrnamt,shrsPrnamtType,putcall,investmentDescrestion,othermanager,vaSole,vaShared,vaNone
COMPANYFOUR,COM,00004,67,36100,SH,Call,DFND,"01, 02",36100,0,0
COMPANYFIVE,SPONSORED ADS A,00005,2695,339367,SH,NaN,DFND,"01, 02",339367,0,0
COMPANYONE,SHS CLASS -A -,00000,21944,3060500,SH,NaN,SOLE,NaN,3060500,0,0
COMPANYTWO,COM,00001,67822,1898717,SH,NaN,SOLE,NaN,1898717,0,0
COMPANYTHREE,CL B NEW,00002,10462145,52078974,SH,NaN,SOLE,NaN,52078974,0,0
</code></pre>
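<p>Note that the <code>othermanager</code> value <code>01, 02</code> contains a comma, so <code>csv.QUOTE_MINIMAL</code> wraps it in quotes. As a rough sanity check (a sketch only, using the header and first row copied from the output above rather than the real file), reading the text back with <code>csv.reader</code> should still give 12 columns per row:</p>

```python
import csv
import io

# Header and first data row, copied from the data.csv output above.
sample = (
    "NameofIssuer,TitleofClass,cusip,value,shrsPrnamt,shrsPrnamtType,"
    "putcall,investmentDescrestion,othermanager,vaSole,vaShared,vaNone\n"
    'COMPANYFOUR,COM,00004,67,36100,SH,Call,DFND,"01, 02",36100,0,0\n'
)

# Parse the sample the same way a CSV consumer would.
rows = list(csv.reader(io.StringIO(sample)))
header, first = rows

# Pair each header with its value to inspect individual fields.
record = dict(zip(header, first))

print(len(header), len(first))   # column counts of header and data row
print(record["othermanager"])    # the quoted field comes back as one value
```

<p>If the column counts differ when you run this against your own file, a field containing the delimiter was probably written without quoting.</p>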
<p>In LibreOffice it looks like this:</p>
<p><a href="https://i.stack.imgur.com/udP6B.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/udP6B.png" alt="enter image description here"/></a></p>