How to align HTML elements that have different structures

<nonDerivativeTable>
  <nonDerivativeHolding>
    #First Holding
    <securityTitle>
      <value>Stock</value>
    </securityTitle>
  </nonDerivativeHolding>
  <nonDerivativeHolding>
    #Second Holding
    <securityTitle>
      <footnoteId id="F1"/>
    </securityTitle>
  </nonDerivativeHolding>
  <nonDerivativeHolding>
    #Third Holding
    <securityTitle>
      <value>Option</value>
      <footnoteId id="F2"/>
      <footnoteId id="F3"/>
    </securityTitle>
  </nonDerivativeHolding>
</nonDerivativeTable>

import csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(doc, 'html.parser')  # Let's say doc has the html.

# html.parser lowercases tag names, so the selectors are written in lowercase.
# Both lists are flat across the whole document, so the per-holding grouping is lost.
securityTitles = [v.text for v in soup.select('securitytitle > value')]
securityTitleFootnotes = [f.get('id') for f in soup.select('securitytitle > footnoteid')]

with open('output.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    # zip pairs items by position only, so the rows do not line up with the holdings.
    for securityTitle, securityTitleFootnote in zip(securityTitles, securityTitleFootnotes):
        writer.writerow([securityTitle, securityTitleFootnote])

1 answer

User

#1 · Posted 2024-10-04 11:34:14

You can collect the contents of each nonDerivativeHolding, then apply a custom list of handlers to each one:

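from bs4 import BeautifulSoup as soup

# s is assumed to hold the markup from the question; html.parser lowercases tag names.
c = [i.securitytitle.contents for i in soup(s, 'html.parser').find_all('nonderivativeholding')]
# One (tag name, extractor) pair per kind of child element.
h = [('value', lambda x:x.text), ('footnoteid', lambda x:x['id'])]
# Drop bare newline strings between the tags.
results = [[i for i in b if i != '\n'] for b in c]
# Per holding and tag: '' for no match, the item itself for one match, a list otherwise.
r = [{a:(lambda x:'' if not x else x[0] if len(x) == 1 else x)([b(j) for j in i if j.name == a]) for a, b in h} for i in results]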

Output:

[{'value': 'Stock', 'footnoteid': ''}, {'value': '', 'footnoteid': 'F1'}, {'value': 'Option', 'footnoteid': ['F2', 'F3']}]
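
For the CSV output the question was after, the same per-holding idea can be written as an explicit loop. This is only a sketch under a few assumptions: the `holding_rows` helper is a hypothetical name, `doc` is a stand-in copy of the sample markup, multiple footnote ids are joined with `'; '` as the question's code seemed to intend, and the rows are written to an in-memory buffer rather than `output.csv` so the snippet runs on its own.

```python
import csv
import io

from bs4 import BeautifulSoup

# Stand-in for the markup in the question.
doc = """<nonDerivativeTable>
  <nonDerivativeHolding>
    <securityTitle> <value>Stock</value> </securityTitle>
  </nonDerivativeHolding>
  <nonDerivativeHolding>
    <securityTitle> <footnoteId id="F1"/> </securityTitle>
  </nonDerivativeHolding>
  <nonDerivativeHolding>
    <securityTitle> <value>Option</value> <footnoteId id="F2"/> <footnoteId id="F3"/> </securityTitle>
  </nonDerivativeHolding>
</nonDerivativeTable>"""


def holding_rows(markup):
    """Return one [title, footnotes] row per nonDerivativeHolding (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    # html.parser lowercases tag names, hence the lowercase lookups.
    for holding in soup.find_all('nonderivativeholding'):
        title = holding.find('securitytitle')
        if title is None:
            # Keep the row so the output still has one line per holding.
            rows.append(['', ''])
            continue
        value = title.find('value')
        footnotes = [f.get('id', '') for f in title.find_all('footnoteid')]
        rows.append([
            value.get_text(strip=True) if value else '',
            '; '.join(footnotes),
        ])
    return rows


rows = holding_rows(doc)

# Write to an in-memory buffer; swap in open('output.csv', 'w', newline='') for a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Because each row is built inside the loop over holdings, a missing value or a missing footnote leaves an empty cell instead of shifting later entries up, which is what went wrong when two flat lists were zipped together.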
