Creating a table in Python with BeautifulSoup

Posted 2024-09-29 19:36:06


I'm new to Python and have a question about building a table from scraped data with Beautiful Soup. Here is the code I'm using:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.opensecrets.org/lobby/lobbyist.php?id=Y0000008510L&year=2018")
soup = BeautifulSoup(page.content, 'lxml')
table = soup.find('table', {'id': 'lobbyist_summary'})
# Iterate over the table's direct children
for row in table:
    cells = row.find_all('a')
    rn = cells[0].get_text()

This raises an error.

print(table) gives the following:

[<a href="firmsum.php?id=D000037635&amp;year=2018">Ballard Partners</a>, <a href="clientsum.php?id=F203227&amp;year=2018">Advanced Roofing Inc</a>, <a href="clientsum.php?id=F214670&amp;year=2018">Africell Holding</a>, <a href="clientsum.php?id=D000023883&amp;year=2018">Amazon.com</a>, ...]

I would like to (eventually) end up with a table that puts each element of interest in its own column, so that it looks like:

[[firmsum,D0000376352018,Ballard Partners],[clientsum,F2032272018,Advanced Roofing Inc],[clientsum,F2146702018,Africell Holding],[clientsum,D0000238832018,Amazon.com]…]
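A likely problem with the loop in the question: iterating directly over a Tag walks all of its children, including the whitespace text between rows. Those text nodes are NavigableString objects rather than Tags, so calling find_all on them is probably what fails, and a row with no links would also make cells[0] raise an IndexError. A minimal sketch that iterates only over the <tr> rows and skips rows without links, assuming the summary table is built from <tr> rows (the SAMPLE_HTML string and the first_link_texts helper are hypothetical stand-ins, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page; the real table is larger.
SAMPLE_HTML = """
<table id="lobbyist_summary">
  <tr><th>Firm</th><th>Client</th></tr>
  <tr>
    <td><a href="firmsum.php?id=D000037635&amp;year=2018">Ballard Partners</a></td>
    <td><a href="clientsum.php?id=F203227&amp;year=2018">Advanced Roofing Inc</a></td>
  </tr>
</table>
"""


def first_link_texts(html):
    """Return the text of the first <a> in each table row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "lobbyist_summary"})
    texts = []
    # find_all("tr") yields only row Tags, never the whitespace strings
    # that iterating over `table` directly would also produce.
    for row in table.find_all("tr"):
        cells = row.find_all("a")
        if not cells:  # e.g. the header row, which has no links
            continue
        texts.append(cells[0].get_text())
    return texts


print(first_link_texts(SAMPLE_HTML))  # ['Ballard Partners']
```

The built-in "html.parser" is used here only so the sketch runs without lxml; passing 'lxml' as in the question works the same way.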


1 answer
Answer from a user
Reply #1 · posted 2024-09-29 19:36:06

Create four empty lists:

col1List = list()
col2List = list()
col3List = list()
col4List = list()

First, let's get the values for the fourth column:


This gives:

['Ballard Partners', 'Advanced Roofing Inc', 'Africell Holding',....]
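One way to produce that list, as a sketch: it assumes the names are the link texts inside the lobbyist_summary table, and it reuses the name trs for that table element only to match the next step. SAMPLE_HTML is a hypothetical stand-in for page.content.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content
SAMPLE_HTML = """
<table id="lobbyist_summary">
  <tr><td><a href="firmsum.php?id=D000037635&amp;year=2018">Ballard Partners</a></td></tr>
  <tr><td><a href="clientsum.php?id=F203227&amp;year=2018">Advanced Roofing Inc</a></td></tr>
  <tr><td><a href="clientsum.php?id=F214670&amp;year=2018">Africell Holding</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Assumption: trs is the summary table element
trs = soup.find("table", {"id": "lobbyist_summary"})

col4List = []
for link in trs.find_all("a"):
    # Column 4 is the visible link text, with surrounding whitespace stripped
    col4List.append(link.get_text(strip=True))

print(col4List)  # ['Ballard Partners', 'Advanced Roofing Inc', 'Africell Holding']
```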

Now extract the values for the first three columns from each link's href:

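hrefVal = trs.find_all('a')  # every <a> link inside the table

for i in hrefVal:
    hVal = i.get('href')  # e.g. 'firmsum.php?id=D000037635&year=2018'
    # Column 1: the page name before '.php?id='
    col11 = hVal.split('.php?id=', 1)
    col1 = col11[0]
    col1List.append(col1)
    # Column 2: the id, up to the first '&'
    col22 = col11[1].split('&', 1)
    col2 = col22[0]
    col2List.append(col2)
    # Column 3: the value after '=' in 'year=2018'
    col33 = col22[1].split('=', 1)
    col3 = col33[1]
    col3List.append(col3)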
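Splitting on fixed substrings relies on the query parameters always appearing in the same order. A more tolerant alternative (a swapped-in technique, not the answer's method) parses the href with the standard library's urllib.parse. BeautifulSoup's get('href') already returns the attribute with &amp; decoded to &, so the sample strings below are in that form; split_href is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs


def split_href(href):
    """Split an href such as 'firmsum.php?id=D000037635&year=2018'
    into (page, id, year)."""
    parts = urlsplit(href)
    # Last path segment, e.g. 'firmsum.php'
    page = parts.path.rsplit("/", 1)[-1]
    if page.endswith(".php"):
        page = page[: -len(".php")]
    # parse_qs maps each query key to a list of its values
    query = parse_qs(parts.query)
    return page, query["id"][0], query["year"][0]


hrefs = [
    "firmsum.php?id=D000037635&year=2018",
    "clientsum.php?year=2018&id=F203227",  # parameter order does not matter
]
for href in hrefs:
    print(split_href(href))
# ('firmsum', 'D000037635', '2018')
# ('clientsum', 'F203227', '2018')
```

A missing id or year key would raise a KeyError here, which makes malformed links easy to spot instead of silently producing shifted columns.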

Now put all the lists into a DataFrame so the result is easy to read:

import pandas as pd

df = pd.DataFrame()
df['Col1'] = col1List
df['Col2'] = col2List
df['Col3'] = col3List
df['Col4'] = col4List

Printing the first few rows gives the layout you wanted:

Col1        Col2        Col3    Col4
firmsum     D000037635  2018    Ballard Partners
clientsum   F203227     2018    Advanced Roofing Inc
clientsum   F214670     2018    Africell Holding
clientsum   D000023883  2018    Amazon.com
clientsum   D000000192  2018    American Health Care Assn
clientsum   D000021839  2018    American Road & Transport Builders Assn
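If you would rather have the nested list from the question than a DataFrame, the four column lists can be zipped into rows. A sketch using short hypothetical lists in place of the scraped ones:

```python
# Hypothetical stand-ins for the lists filled in the steps above
col1List = ["firmsum", "clientsum"]
col2List = ["D000037635", "F203227"]
col3List = ["2018", "2018"]
col4List = ["Ballard Partners", "Advanced Roofing Inc"]

# zip() pairs up the i-th item of every list; each tuple becomes one row
rows = [list(row) for row in zip(col1List, col2List, col3List, col4List)]
print(rows)
# [['firmsum', 'D000037635', '2018', 'Ballard Partners'],
#  ['clientsum', 'F203227', '2018', 'Advanced Roofing Inc']]
```

Note that zip() stops at the shortest list, so the lists should all have the same length. The same rows can also be passed straight to pd.DataFrame(rows, columns=['Col1', 'Col2', 'Col3', 'Col4']).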
