Extracting a specific value from a table with Beautiful Soup (Python)

import bs4
styleData=[]
pagedata = requests.get("https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919")
cleanpagedata = bs4.BeautifulSoup(pagedata.text, 'html.parser')
table=cleanbyAddPD.find('div',{'id':'MainContent_ctl01_panView'})
style=table.findall('tr')[3]
style=style.findall('td')[1].text
print(style)
styleData.append(style)

3 answers

User

#1 · edited 2024-10-02 14:15:57

You can use a CSS selector:

#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)

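It selects the element with id "MainContent_ctl01_grdCns", then the fourth <tr> inside it, then the second <td> in that row.

To use CSS selectors, call the select() method instead of find_all(), or select_one() instead of find().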

import requests
from bs4 import BeautifulSoup


URL = "https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919"

soup = BeautifulSoup(requests.get(URL).content, "html.parser")
print(
    soup.select_one(
        "#MainContent_ctl01_grdCns tr:nth-of-type(4)  td:nth-of-type(2)"
    ).text
)

Output:

Townhouse End
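
For anyone who wants to try the selector without requesting the live page, here is a minimal sketch that runs on a small inline table. The markup and the values other than "Townhouse End" are made up for illustration; only the id `MainContent_ctl01_grdCns` comes from the answer above. It also shows the difference between `select_one()` (first match or `None`) and `select()` (list of all matches).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the table id is taken from the answer.
SAMPLE_HTML = """
<table id="MainContent_ctl01_grdCns">
  <tr><th>Field</th><th>Description</th></tr>
  <tr><td>Year Built:</td><td>1985</td></tr>
  <tr><td>Living Area:</td><td>1,200</td></tr>
  <tr><td>Style:</td><td>Townhouse End</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one() returns the first matching Tag, or None when nothing matches.
cell = soup.select_one(
    "#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)"
)
style = cell.get_text(strip=True) if cell is not None else None

# select() returns a list of every matching Tag (possibly empty).
second_column = [
    td.get_text(strip=True)
    for td in soup.select("#MainContent_ctl01_grdCns td:nth-of-type(2)")
]

print(style)
print(second_column)
```

Checking for `None` before reading the text avoids an AttributeError if the page layout changes and the selector stops matching.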

User

#2 · edited 2024-10-02 14:15:57

You may have misspelled the find_all function (the question uses findall); try the following instead:

style=table.find_all('tr')[3]
style=style.find_all('td')[1].text
print(style)

It will give you the expected output.
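
As a rough sketch of the same index-based approach with some guards added, the helper below returns `None` instead of raising when the container, row, or cell is missing. The function name `cell_text` and the sample markup are hypothetical; only the div id comes from the question. As far as I know, `table.findall` fails because Beautiful Soup treats an unknown attribute on a Tag as a search for a child tag of that name, which yields `None` here, so calling it raises a TypeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the div id is taken from the question.
SAMPLE_HTML = """
<div id="MainContent_ctl01_panView">
  <table>
    <tr><th>Field</th><th>Description</th></tr>
    <tr><td>Year Built:</td><td>1985</td></tr>
    <tr><td>Living Area:</td><td>1,200</td></tr>
    <tr><td>Style:</td><td>Townhouse End</td></tr>
  </table>
</div>
"""


def cell_text(soup, container_id, row_index, col_index):
    """Return the text of one <td>, or None if any lookup step fails."""
    container = soup.find("div", {"id": container_id})
    if container is None:
        return None
    rows = container.find_all("tr")  # note: find_all, not findall
    if row_index >= len(rows):
        return None
    cells = rows[row_index].find_all("td")
    if col_index >= len(cells):
        return None
    return cells[col_index].get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
style = cell_text(soup, "MainContent_ctl01_panView", 3, 1)
print(style)
```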

User

#3 · edited 2024-10-02 14:15:57

You can also do the following:

import bs4 
import requests
style_data = []
url = "https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919"

soup = bs4.BeautifulSoup(requests.get(url).content, 'html.parser')
# select the first `td` tag whose text contains the substring `Style:`.
row = soup.select_one('td:-soup-contains("Style:")')
if row:
    # if that cell was found, get its sibling, which should be the value you want
    home_style_tag = row.next_sibling
    style_data.append(home_style_tag.text)

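Some notes

This uses CSS selectors rather than the find methods. See the SoupSieve docs for more details.
select_one relies on the table always being ordered the same way. If that is not the case, use select and iterate over the results to find the bs4.Tag whose text is exactly 'Style:', then get its next sibling.

Using select: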

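rows = soup.select('td:-soup-contains("Style:")')
# keep only the cells whose text is exactly 'Style:'
row = [r for r in rows if r.text == 'Style:']
if row:
    # row is a list, so take the first match and read its next sibling
    home_style_text = row[0].next_sibling.text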
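
The "iterate over the results" idea from the notes can be sketched as a small helper. This is only an illustration under assumptions: the function name `value_for_label` and the sample markup are made up, and it uses `find_next_sibling("td")` rather than `next_sibling` so that whitespace text nodes between cells are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a near-miss label to show why the exact-text check matters.
SAMPLE_HTML = """
<table>
  <tr><td>Style: (legacy)</td><td>Ranch</td></tr>
  <tr>
    <td>Style:</td>
    <td>Townhouse End</td>
  </tr>
</table>
"""


def value_for_label(soup, label):
    """Return the text of the <td> that follows the cell whose text is exactly `label`."""
    for cell in soup.select(f'td:-soup-contains("{label}")'):
        # :-soup-contains matches substrings, so confirm the exact text.
        if cell.get_text(strip=True) != label:
            continue
        # find_next_sibling skips whitespace strings between the two cells.
        value_cell = cell.find_next_sibling("td")
        if value_cell is not None:
            return value_cell.get_text(strip=True)
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
home_style_text = value_for_label(soup, "Style:")
print(home_style_text)
```

Matching on the label text rather than on row and column positions should keep working if rows are added or reordered, provided the label itself stays the same.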
