How to use Beautiful Soup to scrape only the list that is part of the body

Posted 2024-10-01 02:33:07


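I'm having trouble using Beautiful Soup to scrape this Wikipedia list of Los Angeles neighborhoods. I get all of the body content, not just the neighborhood list I want. I've seen plenty of material on scraping tables, but I'm stuck on how to apply that table logic to this case. This is the code I have been using: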

import requests
import pandas as pd
from bs4 import BeautifulSoup

address = 'Los Angeles, United States'
url = "https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles"
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

neighborhoodList = []

# append the data into the list
# (this walks every <li> in the page body, not just the neighborhood list)
for row in soup.find_all("div", class_="mw-body")[0].find_all("li"):
    neighborhoodList.append(row.text.replace(', LA', ''))

df_neighborhood = pd.DataFrame({"Neighborhood": neighborhoodList})

1 answer
User
#1 · Posted 2024-10-01 02:33:07

If you look at the page source, the neighborhood entries sit inside divs with the class "div-col", and their links carry a "title" attribute.
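To illustrate the difference between the two selections, here is a minimal sketch on a made-up HTML fragment. The `SAMPLE_HTML` markup is a hypothetical, heavily simplified stand-in for the real page, and `html.parser` is used only so the sketch needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the structure of the Wikipedia page.
SAMPLE_HTML = """
<div class="mw-body">
  <ul><li><a href="/wiki/Main_Page">Main page</a></li></ul>
  <div class="div-col">
    <ul>
      <li><a href="/wiki/Arleta" title="Arleta, Los Angeles">Arleta</a></li>
      <li><a href="/wiki/Winnetka" title="Winnetka, Los Angeles">Winnetka</a></li>
    </ul>
  </div>
  <ul><li><a href="#cite_note-1">Reference 1</a></li></ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every <li> in the body: navigation and reference items come along too.
body = soup.find("div", class_="mw-body")
all_items = [li.get_text(strip=True) for li in body.find_all("li")]

# Only links that have a title attribute, inside the "div-col" containers.
neighborhoods = [
    a.get_text(strip=True)
    for div in soup.find_all("div", class_="div-col")
    for a in div.find_all("a", title=True)
]

print(all_items)
print(neighborhoods)
```

On this sample, the first list also picks up the navigation and reference items, while the second contains only the two neighborhood names.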

Also, replacing text while appending does not seem to be necessary.

The following code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

address = 'Los Angeles, United States'
url = "https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles"
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')
neighborhoodList = []

# Collect the text of every link that has a "title" attribute
# inside the "div-col" containers.
for row in soup.find_all("div", class_="div-col"):
    for item in row.select("a"):
        if item.has_attr('title'):
            neighborhoodList.append(item.text)

df_neighborhood = pd.DataFrame({"Neighborhood": neighborhoodList})

print('First 10 Rows:')
print(df_neighborhood.head(n=10))
print('\nLast 10 Rows:')
print(df_neighborhood.tail(n=10))

Result:

First 10 Rows:
             Neighborhood
0        Angelino Heights
1                  Arleta
2       Arlington Heights
3           Arts District
4         Atwater Village
5           Baldwin Hills
6  Baldwin Hills/Crenshaw
7         Baldwin Village
8           Baldwin Vista
9        Beachwood Canyon

Last 10 Rows:
           Neighborhood
186    Westwood Village
187     Whitley Heights
188  Wholesale District
189          Wilmington
190     Wilshire Center
191       Wilshire Park
192      Windsor Square
193            Winnetka
194      Woodland Hills
195      Yucca Corridor
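
The same filter can probably also be written as a single CSS selector. The sketch below is one possible variant, not part of the answer above: `extract_titled_links` and the inline `sample` markup are hypothetical names introduced for illustration, and `html.parser` is chosen only to avoid an extra parser dependency.

```python
from bs4 import BeautifulSoup


def extract_titled_links(html, container_class="div-col"):
    """Return the text of every <a title=...> inside <div class=container_class>."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.<class> a[title]" selects titled links nested anywhere in such a div.
    selector = f"div.{container_class} a[title]"
    return [a.get_text(strip=True) for a in soup.select(selector)]


# Small inline sample (made up) so the sketch runs without a network request.
sample = """
<div class="div-col"><ul>
  <li><a href="/wiki/Arleta" title="Arleta, Los Angeles">Arleta</a></li>
  <li><a href="#note">no title here</a></li>
</ul></div>
<div class="other"><a href="/x" title="X">Outside</a></div>
"""

print(extract_titled_links(sample))
```

For the real page, the returned list could be passed to `pd.DataFrame({"Neighborhood": ...})` in the same way as `neighborhoodList` above.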
