How do I get a non-tabular list from Wikipedia and create a DataFrame?

Posted 2024-10-02 20:39:41


en.wikipedia.org/wiki/List_of_neighbourhoods_of_Istanbul

The page linked above lists Istanbul's neighbourhoods, but not in a table.

I want to extract these neighbourhoods into a DataFrame with this code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

wikiurl="https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Istanbul"
response=requests.get(wikiurl)
soup = BeautifulSoup(response.text, 'html.parser')
tocList=soup.findAll('a',{'class':"new"})

neighborhoods=[]

for item in tocList:
    text = item.get_text()
    neighborhoods.append(text)

    
df = pd.DataFrame(neighborhoods, columns=['Neighborhood'])
print(df)

I got this output:

    Neighborhood
0   Maden
1   Nizam
2   Anadolu
3   Arnavutköy İmrahor
4   Arnavutköy İslambey
...     ...
705     Seyitnizam
706     Sümer
707     Telsiz
708     Veliefendi
709     Yeşiltepe

710 rows × 1 columns

But some neighbourhoods are missing. Compare this excerpt from the page with the output above:

 Adalar
    
        Burgazada
        Heybeliada
        Kınalıada
        Maden
        Nizam

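findAll() with {'class': "new"} only matches red links (links to articles that don't exist yet, which Wikipedia marks with class="new"), so neighbourhoods that are ordinary links without that class are skipped, for example: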

<ol><li><a href="/wiki/Burgazada" title="Burgazada">Burgazada</a></li>
<li><a href="/wiki/Heybeliada" title="Heybeliada">Heybeliada</a></li>

Can I also split the result into two columns, "Neighbourhood" and its "District"?
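One way to get both columns is to pair each list item with the nearest district heading above it. A minimal sketch, assuming (not verified against the live page) that each district is an <h3> heading followed by an <ol> of its neighbourhoods. It reads the name from <span class="mw-headline"> when present (older markup, where the heading also contains an "[edit]" link) and from the heading text otherwise, and it skips the references list. The function name and sample HTML are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

def neighbourhoods_by_district(html: str) -> pd.DataFrame:
    """Pair every <li> of each <ol> with the nearest preceding <h3> heading."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for ol in soup.find_all("ol"):
        if "references" in (ol.get("class") or []):
            continue  # citation list, not neighbourhoods
        heading = ol.find_previous("h3")
        if heading is None:
            continue  # list appears before any district heading; skip it
        # Older markup wraps the title in span.mw-headline next to an "[edit]" link
        headline = heading.find("span", class_="mw-headline")
        district = (headline or heading).get_text(" ", strip=True)
        for li in ol.find_all("li", recursive=False):
            rows.append({"Neighbourhood": li.get_text(" ", strip=True),
                         "District": district})
    return pd.DataFrame(rows, columns=["Neighbourhood", "District"])

sample = """
<h2>Neighbourhoods by districts</h2>
<div class="mw-heading mw-heading3"><h3 id="Adalar">Adalar</h3></div>
<ol><li><a href="/wiki/Burgazada" title="Burgazada">Burgazada</a></li>
<li><a href="/wiki/Heybeliada" title="Heybeliada">Heybeliada</a></li>
<li><a href="/w/index.php?title=Maden" class="new">Maden</a></li>
</ol>
<h3><span class="mw-headline" id="Arnavutköy">Arnavutköy</span><span class="mw-editsection">[edit]</span></h3>
<ol><li><a href="/w/index.php?title=Anadolu" class="new">Anadolu</a></li></ol>
<ol class="references"><li>A citation</li></ol>
"""

df = neighbourhoods_by_district(sample)
print(df)
```

Against the real page you would pass the fetched HTML (for example `response.text` from the code above) instead of `sample`, then check the result, since the heading structure is an assumption.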


1 answer
User
Answer #1 · posted 2024-10-02 20:39:41

Are you trying to get this list from the table of contents?

Check whether this solves your problem:

import pandas as pd
import requests
from bs4 import BeautifulSoup

wikiurl = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Istanbul"
response = requests.get(wikiurl)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(response.text, 'html.parser')

# Each table-of-contents entry's label sits in a <span class="toctext">
tocList = soup.find_all('span', {'class': "toctext"})

# TOC entries that are section titles rather than district names
blocked_words = {'Neighbourhoods by districts', 'Further reading', 'External links'}

districts = []
for item in tocList:
    text = item.get_text()
    if text not in blocked_words:
        districts.append(text)

df = pd.DataFrame(districts, columns=['districts'])
print(df)

Output:

       districts
0          Adalar
1      Arnavutköy
2        Ataşehir
3         Avcılar
4        Bağcılar
5    Bahçelievler
6        Bakırköy
7      Başakşehir
8      Bayrampaşa
9        Beşiktaş
10         Beykoz
11     Beylikdüzü
12        Beyoğlu
13   Büyükçekmece
14        Çatalca
15       Çekmeköy
16        Esenler
17       Esenyurt
18           Eyüp
19          Fatih
20  Gaziosmanpaşa
21       Güngören
22        Kadıköy
23      Kağıthane
24         Kartal
25   Küçükçekmece
26        Maltepe
27         Pendik
28     Sancaktepe
29        Sarıyer
30        Silivri
31    Sultanbeyli
32     Sultangazi
33           Şile
34          Şişli
35          Tuzla
36       Ümraniye
37        Üsküdar
38    Zeytinburnu
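The span.toctext markup used above comes from Wikipedia's older table-of-contents layout and may not be present in every version of the page's HTML. A hedged alternative, assuming each district is an <h3> section heading (not verified against the live page), reads the district names from the headings instead. If titles such as "Further reading" are <h2> headings, as assumed in the sample, no blocked-word list is needed. The function name and sample HTML are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

def districts_from_headings(html: str) -> pd.DataFrame:
    """Collect the text of every <h3> heading as a district name."""
    soup = BeautifulSoup(html, "html.parser")
    districts = []
    for heading in soup.find_all("h3"):
        # Older markup keeps the title in span.mw-headline beside an "[edit]" link
        headline = heading.find("span", class_="mw-headline")
        districts.append((headline or heading).get_text(" ", strip=True))
    return pd.DataFrame(districts, columns=["districts"])

sample = """
<h2>Neighbourhoods by districts</h2>
<div class="mw-heading mw-heading3"><h3 id="Adalar">Adalar</h3></div>
<h3><span class="mw-headline" id="Arnavutköy">Arnavutköy</span><span class="mw-editsection">[edit]</span></h3>
<h2>Further reading</h2>
"""

print(districts_from_headings(sample))
```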
