Converting BeautifulSoup output elements into a list

Published 2024-06-26 00:07:02


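I have some output produced with BeautifulSoup.

  1. I need to convert the output, whose elements are of type bs4.element.Tag, into a list, and export that list to a DataFrame column named column_a.

  2. I want the output to stop at the 14th element (the last three h2 headings are of no use).

My code: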

import requests
from bs4 import BeautifulSoup


url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
attraction_place=soup.find_all('h2', class_="sitename")    

for attraction in attraction_place:
    print(attraction.text)
    type(attraction)

Output:

1  Vigeland Sculpture Park
2  Akershus Fortress
3  Viking Ship Museum
4  The National Museum
5  Munch Museum
6  Royal Palace
7  The Museum of Cultural History
8  Fram Museum
9  Holmenkollen Ski Jump and Museum
10  Oslo Cathedral
11  City Hall (Rådhuset)
12  Aker Brygge
13  Natural History Museum & Botanical Gardens
14  Oslo Opera House and Annual Music Festivals
Where to Stay in Oslo for Sightseeing
Tips and Tours: How to Make the Most of Your Visit to Oslo
More Related Articles on PlanetWare.com

I would like to end up with a list like this:

attraction=[Vigeland Sculpture Park, Akershus Fortress, ......]

Thanks very much in advance.


3 Answers
# Collect the text of the first 14 headings only
new = []
for count, attraction in enumerate(attraction_place, start=1):
    if count > 14:
        break
    new.append(attraction.text)

You can use slicing.

for attraction in attraction_place[:14]:
    print(attraction.text)
    type(attraction)
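
To get an actual list rather than printed lines, the slice can be combined with a list comprehension. The following is a minimal sketch, not a definitive solution: sample_html is a hypothetical stand-in for the downloaded page so that the example runs offline, keep is an illustrative name (it would be 14 for the real page), and the regular expression assumes the leading numbers shown in the question's output are part of each heading's text.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page content (assumption, not the actual HTML).
sample_html = """
<h2 class="sitename">1  Vigeland Sculpture Park</h2>
<h2 class="sitename">2  Akershus Fortress</h2>
<h2 class="sitename">3  Viking Ship Museum</h2>
<h2 class="sitename">Where to Stay in Oslo for Sightseeing</h2>
"""

soup = BeautifulSoup(sample_html, "html.parser")
attraction_place = soup.find_all("h2", class_="sitename")

keep = 3  # number of headings to keep; use 14 for the page in the question

# Slice first, then take each tag's text and drop a leading "N  " prefix if present.
attractions = [
    re.sub(r"^\d+\s+", "", tag.get_text(strip=True))
    for tag in attraction_place[:keep]
]
print(attractions)
```

Slicing before the comprehension means the unwanted trailing headings are never processed at all.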

A very simple approach is to read the alt attribute of the photos. This gives clean text output, and only 14 entries, without any slicing or indexing.

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm')
soup = BeautifulSoup(r.content, 'lxml')  # the 'lxml' parser requires the lxml package
attractions = [item['alt'] for item in soup.select('.photo [alt]')]
print(attractions)
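
None of the snippets above covers the DataFrame part of the question. Once the names are in a plain list, one way to put them in a column named column_a is sketched below. It assumes pandas is installed, and the three names are a shortened sample standing in for the full list of 14.

```python
import pandas as pd

# Shortened sample list (assumption); in practice this is the list built above.
attractions = ["Vigeland Sculpture Park", "Akershus Fortress", "Viking Ship Museum"]

# A dict maps the column name to its values, one row per attraction.
df = pd.DataFrame({"column_a": attractions})
print(df)
```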
