Scraping bus stop names with BeautifulSoup

Posted 2024-10-01 19:33:41


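I am trying to scrape the bus stop names for a given line from the web. Here is an example page for line 212: https://www.m2.rozkladzik.pl/warszawa/rozklad_jazdy.html?l=212. As output I would like two lists: one with the bus stop names in one direction, and another with the names in the opposite direction (the two directions are clearly visible on the page). So far I have managed to get all the names into a single list: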

import requests
from bs4 import BeautifulSoup


def download_bus_schedule(bus_number):
    URL = "http://www.m2.rozkladzik.pl/warszawa/rozklad_jazdy.html?l=" + bus_number
    r = requests.get(URL)
    soup = BeautifulSoup(r.content,
                         'html5lib')
    print(soup.prettify())
    all_bus_stops = []
    table = soup.find_all('a')
    for element in table:
        if element.get_text() in all_bus_stops:
            continue
        else:
            all_bus_stops.append(element.get_text())
    return all_bus_stops

print(download_bus_schedule('212'))

I think the solution is to split the soup into two parts.


3 Answers

You can use the bs4.element.Tag.findAll method (the older alias of find_all), once per list with the holo-list class:

import requests
from bs4 import BeautifulSoup


def download_bus_schedule(bus_number):
    all_bus_stops = []
    URL = "http://www.m2.rozkladzik.pl/warszawa/rozklad_jazdy.html?l=" + bus_number
    r = requests.get(URL)
    soup = BeautifulSoup(r.content, 'html.parser')
    for s in soup.select(".holo-list"):
        bus_stops = []
        for f in s.findAll("li"):
            if f.text not in bus_stops:
                bus_stops.append(f.text)
        all_bus_stops.append(bus_stops)
    return all_bus_stops

print(download_bus_schedule('212'))

Output:

[['Pl.Hallera', 'Pl.Hallera', 'Darwina', 'Namysłowska', 'Rondo Żaba', 'Rogowska', 'Kołowa', 'Dks Targówek', 'Metro Targówek Mieszkaniowy', 'Myszkowska', 'Handlowa', 'Metro Trocka', 'Bieżuńska', 'Jórskiego', 'Łokietka', 'Samarytanka', 'Rolanda', 'Żuromińska', 'Targówek-Ratusz', 'Św.Wincentego', 'Malborska', 'Ch Targówek'], 
 ['Ch Targówek', 'Ch Targówek', 'Malborska', 'Św.Wincentego', 'Targówek-Ratusz', 'Żuromińska', 'Gilarska', 'Rolanda', 'Samarytanka', 'Łokietka', 'Jórskiego', 'Bieżuńska', 'Metro Trocka', 'Metro Trocka', 'Metro Trocka', 'Handlowa', 'Myszkowska', 'Metro Targówek Mieszkaniowy', 'Dks Targówek', 'Kołowa', 'Rogowska', 'Rondo Żaba', '11 Listopada', 'Bródnowska', 'Szymanowskiego', 'Pl.Hallera', 'Pl.Hallera']]
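The output above still lists some names more than once (for example 'Pl.Hallera'). One possible refinement is to normalize each name before removing repeats. The sketch below uses a hypothetical helper, unique_stop_names, which is not part of the original answer. Whether stray whitespace is the actual cause of the repeats on this page is an assumption.

```python
def unique_stop_names(raw_names):
    """Strip whitespace and drop repeats, keeping first-seen order."""
    cleaned = (name.strip() for name in raw_names)
    # dict preserves insertion order (Python 3.7+), so fromkeys dedupes in order;
    # empty strings left over after stripping are skipped
    return list(dict.fromkeys(name for name in cleaned if name))


print(unique_stop_names(['Pl.Hallera', ' Pl.Hallera\n', 'Darwina', '']))
```

Inside the loop of the answer above, this could be applied to the list of f.text values for each holo-list element instead of the manual membership check.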

I may have misunderstood, since I don't read Polish, but see whether this helps:

from bs4 import BeautifulSoup
import requests

url = 'https://www.m2.rozkladzik.pl/warszawa/rozklad_jazdy.html?l=212'

resp = requests.get(url)
soup = BeautifulSoup(resp.content, "html.parser")

d = {}
for h2 in soup.select('h2.holo-divider'):
    d[h2.text] = []
    ul = h2.find_next_sibling('ul')  # skips any whitespace text node between the tags
    for li in ul.select('li'):
        if li.a.text not in d[h2.text]:
            d[h2.text].append(li.a.text)

from pprint import pprint

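pprint(d)

Another approach is to take the two ul elements with the holo-list class by index, one per direction:

import requests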
from bs4 import BeautifulSoup


def download_bus_schedule(bus_number):
    URL = "http://www.m2.rozkladzik.pl/warszawa/rozklad_jazdy.html?l=" + bus_number
    r = requests.get(URL)
    soup = BeautifulSoup(r.content,
                         'html5lib')

    bus_stops_1 = []
    bus_stops_2 = []

    directions = soup.find_all("ul", {"class":"holo-list"})
    
    for stop in directions[0].find_all("a"):
        name = stop.text.strip()
        # Compare the stripped text, not the Tag object, so duplicates are skipped
        if name not in bus_stops_1:
            bus_stops_1.append(name)

    for stop in directions[1].find_all("a"):
        name = stop.text.strip()
        if name not in bus_stops_2:
            bus_stops_2.append(name)
    
    all_bus_stops = (bus_stops_1, bus_stops_2)

    return all_bus_stops

# Download the page once and unpack the two directions
stops_one_way, stops_other_way = download_bus_schedule('212')
print(stops_one_way)
print(stops_other_way)
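The answers above all depend on the live page. To try the splitting logic without a network request, the sketch below parses a small inline HTML string. The SAMPLE_HTML markup, the heading texts, and the function name stops_by_direction are hypothetical. Only the holo-divider and holo-list class names come from the answers above, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup; the real page may differ.
SAMPLE_HTML = """
<h2 class="holo-divider">Direction A</h2>
<ul class="holo-list">
  <li><a href="#">Pl.Hallera</a></li>
  <li><a href="#"> Pl.Hallera </a></li>
  <li><a href="#">Darwina</a></li>
</ul>
<h2 class="holo-divider">Direction B</h2>
<ul class="holo-list">
  <li><a href="#">Ch Targówek</a></li>
  <li><a href="#">Malborska</a></li>
</ul>
"""


def stops_by_direction(html):
    """Return {heading text: [unique stop names]} for each heading/list pair."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for heading in soup.select("h2.holo-divider"):
        # find_next_sibling skips whitespace text nodes between the tags
        stop_list = heading.find_next_sibling("ul", class_="holo-list")
        if stop_list is None:
            continue
        names = [a.get_text(strip=True) for a in stop_list.find_all("a")]
        # dict.fromkeys keeps the first occurrence of each name, in order
        result[heading.get_text(strip=True)] = list(dict.fromkeys(names))
    return result


schedule = stops_by_direction(SAMPLE_HTML)
print(schedule)
```

For the real page, the same function could be given the content of a requests response instead of SAMPLE_HTML, assuming the page keeps this heading-then-list layout.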
