Web scraping with Python BeautifulSoup

Posted 2024-10-02 02:26:44

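I am writing a fun Python script that finds the lowest train ticket price on Eurostar. I'm very new to BeautifulSoup, so I don't know much about it. For some reason, the code does not retrieve anything from the "ul" element that, in theory, it should be reading.

Here is the code: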

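import requests
from bs4 import BeautifulSoup

# InputParser is the helper class defined further down in this question.
input_parser = InputParser()
input_parser.inputDestinations("London", "Paris")
input_parser.adults = 2
input_parser.inputDates("2021-10-08", "2021-10-10")

# Download the search page and parse the HTML the server returned.
URL = input_parser.createURL()
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

# Look for the <ul class="train-table"> element.
results = soup.find_all("ul", {"class": "train-table"})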

The InputParser class basically builds the URL from the given data:

class InputParser():

    def __init__(self):
        self.mapOfDestinations = {"London": "7015400", "Paris": "8727100", "Brussels": "8814001"}
        self.destinations = []
        self.adults = 0
        self.departureDate = ""
        self.arrivalDate = ""

    def inputDestinations(self, departureDestination, arrivalDestination):
        self.destinations.append(self.mapOfDestinations[departureDestination])
        self.destinations.append(self.mapOfDestinations[arrivalDestination])

    def inputDates(self, departureDate, arrivalDate):
        self.departureDate = departureDate
        self.arrivalDate = arrivalDate

    def inputAdults(self, numberOfAdults):
        self.adults = numberOfAdults

    def createURL(self):
        default_URL = "https://booking.eurostar.com/uk-en/train-search?origin={0}&destination={1}&adult={2}&outbound-date={3}&inbound-date={4}". \
            format(self.destinations[0], self.destinations[1], self.adults, self.departureDate, self.arrivalDate)
        return default_URL
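
As a side note, the same URL can also be assembled with the standard library's urllib.parse.urlencode, which takes care of escaping the query parameters. This is only a sketch: the function name build_search_url and the BASE_URL constant are hypothetical and not part of the code above.

```python
from urllib.parse import urlencode

# Base address taken from the URL used in createURL() above.
BASE_URL = "https://booking.eurostar.com/uk-en/train-search"


def build_search_url(origin, destination, adults, outbound_date, inbound_date):
    """Return the search URL with URL-encoded query parameters.

    Hypothetical helper; it mirrors what InputParser.createURL() produces.
    """
    query = urlencode(
        {
            "origin": origin,
            "destination": destination,
            "adult": adults,
            "outbound-date": outbound_date,
            "inbound-date": inbound_date,
        }
    )
    return f"{BASE_URL}?{query}"


print(build_search_url("7015400", "8727100", 1, "2021-10-08", "2021-10-10"))
```

Alternatively, requests accepts a params dictionary directly, as in requests.get(BASE_URL, params={...}), and performs the same encoding.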

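My code should return the "ul" element with the class "train-table", but it returns nothing. Any idea what I'm doing wrong?

If you want to look at the page source, the code produces the following URL: https://booking.eurostar.com/uk-en/train-search?origin=7015400&destination=8727100&adult=1&outbound-date=2021-10-08&inbound-date=2021-10-10

Thank you very much.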


1 answer
Answer 1 · posted 2024-10-02 02:26:44

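The data you see on the page is loaded from an external URL after the initial HTML arrives, so it is not in the HTML that requests downloads and BeautifulSoup never sees it. You can, however, use the requests module to simulate that query: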

import json
import requests

origin = "7015400"
destination = "8727100"

api_url = f"https://api.prod.eurostar.com/bpa/train-search/uk-en/{origin}/{destination}"
params = {
    "outbound-date": "2021-10-08",
    "inbound-date": "2021-10-10",
    "adult": "1",
    "booking-type": "standard",
}

headers = {"X-apikey": "0aa3d4b7e805493c8e310cfb871c4344"}

data = requests.get(api_url, params=params, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for j in data["outbound"]["journey"]:
    for c in j["class"]:
        if "price" in c:
            print(
                "{:<10} {:<10} {:<10} {:<10}".format(
                    j["departureTime"],
                    j["arrivalTime"],
                    c["remaining"],
                    c["price"]["adult"],
                )
            )

This prints the departure time, arrival time, `remaining` value, and adult price for each fare class:

07:01      10:17      150        134.5     
07:01      10:17      20         149.5     
07:01      10:17      47         245       
08:01      11:17      70         134.5     
08:01      11:17      2          179.5     
08:01      11:17      30         245       
10:24      13:47      27         134.5     
10:24      13:47      10         179.5     
10:24      13:47      31         245       
12:24      15:47      70         134.5     
12:24      15:47      50         219.5     
12:24      15:47      13         245       
16:31      19:47      7          134.5     
16:31      19:47      41         219.5     
16:31      19:47      31         245       
19:01      22:17      45         134.5     
19:01      22:17      8          149.5     
19:01      22:17      42         245       
20:01      23:17      35         74.5      
20:01      23:17      19         119.5     
20:01      23:17      51         245       
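
Since the goal in the question is the lowest ticket price, the same loop can collect the fares and take the minimum. The following is a minimal sketch that assumes the response has the shape used above (`outbound` → `journey` → `class` → `price` → `adult`); the function name cheapest_fare is hypothetical, and the sample dictionary is hand-built from a few of the rows printed above rather than a real API response.

```python
def cheapest_fare(data):
    """Return (price, departure, arrival) for the lowest adult fare, or None."""
    fares = []
    for journey in data["outbound"]["journey"]:
        for fare_class in journey["class"]:
            # Skip class entries without a "price" key, as the loop above does.
            if "price" in fare_class:
                fares.append(
                    (
                        fare_class["price"]["adult"],
                        journey["departureTime"],
                        journey["arrivalTime"],
                    )
                )
    # Tuples compare by price first; default=None covers an empty result.
    return min(fares, default=None)


# Hand-built sample mimicking the assumed response shape (not real API output).
sample = {
    "outbound": {
        "journey": [
            {
                "departureTime": "19:01",
                "arrivalTime": "22:17",
                "class": [
                    {"remaining": 8, "price": {"adult": 149.5}},
                    {"remaining": 0},
                ],
            },
            {
                "departureTime": "20:01",
                "arrivalTime": "23:17",
                "class": [
                    {"remaining": 35, "price": {"adult": 74.5}},
                    {"remaining": 19, "price": {"adult": 119.5}},
                ],
            },
        ]
    }
}

print(cheapest_fare(sample))  # (74.5, '20:01', '23:17')
```

With live data, you would pass the `data` dictionary returned by `.json()` above instead of `sample`.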
