How to get the changed HTML after pressing a button, using Beautiful Soup and requests

Posted 2024-09-30 01:38:05


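I want to get the HTML of https://www.forebet.com/en/football-predictions after pressing the More[+] button enough times to load all the games. Each time the More[+] button at the bottom of the page is pressed, the HTML changes and more football matches are shown. How can I request the page with all football matches loaded?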

from bs4 import BeautifulSoup
import requests

leagues = {"EPL", "UCL", "Es1", "De1", "Fr1", "Pt1", "It1", "UEL"}

class ForeBet:

    # Gets all games from the leagues in `leagues` and returns them as a list of strings
    # Game format: League|Date|Hour|Home Team|Away Team|Prob Home|Prob Tie|Prob Away
    def get_games_and_probs(self):

        # Only the games present in the initial HTML are visible here
        response = requests.get('https://www.forebet.com/en/football-prediction')
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []

        # Game rows carry either the "rcnt tr_0" or the "rcnt tr_1" class
        games = soup.find_all(class_='rcnt tr_0') + soup.find_all(class_='rcnt tr_1')

        for game in games:
            league = game.find(class_='shortTag').text
            if league.strip() in leagues:
                date_hour = game.find(class_='date_bah').text.split(" ")
                probs = game.find(class_='fprc')
                line = "|".join([
                    league,
                    date_hour[0],
                    date_hour[1],
                    game.find(class_='homeTeam').text,
                    game.find(class_='awayTeam').text,
                    probs.find_next().text,
                    probs.find_next().find_next().text,
                    probs.find_next().find_next().find_next().text,
                ])
                print(line)
                results.append(line)

        return results

1 Answer
User
#1 · Posted 2024-09-30 01:38:05

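As mentioned, requests and BeautifulSoup are for fetching and parsing data, not for interacting with a site. requests only downloads the HTML the server sends and does not run the page's JavaScript, so it cannot click More[+]. To do that, you need Selenium.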
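A minimal sketch of the "keep clicking More[+]" approach with Selenium might look like the following. It assumes Selenium 4, where `find_elements` takes a locator strategy string such as `"xpath"` (the value of `By.XPATH`). The helper name `click_until_gone`, the one-second pause and the click cap are illustrative choices, and the XPath in the usage comment is a guess that has to be checked against the real page.

```python
import time


def click_until_gone(driver, xpath, pause=1.0, max_clicks=100):
    """Click the first visible element matching `xpath` until none is left.

    `driver` is assumed to be a Selenium 4 WebDriver (or anything exposing the
    same find_elements / execute_script / page_source interface).
    Returns the page HTML after the last click.
    """
    for _ in range(max_clicks):
        # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH
        buttons = [b for b in driver.find_elements("xpath", xpath) if b.is_displayed()]
        if not buttons:
            break  # button gone: assume everything is loaded
        # A JavaScript click can avoid "element not interactable" errors when
        # the button is partly covered by another element
        driver.execute_script("arguments[0].click();", buttons[0])
        time.sleep(pause)  # crude wait for the new rows to be inserted
    return driver.page_source


# Example usage (needs `pip install selenium` and a local Chrome; XPath is a guess):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://www.forebet.com/en/football-predictions")
# html = click_until_gone(driver, "//*[contains(text(), 'More')]")
# driver.quit()
```

The returned HTML can then be passed to `BeautifulSoup(html, 'html.parser')` in place of `response.text` in the question's code. A fixed `time.sleep` is the simplest option; Selenium's explicit waits (`WebDriverWait`) are more robust if the load time varies.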

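Your other option is to check whether you can fetch the data directly, i.e. whether there is a request with parameters that returns the next batch of games, just as clicking More[+] does. See if this works for you: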

import pandas as pd
import requests

url = 'https://m.forebet.com/scripts/getrs.php'
frames = []
i = 0
while True:
    print(i)
    payload = {
        'ln': 'en',
        'tp': '1x2',
        'in': str(i + 11),
        'ord': '0'}

    jsonData = requests.get(url, params=payload).json()
    frames.append(pd.DataFrame(jsonData[0]))
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    results = pd.concat(frames, sort=False).reset_index(drop=True)

    # Request the next batch until an id appears twice,
    # i.e. the endpoint has started returning rows we already have
    if results['id'].value_counts().max() <= 1:
        i += 1
    else:
        results = results.drop_duplicates()
        break
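requests turns the `payload` dict into a query string. To see which URL a given `in` value produces without sending anything (for instance, to compare it with the request the browser makes when More[+] is clicked, which the developer tools' Network tab shows), requests can prepare the request offline. `getrs_url` is a hypothetical helper; the `i + 11` offset is copied from the loop above, and what `in` means on forebet's side is not documented here.

```python
import requests


def getrs_url(i, base="https://m.forebet.com/scripts/getrs.php"):
    """Build (but do not send) the URL the loop above requests for index `i`."""
    payload = {'ln': 'en', 'tp': '1x2', 'in': str(i + 11), 'ord': '0'}
    # prepare() encodes params into the URL without any network access
    return requests.Request('GET', base, params=payload).prepare().url


print(getrs_url(0))
# https://m.forebet.com/scripts/getrs.php?ln=en&tp=1x2&in=11&ord=0
```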

Output:

print(results)
          id  pr_under  ...    country         full_name
0    1473708        31  ...    England   Isthmian League
1    1473713        35  ...    England   Isthmian League
2    1473745        28  ...    England   Isthmian League
3    1473710        35  ...    England   Isthmian League
4    1473033        28  ...    England  Premier League 2
..       ...       ...  ...        ...               ...
515  1419208        47  ...  Argentina  Torneo Federal A
516  1419156        57  ...  Argentina  Torneo Federal A
517  1450589        50  ...    Armenia    Premier League
518  1450590        35  ...    Armenia    Premier League
519  1450591        52  ...    Armenia    Premier League

[518 rows x 73 columns]
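The loop's stopping rule (keep requesting batches until an `id` repeats) does not need pandas, and it can be guarded so that an empty batch, or an endpoint that never repeats rows, cannot keep the `while True` loop running indefinitely. A sketch, assuming each batch is a list of row dicts with an `"id"` key; `collect_until_repeat` and `fetch_page` are hypothetical names:

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def collect_until_repeat(fetch_page: Callable[[int], List[Row]],
                         key: str = "id",
                         max_pages: int = 200) -> List[Row]:
    """Fetch batches 0, 1, 2, ... keeping only rows whose `key` is new.

    Stops on an empty batch, on a batch containing an already-seen key
    (the endpoint has started repeating), or after `max_pages` batches.
    """
    seen = set()
    rows: List[Row] = []
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            break
        repeated = False
        for row in batch:
            if row[key] in seen:
                repeated = True
                continue  # skip the duplicate, keep any new rows in this batch
            seen.add(row[key])
            rows.append(row)
        if repeated:
            break
    return rows
```

With the endpoint from the answer, `fetch_page` could be something like `lambda p: requests.get(url, params={'ln': 'en', 'tp': '1x2', 'in': str(p + 11), 'ord': '0'}).json()[0]`, provided `jsonData[0]` really is a list of row dicts, and `pd.DataFrame(rows)` turns the result into the same kind of table.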
