How to get football scores from sofascore with Python

Published 2024-10-01 07:24:03


I'm working on this project in Python 3.8. I need to download the data into a pandas DataFrame and eventually write it to a database (SQL or Access) for all Premier League teams in 2018 and 2019. I'm trying to use BeautifulSoup. I have code that works for soccerbase.com, but it doesn't work for sofascore.com. A contributor has been helping me write the code so far. Can anyone help?

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)

json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
# 'Sheffield United'

json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name']  # 'Manchester United'

json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
# 3

json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']

print(json_object)

How can I loop this code to get all the teams? My goal is to get every team's data with the columns ["Event date", "Competition", "Home team", "Home Score", "Away team", "Away Score", "Score"], e.g. 31 Oct 2019, Premier League, Chelsea 1, Manchester United 2, 1-2.

I'm a beginner; how can I get this done?
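For the combined "Score" column, once the home and away scores are separate columns they can be joined with a hyphen. A minimal sketch with a made-up row in the shape described above:

```python
import pandas as pd

# Hypothetical row in the shape the question describes
df = pd.DataFrame(
    [["31 Oct 2019", "Premier League", "Chelsea", 1, "Man Utd", 2]],
    columns=["Event date", "Competition", "Home team", "Home Score",
             "Away team", "Away Score"],
)

# Combine the two numeric score columns into the "1-2" style string
df["Score"] = df["Home Score"].astype(str) + "-" + df["Away Score"].astype(str)
print(df.loc[0, "Score"])  # 1-2
```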


3 Answers
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.soccerbase.com/teams/home.sd'
r = requests.get(url)
soup = bs(r.content, 'html.parser')
teams = soup.find('div', {'class': 'headlineBlock'}, text='Team').next_sibling.find_all('li')

teams_dict = {}
for team in teams:
    link = 'https://www.soccerbase.com' + team.find('a')['href']
    team = team.text

    teams_dict[team] = link

consolidated = []
for k, v in teams_dict.items():
    print('Acquiring %s data...' % k)

    headers = ['Team', 'Competition', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Date Keep']
    r = requests.get('%s&teamTabs=results' % v)
    soup = bs(r.content, 'html.parser')

    h_scores = [int(i.text) for i in soup.select('.score a em:first-child')]
    a_scores = [int(i.text) for i in soup.select('.score a em + em')]

    limit = len(a_scores)
    team = [k for i in soup.select('.tournament', limit=limit)]
    comps = [i.text for i in soup.select('.tournament a', limit=limit)]
    dates = [i.text for i in soup.select('.dateTime .hide', limit=limit)]
    h_teams = [i.text for i in soup.select('.homeTeam a', limit=limit)]
    a_teams = [i.text for i in soup.select('.awayTeam a', limit=limit)]

    df = pd.DataFrame(list(zip(team, comps, h_teams, h_scores, a_teams, a_scores, dates)),
                      columns=headers)
    consolidated.append(df)

pd.concat(consolidated).to_csv(r'your file location address', sep=',', encoding='utf-8-sig', index=False)

This code does the job. It doesn't capture everything in the site's database, but it is a robust scraper.

import simplejson as json
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)

headers = ['Tournament', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Status', 'Start Date']
consolidated = []
for tournament in json_object['sportItem']['tournaments']:
    rows = []
    for event in tournament["events"]:
        row = []
        row.append(tournament["tournament"]["name"])
        row.append(event["homeTeam"]["name"])
        if "current" in event["homeScore"].keys():
            row.append(event["homeScore"]["current"])
        else:
            row.append(-1)
        row.append(event["awayTeam"]["name"])
        if "current" in event["awayScore"].keys():
            row.append(event["awayScore"]["current"])
        else:
            row.append(-1)
        row.append(event["status"]["type"])
        row.append(event["formatedStartDate"])
        rows.append(row)
    df = pd.DataFrame(rows, columns=headers)
    consolidated.append(df)

pd.concat(consolidated).to_csv(r'Path.csv', sep=',', encoding='utf-8-sig',
                               index=False)
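Since the goal is only Premier League teams, you could also filter the tournaments before building rows. A sketch against the same `sportItem`/`tournaments` nesting, using a hand-made stand-in for the parsed JSON instead of a live request (the LaLiga entry is invented for illustration):

```python
# Stand-in for json.loads(r.content); the real object comes from the sofascore JSON
json_object = {
    "sportItem": {
        "tournaments": [
            {"tournament": {"name": "Premier League"},
             "events": [{"homeTeam": {"name": "Chelsea"},
                         "awayTeam": {"name": "Man Utd"}}]},
            {"tournament": {"name": "LaLiga"},
             "events": [{"homeTeam": {"name": "Barcelona"},
                         "awayTeam": {"name": "Getafe"}}]},
        ]
    }
}

# Keep only events from the tournaments we care about
wanted = {"Premier League"}
pl_events = [
    event
    for tournament in json_object["sportItem"]["tournaments"]
    if tournament["tournament"]["name"] in wanted
    for event in tournament["events"]
]

print(len(pl_events))  # 1
```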

Credit: Praful Surve @praful-surve

Start here:

https://www.sofascore.com/football///json

It returns the scores in JSON format. The main page doesn't expose this data, i.e. it isn't in the main page's source. This should get you started.

It can be loaded like this:

import json
import requests

url = "https://www.sofascore.com/football///json"
r = requests.get(url)

Here's an example of how to extract data from the JSON. Eventually you'll want to loop over the data wherever you see [0], but this should show you how to get at it:

json_object = json.loads(r.content)

json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
# 'Sheffield United'

json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name']
# 'Manchester United'

json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
# 3

json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']
# 3

I hope this helps.

Update:

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.soccerbase.com/teams/home.sd'
r = requests.get(url)
soup = bs(r.content, 'html.parser')
teams = soup.find('div', {'class': 'headlineBlock'}, text='Team').next_sibling.find_all('li')

teams_dict = {}
for team in teams:
    link = 'https://www.soccerbase.com' + team.find('a')['href']
    team = team.text

    teams_dict[team] = link


team = []
comps = []
dates = []
h_teams = []
a_teams = []
h_scores = []
a_scores = []

consolidated = []
for k, v in teams_dict.items():
    print('Acquiring %s data...' % k)

    headers = ['Team', 'Competition', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Date Keep']
    r = requests.get('%s&teamTabs=results' % v)
    soup = bs(r.content, 'html.parser')

    h_scores.extend([int(i.text) for i in soup.select('.score a em:first-child')])
    new_a_scores = [int(i.text) for i in soup.select('.score a em + em')]
    a_scores.extend(new_a_scores)

    limit = len(new_a_scores)
    team.extend([k for i in soup.select('.tournament', limit=limit)])
    comps.extend([i.text for i in soup.select('.tournament a', limit=limit)])
    dates.extend([i.text for i in soup.select('.dateTime .hide', limit=limit)])
    h_teams.extend([i.text for i in soup.select('.homeTeam a', limit=limit)])
    a_teams.extend([i.text for i in soup.select('.awayTeam a', limit=limit)])



df = pd.DataFrame(list(zip(team, comps, h_teams, h_scores, a_teams, a_scores, dates)),
                  columns=headers)

You can search and print with:

df[df['Team'] == 'Wolves']
print(df.to_string())

And pull out some useful aggregates:

df.groupby('Team').agg({'Home Score': 'mean', 'Away Score': 'mean'})                                                                                                

                Home Score  Away Score
Team                                  
Arsenal           2.105263    1.368421
Aston Villa       1.687500    1.625000
Bournemouth       1.266667    1.066667
Brighton          1.533333    1.200000
Burnley           1.642857    1.357143
Chelsea           1.900000    1.850000
Crystal Palace    1.142857    0.928571
Everton           1.375000    1.312500
Leicester         1.312500    1.750000
Liverpool         1.857143    1.761905
Man City          2.050000    1.600000
Man Utd           1.421053    0.894737
Newcastle         1.571429    0.785714
Norwich           1.642857    1.357143
Sheff Utd         1.066667    1.066667
Southampton       1.125000    2.187500
Tottenham         1.888889    1.555556
Watford           1.500000    1.125000
West Ham          1.533333    1.466667
Wolves            1.280000    1.440000

Or:

df[df['Away Team'] == 'Leicester'].agg({'Home Score': 'mean', 'Away Score': 'mean'})                                                                                

Home Score    0.722222
Away Score    2.388889
dtype: float64

Awesome. df.T is handy too, and there's also DataFrame.to_sql() if you go that route. I hope my changes help, and I'm always happy to help more.
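The DataFrame.to_sql() route mentioned above can target SQLite with just the standard library's sqlite3 module; a minimal sketch with a tiny made-up frame:

```python
import sqlite3
import pandas as pd

df = pd.DataFrame(
    {"Home Team": ["Chelsea"], "Home Score": [1],
     "Away Team": ["Man Utd"], "Away Score": [2]},
)

# In-memory database for the example; use a file path like 'results.db' for real use
conn = sqlite3.connect(":memory:")
df.to_sql("results", conn, index=False, if_exists="replace")

# Read it back to confirm the round trip
back = pd.read_sql("SELECT * FROM results", conn)
print(back.shape)  # (1, 4)
conn.close()
```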
