<p>Start here:</p>
<pre><code>https://www.sofascore.com/football///json
</code></pre>
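<p>It returns the scores as JSON. The main page does not contain this data in its own source; the scores are fetched separately, which is why scraping the main page finds nothing. This should help you get started.</p>
<p>You can load it with a plain requests.get call on that URL, keeping the response as r.</p>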
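<p>A minimal sketch of that loading step, assuming the endpoint returns a JSON body. The helper name <code>load_json</code>, the optional <code>session</code> argument and the timeout are illustrative choices, not something the site or the requests library prescribes; pass the JSON URL shown above as <code>url</code>.</p>

```python
import json


def load_json(url, session=None, timeout=10):
    """Fetch url and return the decoded JSON body (hypothetical helper)."""
    if session is None:
        # requests is third-party; imported here so only the network path needs it
        import requests
        session = requests.Session()
    r = session.get(url, timeout=timeout)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return json.loads(r.content)
```

<p>Accepting a session object keeps the function easy to exercise offline, since anything with a compatible <code>get</code> method can stand in for the network.</p>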
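<p>Below is an example of how to extract data from the JSON. Eventually you will need loops to iterate over the data wherever you see [0], but this should give you an idea of how to get at the values:</p>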
<pre><code>import json

# r is the response returned by requests.get() for the URL above
json_object = json.loads(r.content)
json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
# 'Sheffield United'
json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name']
# 'Manchester United'
json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
# 3
json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']
# 3
</code></pre>
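<p>The loop over every [0] position could look roughly like this. It is a sketch that assumes the key layout shown in the example; the function name <code>iter_results</code> is made up, and the use of <code>.get</code> for the scores is an assumption that some events may not carry a current score yet.</p>

```python
def iter_results(json_object):
    """Yield (home, home_score, away, away_score) for every event."""
    for tournament in json_object['sportItem']['tournaments']:
        for event in tournament['events']:
            yield (
                event['homeTeam']['name'],
                # Assumption: a score may be missing, so fall back to None
                event.get('homeScore', {}).get('current'),
                event['awayTeam']['name'],
                event.get('awayScore', {}).get('current'),
            )


# Tiny hand-built payload mirroring the values shown in the example
sample = {
    'sportItem': {
        'tournaments': [
            {
                'events': [
                    {
                        'homeTeam': {'name': 'Sheffield United'},
                        'awayTeam': {'name': 'Manchester United'},
                        'homeScore': {'current': 3},
                        'awayScore': {'current': 3},
                    }
                ]
            }
        ]
    }
}

for row in iter_results(sample):
    print(row)  # ('Sheffield United', 3, 'Manchester United', 3)
```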
<p>I hope this helps.</p>
<p>Update:</p>
<pre><code>import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

# Collect each team's name and page link from the team index
url = 'https://www.soccerbase.com/teams/home.sd'
r = requests.get(url)
soup = bs(r.content, 'html.parser')
teams = soup.find('div', {'class': 'headlineBlock'}, string='Team').next_sibling.find_all('li')

teams_dict = {}
for li in teams:
    link = 'https://www.soccerbase.com' + li.find('a')['href']
    teams_dict[li.text] = link

# One list per output column
team = []
comps = []
dates = []
h_teams = []
a_teams = []
h_scores = []
a_scores = []
headers = ['Team', 'Competition', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Date Keep']

for k, v in teams_dict.items():
    print('Acquiring %s data...' % k)
    r = requests.get('%s&teamTabs=results' % v)
    soup = bs(r.content, 'html.parser')

    h_scores.extend([int(i.text) for i in soup.select('.score a em:first-child')])
    limit_scores = [int(i.text) for i in soup.select('.score a em + em')]
    a_scores.extend(limit_scores)

    # The number of scores found caps how many entries are taken for the other columns
    limit = len(limit_scores)
    team.extend([k for _ in soup.select('.tournament', limit=limit)])
    comps.extend([i.text for i in soup.select('.tournament a', limit=limit)])
    dates.extend([i.text for i in soup.select('.dateTime .hide', limit=limit)])
    h_teams.extend([i.text for i in soup.select('.homeTeam a', limit=limit)])
    a_teams.extend([i.text for i in soup.select('.awayTeam a', limit=limit)])

# Build the DataFrame once every team has been processed
df = pd.DataFrame(list(zip(team, comps, h_teams, h_scores, a_teams, a_scores, dates)),
                  columns=headers)
</code></pre>
<p>You can then filter and print the data like this:</p>
<pre><code>df[df['Team'] == 'Wolves']
print(df.to_string())
</code></pre>
<p>And get some cool stats:</p>
<pre><code>df.groupby('Team').agg({'Home Score': 'mean', 'Away Score': 'mean'})

                Home Score  Away Score
Team
Arsenal           2.105263    1.368421
Aston Villa       1.687500    1.625000
Bournemouth       1.266667    1.066667
Brighton          1.533333    1.200000
Burnley           1.642857    1.357143
Chelsea           1.900000    1.850000
Crystal Palace    1.142857    0.928571
Everton           1.375000    1.312500
Leicester         1.312500    1.750000
Liverpool         1.857143    1.761905
Man City          2.050000    1.600000
Man Utd           1.421053    0.894737
Newcastle         1.571429    0.785714
Norwich           1.642857    1.357143
Sheff Utd         1.066667    1.066667
Southampton       1.125000    2.187500
Tottenham         1.888889    1.555556
Watford           1.500000    1.125000
West Ham          1.533333    1.466667
Wolves            1.280000    1.440000
</code></pre>
<p>Or:</p>
<pre><code>df[df['Away Team'] == 'Leicester'].agg({'Home Score': 'mean', 'Away Score': 'mean'})

Home Score    0.722222
Away Score    2.388889
dtype: float64
</code></pre>
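<p>Pretty cool. df.T is handy as well, and there is also DataFrame.to_sql() if you want to go that route. I hope my changes help, and I'm always happy to help further.</p>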
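<p>For the to_sql() route, here is a minimal sketch assuming an in-memory SQLite database. The table name <code>results</code> and the two placeholder rows are made up for illustration; they only mimic the column layout of the scraped frame.</p>

```python
import sqlite3

import pandas as pd

headers = ['Team', 'Competition', 'Home Team', 'Home Score',
           'Away Team', 'Away Score', 'Date Keep']

# Placeholder rows, not real results; same column layout as the scraped frame
sample_df = pd.DataFrame(
    [
        ('Team A', 'Cup', 'Team A', 2, 'Team B', 1, '2019-01-01'),
        ('Team A', 'Cup', 'Team C', 0, 'Team A', 0, '2019-01-08'),
    ],
    columns=headers,
)

conn = sqlite3.connect(':memory:')  # swap in a file path to persist the table
sample_df.to_sql('results', conn, index=False, if_exists='replace')

# Column names containing spaces must be quoted in SQL
home_wins = pd.read_sql_query(
    'SELECT * FROM results WHERE "Home Score" > "Away Score"', conn
)

# .T transposes the frame: one column per match, one row per field
transposed = sample_df.T
```

<p>Once the table exists, questions like "which matches did the home side win" become plain SQL, and the transposed frame can be easier to read when inspecting a single match.</p>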