Scraping function for Transfermarkt

Posted 2024-10-06 07:57:21


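I am writing a function that visits a team page on Transfermarkt and, for each year, collects all the data from the table on that page.

I have two problems:

1. Index 13 is the market value, and it raises an "index out of range" error, yet print('columns:', len(all_td)) gives 13, i.e. the last column.

2. I get each result five times, sometimes even more than five times, for the same player. I know I could do this in two passes, but I don't want to.

I'm new to this field; this is for my course and I'm stuck here.

Thanks for your help.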

import requests
from bs4 import BeautifulSoup
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

data_CORIN = {
'name': [],
'field_position': [],
'date_of_birth': [],
'height': [],
'foot': [],
'market_value': [],
'anio': []
}

headers = {
  'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'
}

l = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]

for i in range(0,len(l)-1):
  url = "https://www.transfermarkt.es/sport-club-corinthians-paulista/kader/verein/199/saison_id/{}/plus/1".format(l[i])
  response = requests.get(url, headers=headers)
  soup = BeautifulSoup(response.content, 'html.parser')

  all_tr = soup.find_all('tr', {'class': ['odd', 'even']}, recursive=True)
  print('rows:', len(all_tr))

  for row in all_tr:
    all_td = row.find_all('td', recursive=True)
    print('columns:', len(all_td))

    for column in all_td:
      print(' >', column.text)

    data_CORIN['name'].append(all_td[3].text.split('.')[0][:15])
    data_CORIN['field_position'].append(all_td[4].text)
    data_CORIN['date_of_birth'].append(all_td[5].text[12:14])
    data_CORIN['height'].append(all_td[8].text)
    data_CORIN['foot'].append(all_td[9].text)
    data_CORIN['market_value'].append(all_td[12].text)
    data_CORIN['anio'].append(l[i])

df = pd.DataFrame(data_CORIN)
print(df.head())

1 Answer

Answer 1 · posted 2024-10-06 07:57:21

This script goes through the seasons 2011 to 2020 and saves all the details to data.csv:

import requests
import pandas as pd
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}

all_data = []
for year in range(2011, 2021):
    print('Getting data for year {}..'.format(year))

    url = 'https://www.transfermarkt.es/sport-club-corinthians-paulista/kader/verein/199/plus/1/galerie/0?saison_id=' + str(year)
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

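    # Count the header cells; when a season's table has 11 of them,
    # drop the 5th cell of every row so the column positions below line up.
    th = soup.select('.items th')
    if len(th) == 11:
        for td in soup.select('.items > tbody > tr > td:nth-child(5)'):
            td.extract()

    # Only the direct rows of the main table that contain cells; the `>`
    # combinators leave out the rows of the table nested in each player cell.
    for tr in soup.select('.items > tbody > tr:has(td)'):
        name = tr.select_one('a[id]').get_text(strip=True)
        # Second row of the nested table inside the player cell
        field_position = tr.select_one('table > tr:nth-child(2)').text

        # nth-child counts a cell's position within its own row
        dob = tr.select_one('td:nth-child(3)').text
        height = tr.select_one('td:nth-child(5)').text
        foot = tr.select_one('td:nth-child(6)').text
        mv = tr.select_one('td:nth-child(10)').text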

        all_data.append({
            'Name': name,
            'Field Position': field_position,
            'Height': height,
            'Date of Birth': dob,
            'Foot': foot,
            'Market Value': mv,
            'Year': year
        })

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv')

Prints:

Getting data for year 2011..
Getting data for year 2012..
Getting data for year 2013..
Getting data for year 2014..
Getting data for year 2015..
Getting data for year 2016..
Getting data for year 2017..
Getting data for year 2018..
Getting data for year 2019..
Getting data for year 2020..
                 Name    Field Position  Height    Date of Birth       Foot    Market Value  Year
0         Júlio César           Portero  1,85 m  27/10/1984 (26)  izquierdo  2,50 mill. €    2011
1              Cássio           Portero  1,95 m  06/06/1987 (24)    derecho  1,00 mill. €    2011
2    Danilo Fernandes           Portero  1,89 m  03/04/1988 (23)    derecho   200 miles €    2011
3     Matheus Vidotto           Portero  1,89 m  10/04/1993 (18)    derecho   100 miles €    2011
4      Leandro Castán   Defensa central  1,86 m  05/11/1986 (24)  izquierdo  2,50 mill. €    2011
..                ...               ...     ...              ...        ...             ...   ...
424   Gabriel Pereira   Extremo derecho  1,75 m  01/08/2001 (18)  izquierdo   675 miles €    2020
425              Luan        Mediapunta  1,80 m  27/03/1993 (27)    derecho  6,50 mill. €    2020
426                Jô  Delantero centro  1,92 m  20/03/1987 (33)  izquierdo  2,50 mill. €    2020
427     Mauro Boselli  Delantero centro  1,85 m  22/05/1985 (35)    derecho  1,20 mill. €    2020
428         Carlinhos  Delantero centro  1,95 m  12/02/1997 (23)    derecho    50 miles €    2020

[429 rows x 7 columns]
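
The Date of Birth and Market Value columns above are still raw strings, and the question took the age by slicing the text with [12:14]. If you want typed values instead, here is a minimal sketch using only the standard library. It assumes the Spanish-locale formats visible in the output above ("27/10/1984 (26)", "2,50 mill. €", "200 miles €"); the helper names parse_dob and parse_market_value are made up for this example, and any other format simply yields None.

```python
import re
from datetime import datetime


def parse_dob(text):
    """Split a string like '27/10/1984 (26)' into (date, age).

    Returns (None, None) when the text does not match that shape.
    """
    match = re.fullmatch(r"\s*(\d{2}/\d{2}/\d{4})\s*\((\d+)\)\s*", text)
    if match is None:
        return None, None
    born = datetime.strptime(match.group(1), "%d/%m/%Y").date()
    return born, int(match.group(2))


def parse_market_value(text):
    """Convert '2,50 mill. €' or '200 miles €' to a whole number of euros.

    Assumes a decimal comma and the suffixes 'mill.' (millions) and
    'miles' (thousands); returns None for anything else.
    """
    match = re.fullmatch(r"\s*(\d+(?:,\d+)?)\s*(mill\.|miles)\s*€\s*", text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))
    factor = 1_000_000 if match.group(2) == "mill." else 1_000
    return round(number * factor)


born, age = parse_dob("27/10/1984 (26)")
print(born.isoformat(), age)               # 1984-10-27 26
print(parse_market_value("2,50 mill. €"))  # 2500000
print(parse_market_value("200 miles €"))   # 200000
```

With pandas you could apply these with something like df['Market Value'].map(parse_market_value), keeping the original column if you still need the raw text.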

The script also writes data.csv (screenshot from LibreOffice):

[screenshot: data.csv opened in LibreOffice]
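
On the "index out of range" error from the question: a list with 13 items has valid indices 0 to 12, so all_td[13] fails even though len(all_td) prints 13. The count is also probably not what you expect, because find_all('td', recursive=True) returns the cells of the small table nested inside each player row as well as the row's own cells. That is an assumption about the page layout, inferred from the indices used in the question and the selectors in the script above. A minimal sketch on a made-up HTML fragment (not the real page) shows both effects:

```python
from bs4 import BeautifulSoup

# Made-up fragment that imitates one player row with a nested table.
html = """
<table class="items"><tbody>
<tr class="odd">
  <td>1</td>
  <td><table class="inline-table">
    <tr><td>img</td><td>Cássio</td></tr>
    <tr><td>Portero</td></tr>
  </table></td>
  <td>06/06/1987 (24)</td>
  <td>1,00 mill. €</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr", class_="odd")

# Default (recursive) search: 4 cells of the row + 3 cells of the nested table.
all_cells = row.find_all("td")
# Only the row's own cells.
direct_cells = row.find_all("td", recursive=False)
print(len(all_cells), len(direct_cells))  # 7 4

# The last valid index is len(...) - 1; indexing with len(...) raises IndexError.
try:
    direct_cells[len(direct_cells)]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True

# Negative indices avoid the off-by-one: -1 is always the last cell.
print(direct_cells[-1].get_text(strip=True))  # 1,00 mill. €
```

Selecting only direct children (recursive=False here, or the > combinator in the CSS selectors above) keeps the column positions stable from row to row.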
