Problem concatenating a URL and scraping data

Posted 2024-04-25 01:44:22


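I am trying to append to a URL in Python so that I can scrape details from the resulting target URL. I have the code below, but it seems to scrape data from url1 rather than from URL.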

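I scraped the team names from the NFL website without any problems. The problem is with the spotrac URL, to which I append the team names I scraped from the NFL site.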

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL ='https://www.nfl.com/teams/'

page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

team_name = []

team_name_list = soup.find_all('h4',class_='d3-o-media-object__roofline nfl-c-custom-promo__headline')
for team in team_name_list:
  if team.find('p'):
      team_name.append(team.text)

for team in team_name: 
        
    team = team.replace(" ", "-").lower()

    url1 = 'https://www.spotrac.com/nfl/rankings/'
    URL = url1 +str(team)
    print(URL)
    data = {
        'ajax': 'true',
        'mobile': 'false'
    }
    
    bs_soup = BeautifulSoup(requests.post(URL, data=data).content, 'html.parser')
    spotrac_df = pd.DataFrame(columns = ['Name', 'Salary']) 
    
    for h3 in bs_soup.select('h3'):
        spotrac_df = spotrac_df.append(pd.DataFrame({'Name': str(h3.text), 'Salary' : str(h3.find_next(class_="rank-value").text)}, index=[0]), ignore_index=False)

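I am almost certain the problem is that the URL is not being built correctly: the scrape picks up salaries and so on from url1 rather than from URL.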

My console output (using the Spyder IDE) for print(URL) was attached as a screenshot:
[image: console output of print(URL)]


1 answer
User
#1 · Posted 2024-04-25 01:44:22

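The URL is concatenated correctly, but your team names contain leading whitespace. I also made a few other changes and documented them in the code.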
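To make the whitespace issue concrete, here is a minimal sketch of what happens to the URL. The sample team name and the helper `team_slug` are hypothetical and only stand in for whatever text the NFL page actually returns; the base URL is the one from the question.

```python
def team_slug(name: str) -> str:
    """Turn a display name into a hyphenated, lowercase URL slug."""
    # split() with no arguments drops leading/trailing whitespace and
    # collapses runs of spaces, so stray padding cannot leak into the slug.
    return "-".join(name.split()).lower()


base = "https://www.spotrac.com/nfl/rankings/"
raw_name = " Arizona Cardinals"  # hypothetical scraped text with a leading space

# The question's approach: every space, including the leading one, becomes "-".
naive = base + raw_name.replace(" ", "-").lower()
# The whitespace-safe approach.
fixed = base + team_slug(raw_name)

print(naive)  # https://www.spotrac.com/nfl/rankings/-arizona-cardinals
print(fixed)  # https://www.spotrac.com/nfl/rankings/arizona-cardinals
```

The stray hyphen after `rankings/` means the request no longer points at a team page, which would be consistent with the results looking as if they came from `url1` alone.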

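Finally (and I have been guilty of this a couple of times myself), creating an empty DataFrame and then appending to it on every iteration is, I think, not the best approach. I have been told it is better to build the rows with lists/dictionaries and, once that is done, call pandas to construct the DataFrame, so I changed that as well.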
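The row-collection pattern described above can be sketched in isolation as follows. The `scraped` pairs are made-up placeholders, not real spotrac data. Note also that `DataFrame.append` was removed in pandas 2.0, so on current versions the append-in-a-loop form from the question raises an error rather than merely running slowly.

```python
import pandas as pd

# Hypothetical (name, salary) pairs standing in for the scraped h3 / rank-value text.
scraped = [
    ("Player A", " $1,000,000 "),
    ("Player B", "$750,000"),
]

rows = []
for name, salary in scraped:
    # Appending a dict to a list is cheap; no DataFrame is copied per iteration.
    rows.append({"Name": name, "Salary": salary.strip()})

# Build the DataFrame once, after the loop has finished.
df = pd.DataFrame(rows)
print(df)
```

Each dictionary becomes one row, and the dictionary keys become the column names.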

import requests
from bs4 import BeautifulSoup   
import pandas as pd

url ='https://www.nfl.com/teams/'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

team_name = []

team_name_list = soup.find_all('h4',class_='d3-o-media-object__roofline nfl-c-custom-promo__headline')
for team in team_name_list:
    if team.find('p'):
        team_name.append(team.text.strip())  # <- remove leading/trailing whitespace

url1 = 'https://www.spotrac.com/nfl/rankings/' #<- since this is fixed, put it before the loop
spotrac_rows = []
for team in team_name: 
        
    team = '-'.join(team.split()).lower()  # <- split() also handles two spaces between city and team name

    url = url1 + team
    print(url)
    data = {
        'ajax': 'true',
        'mobile': 'false'
    }
    
    bs_soup = BeautifulSoup(requests.post(url, data=data).content, 'html.parser')
    
    for h3 in bs_soup.select('h3'):
        spotrac_rows.append({'Name': str(h3.text), 'Salary' : str(h3.find_next(class_="rank-value").text.strip())})  #<- remove white space from the salary
        
spotrac_df = pd.DataFrame(spotrac_rows)

Output:

print(spotrac_df)
                       Name       Salary
0            Chandler Jones  $21,333,333
1          Patrick Peterson  $13,184,588
2            D.J. Humphries  $12,800,000
3           DeAndre Hopkins  $12,500,000
4          Larry Fitzgerald  $11,750,000
5              Jordan Hicks  $10,500,000
6               Justin Pugh  $10,500,000
7              Kenyan Drake   $8,483,000
8              Kyler Murray   $8,080,601
9             Robert Alford   $7,500,000
10              J.R. Sweezy   $6,500,000
11             Corey Peters   $4,437,500
12           Haason Reddick   $4,288,444
13          Jordan Phillips   $4,000,000
14           Isaiah Simmons   $3,757,101
15            Maxx Williams   $3,400,000
16            Zane Gonzalez   $3,259,000
17            Devon Kennard   $2,500,000
18              Budda Baker   $2,173,184
19       De'Vondre Campbell   $2,000,000
20                 Andy Lee   $2,000,000
21             Byron Murphy   $1,815,795
22           Christian Kirk   $1,607,691
23             Aaron Brewer   $1,168,750
24               Max Garcia   $1,143,125
25            Andy Isabella   $1,052,244
26               Mason Cole     $977,629
27               Zach Allen     $975,855
28              Chris Banjo     $887,500
29         Jonathan Bullard     $887,500
                    ...          ...
2530       Khari Blasingame     $675,000
2531         Kenneth Durden     $675,000
2532         Cody Hollister     $675,000
2533              Joey Ivie     $675,000
2534            Greg Joseph     $675,000
2535             Kareem Orr     $675,000
2536     David Quessenberry     $675,000
2537        Derick Roberson     $675,000
2538           Shaun Wilson     $675,000
2539          Cole McDonald     $635,421
2540          Chris Jackson     $629,570
2541             Kobe Smith     $614,333
2542           Aaron Brewer     $613,333
2543           Cale Garrett     $613,333
2544           Tommy Hudson     $613,333
2545     Kristian Wilkerson     $613,333
2546  Khaylan Kearse-Thomas     $612,500
2547         Nick Westbrook     $612,333
2548          Kyle Williams     $611,833
2549           Mason Kinsey     $611,666
2550          Tucker McCann     $611,666
2551       Cameron Scarlett     $611,666
2552             Teair Tart     $611,666
2553           Brandon Kemp     $611,333
2554              Wyatt Ray     $610,000
2555             Josh Smith     $610,000
2556         Logan Woodside     $610,000
2557          Rashard Davis     $610,000
2558          Avery Gennesy     $610,000
2559           Parker Hesse     $610,000

[2560 rows x 2 columns]
