Trying to scrape the schools that have offered scholarships to football players, but running into some problems

Posted 2024-06-28 20:30:55


I've been trying to scrape which schools have offered scholarships to high school football players, but I've run into a few problems.

Here is an example page: https://n.rivals.com/content/prospects/2021/de-javion-stepney-235539#school-interests

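Once the table is expanded, I can scrape the names of all the schools, but I only want the schools that have the school-offer checkmark in the same row. How can I do that?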

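Also, although I can scrape the school names, the results often repeat random rows before moving on to the next player's page, and I don't know why.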

Here is what I have so far:

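from selenium.webdriver.common.by import By

Offered_By_List = []

# Collects the name of every school in the table, offered or not
for s in driver.find_elements(By.CLASS_NAME, 'school-logo-name'):
    Offered_By_List.append(s.text)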

Any help would be greatly appreciated; I've been stuck on this for a while.
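The question doesn't show where the repeated rows come from, so the root cause is unknown. If the repeats are simply the same school names being collected more than once, an order-preserving deduplication is a cheap safeguard, though it hides the symptom rather than fixing the cause. A minimal sketch; `dedupe_preserving_order` is a hypothetical helper name, not part of Selenium, and the sample list is made up:

```python
def dedupe_preserving_order(names):
    """Return names with repeats removed, keeping the first occurrence of each."""
    # dict keys are unique and (since Python 3.7) keep insertion order
    return list(dict.fromkeys(names))


# Made-up sample of a scrape that picked up some rows twice
scraped = ["Akron", "Ball State", "Akron", "Temple", "Ball State"]
print(dedupe_preserving_order(scraped))  # ['Akron', 'Ball State', 'Temple']
```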


3 Answers

Use the ancestor axis. This scrapes just the school names:

from selenium.webdriver.common.by import By
names = [e.text for e in driver.find_elements(By.XPATH, '//div[@class="checkmark ng-scope"]/ancestor::tr//div[@class="school-logo-name"]')]

However, if you want to scrape all of the data in each row, just remove //div[@class="school-logo-name"] from the XPath above.
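To see what the `ancestor::tr` step does without launching a browser, here is a minimal offline sketch using the standard library's `xml.etree.ElementTree`. ElementTree has no ancestor axis, so the sketch builds a child-to-parent map, walks up from each checkmark `div` to its enclosing `tr`, and then picks the school-name `div` inside that row. The markup is a made-up, simplified stand-in for the Rivals table (the real page is more complex); only the two class names come from the XPath above, and `offered_schools` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the expanded school-interests table
HTML = """
<table><tbody>
  <tr><td><div class="school-logo-name">Akron</div></td>
      <td><div class="checkmark ng-scope"/></td></tr>
  <tr><td><div class="school-logo-name">Illinois</div></td>
      <td></td></tr>
  <tr><td><div class="school-logo-name">Temple</div></td>
      <td><div class="checkmark ng-scope"/></td></tr>
</tbody></table>
"""


def offered_schools(markup):
    root = ET.fromstring(markup.strip())
    # ElementTree elements do not know their parent, so record it explicitly
    parent_of = {child: parent for parent in root.iter() for child in parent}
    names = []
    for mark in root.iter("div"):
        if mark.get("class") != "checkmark ng-scope":
            continue
        # Walk upward to the enclosing row (the ancestor::tr step)
        row = mark
        while row is not None and row.tag != "tr":
            row = parent_of.get(row)
        if row is None:
            continue
        # Inside that row, pick the school-name div (the //div[...] step)
        for div in row.iter("div"):
            if div.get("class") == "school-logo-name":
                names.append(div.text)
    return names


print(offered_schools(HTML))  # ['Akron', 'Temple']
```

Returning the whole `row` instead of the name `div` corresponds to dropping the trailing `//div[@class="school-logo-name"]` from the XPath.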

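I wouldn't use Selenium here, because the data is returned in the HTML as valid JSON inside element attributes. There are several ways to extract the school names, but I used pandas because it also puts the data into a table, so if you want more than just the school names you can manipulate it however you like. I also grabbed the player's profile: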

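import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://n.rivals.com/content/prospects/2021/de-javion-stepney-235539#school-interests'

response = requests.get(url)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.text, 'html.parser')

# Player profile: JSON stored in the "prospect" attribute of <rv-user-forecast-banner>
jsonStr = soup.find('rv-user-forecast-banner')['prospect']
playerData = json.loads(jsonStr)
# json_normalize turns the profile dict into a one-row table
# (plain pd.DataFrame raises ValueError on a dict of scalars)
df1 = pd.json_normalize(playerData)
print(df1)

# School interests: a JSON list of dicts in the "data" attribute of <rv-prospect-school-interests>
jsonStr = soup.find('rv-prospect-school-interests')['data']
schoolIntData = json.loads(jsonStr)
df2 = pd.DataFrame(schoolIntData)
print(df2)

# Keep only the rows where the school made an offer (the checkmark column on the page)
schoolsOffered = df2[df2['offer'] == True]
Offered_By_List = list(schoolsOffered['team_name'])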

Output:

print(df2.to_string())
    college_id  commit commit_date                                    commitments_url      id interest  offer recruiters   sign  site_id        site_name                                          team_logo         team_name visits
0           51    True  2020-04-27  //centralmichigan.rivals.com/commitments/footb...  779864     HIGH   True         []  False     21.0  centralmichigan  https://s.yimg.com/xe/ipt/CentralMichiganChipp...  Central Michigan     []
1           48   False        None                                               None  794033     NONE   True         []  False      NaN             None  https://s.yimg.com/dh/ap/default/151102/akron-...             Akron     []
2           49   False        None                                               None  850677     NONE   True         []  False      NaN             None  https://s.yimg.com/cv/ae/default/170622/Ball-S...        Ball State     []
3           11   False        None  //bostoncollege.rivals.com/commitments/footbal...  817746     NONE   True         []  False     15.0    bostoncollege  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...    Boston College     []
4           50   False        None                                               None  835972     NONE   True         []  False      NaN             None  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...     Bowling Green     []
5          209   False        None                                               None  835973     NONE   True         []  False      NaN             None  https://sp.yimg.com/j/assets/ipt/BuffaloBulls.png           Buffalo     []
6           99   False        None  //cincinnati.rivals.com/commitments/football/2021  833595     NONE   True         []  False     23.0       cincinnati             https://s.yimg.com/xe/ipt/CINC_300.png        Cincinnati     []
7           28   False        None     //Indiana.rivals.com/commitments/football/2021  825797     NONE   True         []  False     51.0          Indiana  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...           Indiana     []
8           20   False        None   //iowastate.rivals.com/commitments/football/2021  836116     NONE   True         []  False     55.0        iowastate  https://s.yimg.com/dh/ap/default/151102/IowaSt...        Iowa State     []
9           53   False        None   //kentstate.rivals.com/commitments/football/2021  850678     NONE   True         []  False     62.0        kentstate  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...        Kent State     []
10          54   False        None                                               None  850676     NONE   True         []  False      NaN             None  https://s.yimg.com/cv/ae/default/170623/Miami-...        Miami (OH)     []
11          15   False        None    //syracuse.rivals.com/commitments/football/2021  804815     NONE   True         []  False    133.0         syracuse  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...          Syracuse     []
12          16   False        None      //temple.rivals.com/commitments/football/2021  826633     NONE   True         []  False    135.0           temple  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...            Temple     []
13          56   False        None      //toledo.rivals.com/commitments/football/2021  783815     NONE   True         []  False    144.0           toledo  https://s.yimg.com/dh/ap/default/160427/Toledo...            Toledo     []
14          57   False        None  //westernmichigan.rivals.com/commitments/footb...  783814     NONE   True         []  False    168.0  westernmichigan  https://s.yimg.com/dh/ap/default/170213/wm_nca...  Western Michigan     []
15          27   False        None    //Illinois.rivals.com/commitments/football/2021  783816     NONE  False         []  False     49.0         Illinois         https://s.yimg.com/xe/ipt/illinois_300.png          Illinois     []
16          72   False        None   //Tennessee.rivals.com/commitments/football/2021  856404     NONE  False         []  False    136.0        Tennessee  https://s.yimg.com/dh/ap/default/151102/Tennes...         Tennessee     []
17          63   False        None         //USC.rivals.com/commitments/football/2021  856403     NONE  False         []  False    151.0              USC  https://s.yimg.com/xe/i/us/sp/v/ncaaf/teams/20...               USC     []

And just the list:

print(Offered_By_List)
['Central Michigan', 'Akron', 'Ball State', 'Boston College', 'Bowling Green', 'Buffalo', 'Cincinnati', 'Indiana', 'Iowa State', 'Kent State', 'Miami (OH)', 'Syracuse', 'Temple', 'Toledo', 'Western Michigan']
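If you'd rather not depend on BeautifulSoup and pandas, the same idea (read the JSON stored in the `data` attribute of `rv-prospect-school-interests`, then keep the entries whose `offer` is true) can be sketched with the standard library's `html.parser`. This is a minimal sketch, not a replacement for the code above: `AttrJSONExtractor` is a hypothetical class name, the sample markup is made up and trimmed to the two keys used here (`team_name` and `offer`, both visible in the output above), and it assumes the page still embeds the data this way.

```python
import json
from html.parser import HTMLParser


class AttrJSONExtractor(HTMLParser):
    """Collect the JSON stored in one attribute of the first matching tag."""

    def __init__(self, target_tag, target_attr):
        super().__init__()
        self.target_tag = target_tag
        self.target_attr = target_attr
        self.payload = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and unescapes entities such as &quot;
        if tag == self.target_tag and self.payload is None:
            value = dict(attrs).get(self.target_attr)
            if value is not None:
                self.payload = json.loads(value)


# Hypothetical, heavily trimmed sample of the page markup
SAMPLE = (
    '<rv-prospect-school-interests data="['
    '{&quot;team_name&quot;: &quot;Akron&quot;, &quot;offer&quot;: true}, '
    '{&quot;team_name&quot;: &quot;Illinois&quot;, &quot;offer&quot;: false}, '
    '{&quot;team_name&quot;: &quot;Temple&quot;, &quot;offer&quot;: true}]">'
    '</rv-prospect-school-interests>'
)

parser = AttrJSONExtractor("rv-prospect-school-interests", "data")
parser.feed(SAMPLE)
parser.close()

# Same filter as df2[df2['offer'] == True], on plain dicts
offered_by = [s["team_name"] for s in parser.payload if s["offer"]]
print(offered_by)  # ['Akron', 'Temple']
```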

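You can use XPath to express the relationship between the checkmark and the row in the table. The XPath below, taken from the example page, selects only the rows that have a checkmark (15 on this page). Save the matches as a list, then loop through the rows and save the school names.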

//tbody/tr[td[5]/div[@class="checkmark ng-scope"]]
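The predicate `tr[td[5]/div[@class="checkmark ng-scope"]]` keeps a row only if its fifth cell contains the checkmark `div`. As a rough offline illustration, the standard library's `ElementTree` supports enough XPath to run the same per-row test, although it cannot nest a path inside a predicate, so the row filter is done in Python. The table below is made up and has only two columns, so the checkmark sits in `td[2]` rather than `td[5]`.

```python
import xml.etree.ElementTree as ET

# Made-up two-column table: school name in td[1], optional checkmark in td[2]
TABLE = """
<table><tbody>
  <tr><td><div class="school-logo-name">Akron</div></td>
      <td><div class="checkmark ng-scope"/></td></tr>
  <tr><td><div class="school-logo-name">Illinois</div></td>
      <td/></tr>
  <tr><td><div class="school-logo-name">Temple</div></td>
      <td><div class="checkmark ng-scope"/></td></tr>
</tbody></table>
""".strip()

root = ET.fromstring(TABLE)

# Step 1: keep only rows whose td[2] holds the checkmark div
checked_rows = [
    row for row in root.iter("tr")
    if row.find("td[2]/div[@class='checkmark ng-scope']") is not None
]

# Step 2: loop over the kept rows and read the school name from td[1]
names = [row.find("td[1]/div").text for row in checked_rows]
print(len(checked_rows), names)  # 2 ['Akron', 'Temple']
```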


Or just use the code below:

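from selenium.webdriver.common.by import By
# Single quotes outside so the XPath's double quotes don't end the string
xpath = '//tbody/tr[td[5]/div[@class="checkmark ng-scope"]]/td[1]/div/*[@class="ng-binding ng-scope"]'
for s in browser.find_elements(By.XPATH, xpath):
    print(s.text)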
