Selenium with Python cannot find hidden elements in a tab

Published 2024-09-27 19:19:11


I have the following code, with which I am trying to get player statistics from this MLB page (http://www.espn.com/mlb/boxscore?gameId=370403101):

from selenium import webdriver


link = 'http://www.espn.com/mlb/boxscore?gameId=370403101'
driver = webdriver.Chrome('/PATH/chromedriver')
driver.get(link)

player_name_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[1]/a/span').text
ab_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[3]').text
run_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[4]').text
hit_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[5]').text
rbi_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[6]').text
bb_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[7]').text
strk_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[8]').text
p_val_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[9]').text
avg_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[10]').text
obp_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[11]').text
slg_away = driver.find_element_by_xpath('//*[@id="gamepackage-box-score"]/div/div[2]/div[1]/article[1]/div/table[1]/tbody[1]/tr/td[12]').text

driver.close()

val_list_away = [player_name_away, ab_away, run_away, hit_away, rbi_away, bb_away, strk_away,
                 p_val_away, avg_away, obp_away, slg_away]

print(val_list_away)

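However, when I run the code, the list it prints is incomplete: the values of four of the fields are missing, even though, as the screenshot below shows, the HTML is there and Selenium should be able to access it. Can anyone help? Thank you!

[Screenshot of the page's HTML]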


2 Answers

Assuming you may later need to extract statistics for several players from several tables, I modified your program as follows:

  • Code block:

    from selenium import webdriver
    
    link = 'http://www.espn.com/mlb/boxscore?gameId=370403101'
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("http://www.google.com")
    driver.get(link)
    item_name_away = driver.find_element_by_xpath("//div[@class='boxscore-2017__team-name' and contains(.,'Blue Jays Hitting')]//following::table[1]/thead//th[@class='name']").text
    player_name_away = driver.find_element_by_xpath("//div[@class='boxscore-2017__team-name' and contains(.,'Blue Jays Hitting')]//following::table[1]/tbody//td//span").text
    print("%s : %s" %(item_name_away, player_name_away)) 
    attributes = driver.find_elements_by_xpath("//div[@class='boxscore-2017__team-name' and contains(.,'Blue Jays Hitting')]//following::table[1]/thead//th[starts-with(@class,'batting-stats-')]")
    values = driver.find_elements_by_xpath("//div[@class='boxscore-2017__team-name' and contains(.,'Blue Jays Hitting')]//following::table[1]/tbody[@class='athletes' and @data-athlete-id='32938']//tr[@class='baseball-lineup__player-row']/td[starts-with(@class,'batting-stats-')]")
    for attribute, value in zip(attributes, values):
        print(attribute.text, value.text)
    
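  • Console output: the name header and the player name, followed by one line per batting-stat column header and its value.

Note: In the UI, neither the column header P nor its value 16 could be extracted.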
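One likely explanation for cells that come back empty is that Selenium's `.text` returns only text that is rendered as visible, whereas `get_attribute("textContent")` returns the DOM text regardless of CSS. The sketch below assumes the missing cells are merely hidden by styling; `cell_text` and `FakeCell` are hypothetical names introduced here, and `FakeCell` only imitates a WebElement so the example runs without a browser.

```python
def cell_text(element):
    """Return an element's text even when the element is hidden.

    `element` is expected to behave like a Selenium WebElement:
    `.text` yields only rendered (visible) text, while
    `get_attribute("textContent")` yields the DOM text regardless of CSS.
    """
    visible = element.text
    if visible:
        return visible
    # Fall back to the raw DOM text for cells hidden by styling
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""


class FakeCell:
    """Hypothetical stand-in for a hidden <td> (not part of Selenium)."""

    def __init__(self, visible_text, dom_text):
        self.text = visible_text
        self._dom_text = dom_text

    def get_attribute(self, name):
        return self._dom_text if name == "textContent" else None


hidden_cell = FakeCell(visible_text="", dom_text=" .333 ")
print(cell_text(hidden_cell))
```

With a real driver, the same helper could wrap each lookup from the question, for example `cell_text(driver.find_element_by_xpath(...))` with the locators unchanged.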

You can use selenium to load the page and then use BeautifulSoup to find the player attributes:

from selenium import webdriver
from bs4 import BeautifulSoup as soup
import re
import collections

player = collections.namedtuple('player', ['name', 'position', 'stats'])
d = webdriver.Chrome('/Users/jamespetullo/Downloads/chromedriver')
d.get('http://www.espn.com/mlb/boxscore?gameId=370403101')
# Parse the rendered page once; the original referenced an undefined name `h`
h = soup(d.page_source, 'lxml')
player_names = iter([b.text for b in h.find_all('td', {'class': 'name'})])
full_stats = [i.text for i in h.find_all('td', {'class': re.compile('batting-stats')})]
# The slicing assumes 11 batting-stat cells per player
final_results = {next(player_names): full_stats[i:i+11] for i in range(0, len(full_stats), 11)}
# Split the trailing position text off each name ('N/A' when there is none)
final_players = [player(*[re.sub(r'[A-Z\d\-\s\(\),]+$', '', a), (lambda x: 'N/A' if not x else x[0])(re.findall(r'[A-Z\d\-\s\(\),]+$', a)), b]) for a, b in final_results.items()]

The results include the full statistics for "D. Travis":

[u'2-6', u'6', u'0', u'2', u'0', u'0', u'2', u'16', u'.333', u'.333', u'.333']
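The grouping step in the answer above, cutting a flat list of stat cells into one slice per player, can be sketched on its own without a browser. This is a minimal illustration, not the answerer's code: `group_stats` and `per_player` are hypothetical names, and the sample data is the "D. Travis" row shown above.

```python
def group_stats(names, flat_stats, per_player=11):
    """Pair each player name with its slice of a flat list of stat strings."""
    rows = [flat_stats[i:i + per_player]
            for i in range(0, len(flat_stats), per_player)]
    # zip stops at the shorter input, so a missing name cannot raise StopIteration
    return dict(zip(names, rows))


names = ["D. Travis"]
flat_stats = ["2-6", "6", "0", "2", "0", "0", "2", "16", ".333", ".333", ".333"]
result = group_stats(names, flat_stats)
print(result["D. Travis"])
```

Using `zip` instead of calling `next()` on an iterator is a design choice made here for safety when the number of names and stat groups differ; the pairing is otherwise the same.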
