Selenium and the spinner (loading) container

Posted 2024-09-30 14:21:25


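There is a page with a table and a "Next" button that refreshes the table. I can already extract the table's contents, but I need to use the "Next" button to move on to the remaining rows. It is an AJAX table, and the button has no href that would load a new page, so I'm stuck. The page is https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/6335/Stages/13796/PlayerStatistics/England-Premier-League-2016-2017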


1 answer
User
Answer #1 · posted 2024-09-30 14:21:25

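I would do the following:

  • start an endless loop
  • click the "next" button; if that fails, exit the loop (this is the "break" condition)
  • wait for the table's loading wrapper to become invisible
  • collect the player data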
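The four steps above can be sketched independently of Selenium, which makes the control flow easy to check. In this minimal sketch the browser actions are passed in as callables; the names `scrape_all_pages`, `collect`, `click_next` and `wait_until_loaded` are hypothetical, and in a real scraper they would wrap the Selenium calls used in the example implementation (finding the player links, clicking "next", waiting for the loading wrapper).

```python
from typing import Callable, List


def scrape_all_pages(
    collect: Callable[[], List[str]],
    click_next: Callable[[], bool],
    wait_until_loaded: Callable[[], None],
) -> List[str]:
    """Collect rows from every page of an AJAX-paginated table (hypothetical helper).

    collect: returns the rows currently shown in the table
    click_next: clicks "next"; returns False if the button is missing or hidden
    wait_until_loaded: blocks until the table's loading wrapper is invisible
    """
    rows = collect()  # the first page is already loaded
    while True:  # endless loop; the only exit is the break below
        if not click_next():
            break  # "next" could not be clicked: assume the last page was reached
        wait_until_loaded()  # don't read the table while it is still refreshing
        rows += collect()
    return rows


# Usage with an in-memory fake table (hypothetical data: 3 pages)
pages = [["a", "b"], ["c", "d"], ["e"]]
state = {"page": 0}


def fake_collect() -> List[str]:
    return list(pages[state["page"]])


def fake_click_next() -> bool:
    if state["page"] + 1 >= len(pages):
        return False  # behaves like a hidden/missing "next" button
    state["page"] += 1
    return True


result = scrape_all_pages(fake_collect, fake_click_next, lambda: None)
print(result)  # ['a', 'b', 'c', 'd', 'e']
```

Passing the actions in as functions keeps the loop logic testable without a browser; only the three callables need to change when the page's markup changes.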

Example implementation (Selenium only, although you may want to use BeautifulSoup to parse the player data, which should be much faster):

from pprint import pprint

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import (
    ElementNotInteractableException,
    NoSuchElementException,
)

root = "https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/6335/Stages/13796/PlayerStatistics/England-Premier-League-2016-2017"
PLAYER_LINKS = "#statistics-table-summary .player-link"

# PhantomJS support was removed in Selenium 4, so use headless Chrome instead
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get(root)

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, PLAYER_LINKS)))

# get the first 10 players
players = [player.text for player in driver.find_elements(By.CSS_SELECTOR, PLAYER_LINKS)]

while True:
    try:
        # click Next
        driver.find_element(By.LINK_TEXT, "next").click()
    except (NoSuchElementException, ElementNotInteractableException):
        break  # "next" is not present/visible: last page reached

    # wait until the AJAX loading wrapper is hidden again
    wait.until(EC.invisibility_of_element_located((By.ID, "statistics-table-summary-loading")))

    # collect the next 10 players
    players += [player.text for player in driver.find_elements(By.CSS_SELECTOR, PLAYER_LINKS)]
    print(len(players))

pprint(players)
driver.quit()  # quit() ends the whole browser session, close() only the window

Note that for the parsing step, to improve performance, use ^{} to parse only the relevant table.
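One way to parse only the relevant table (an assumption, since the tool named in the note above is missing) is BeautifulSoup's `SoupStrainer`, passed as `parse_only`, which makes the parser build a tree only for matching elements. In this minimal sketch the HTML string is a simplified, hypothetical stand-in for `driver.page_source`, reusing the id and class from the Selenium example; the real page's markup may differ.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical stand-in for driver.page_source (simplified markup)
html = """
<html><body>
  <div id="header">lots of unrelated markup</div>
  <div id="statistics-table-summary">
    <table><tbody>
      <tr><td><a class="player-link" href="#">Player A</a></td></tr>
      <tr><td><a class="player-link" href="#">Player B</a></td></tr>
    </tbody></table>
  </div>
</body></html>
"""

# Only the element with this id (and its descendants) ends up in the tree
only_table = SoupStrainer(id="statistics-table-summary")
soup = BeautifulSoup(html, "html.parser", parse_only=only_table)

players = [a.get_text(strip=True) for a in soup.select(".player-link")]
print(players)  # ['Player A', 'Player B']
```

Inside the Selenium loop, this would replace the `find_elements` call: grab `driver.page_source` once per page and let BeautifulSoup extract the rows, instead of making one WebDriver round trip per element.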
