How to extract the total search results count using Selenium WebDriver in Python?

Posted 2024-10-17 06:31:46


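I am trying to use Selenium WebDriver to extract the search result count from an IEEE Xplore search, given the search results URL. I don't get any errors from the code below, but I'm not sure how to proceed from here.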

Website element of interest: [screenshot]

Element inspection results: [screenshot]

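from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
chrome_driver_path = '\\xxxx\chromedriver.exe'
driver = webdriver.Chrome(service=Service(chrome_driver_path))  # Selenium 4 passes the path via Service
wait = WebDriverWait(driver, 10)  # explicit wait used below
driver.get(url)
wait.until(presence_of_element_located((By.CLASS_NAME, "strong")))
#result = driver.??????
print(result)
driver.close()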

2 Answers

To print the number of search results, i.e. 184, you can use either of the following Locator Strategies:

  • Using css_selector and get_attribute("innerHTML"):

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(driver.find_element(By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2) ").get_attribute("innerHTML"))
    
  • Using xpath and the text attribute:

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(driver.find_element(By.XPATH, "//div[contains(@class, 'Dashboard-header')]//span//following::span[2]").text)
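
Either locator returns the count as a string (here "184"). If you need it as a number, a small helper like the one below can convert it. This is only a sketch: the name parse_result_count is hypothetical, and stripping commas assumes that larger totals may be rendered with thousands separators, which the page above does not confirm.

```python
def parse_result_count(text: str) -> int:
    """Convert scraped result-count text such as "184" to an int.

    Assumption: larger totals may appear with thousands separators
    (e.g. "1,234"), so commas and surrounding whitespace are removed.
    """
    cleaned = text.strip().replace(",", "")
    if not cleaned.isdigit():
        # Fail loudly if the selector matched something other than the count
        raise ValueError(f"unexpected result-count text: {text!r}")
    return int(cleaned)


# Usage with either locator above, e.g.:
# count = parse_result_count(
#     driver.find_element(By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2)").text
# )
print(parse_result_count("184"))  # 184
```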
    

Ideally, you should induce a WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and the text attribute:

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2)"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'Dashboard-header')]//span//following::span[2]"))).get_attribute("innerHTML"))
    
  • Console output:

    184
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
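
Putting the pieces above together, a minimal end-to-end sketch might look like the following. The function names build_search_url and get_result_count_text are hypothetical, the selector is the one from this answer and will stop working if IEEE Xplore changes its markup, and calling webdriver.Chrome() with no arguments assumes Selenium 4.6 or later, whose Selenium Manager locates chromedriver automatically.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "https://ieeexplore.ieee.org/search/searchresult.jsp"
# Selector taken from the answer above; it depends on IEEE Xplore's current markup.
COUNT_SELECTOR = "div.Dashboard-header span span:nth-of-type(2)"


def build_search_url(query: str) -> str:
    """Build a search URL in the same shape as the one in the question."""
    # quote_via=quote encodes spaces as %20 (urlencode's default would use '+')
    params = urlencode({"newsearch": "true", "queryText": query}, quote_via=quote)
    return f"{SEARCH_ENDPOINT}?{params}"


def get_result_count_text(query: str, timeout: float = 20) -> str:
    """Open the results page and return the text of the result-count element."""
    # Selenium is imported here so build_search_url() works without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Selenium 4.6+ (Selenium Manager finds the driver)
    try:
        driver.get(build_search_url(query))
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, COUNT_SELECTOR))
        )
        return element.text
    finally:
        # quit() ends the whole browser session; close() only closes the current window
        driver.quit()
```

With the markup described in this answer, get_result_count_text("web scraping") should return the same text as the console output above, although the actual total will change as the IEEE Xplore index grows.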
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python



As dukkee mentioned, check the API first; but to answer your question, you can select the element as follows:

soup.select('div.Dashboard-header.col-12 > span span')[1].get_text()

Locate the parent div, which has a unique class, and then step down to the span.

Example

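from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
# Selenium 4 takes the driver path via a Service object; the raw string keeps the backslashes literal
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
driver.get(url)
time.sleep(3)  # crude fixed wait for the JavaScript-rendered results to appear

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# [1] picks the second <span> nested inside the header's direct-child <span>: the total count
print(soup.select('div.Dashboard-header.col-12 > span span')[1].get_text())

driver.quit()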
