Unable to access the remaining elements via XPath in a loop after accessing the first element in Python

Posted 2024-10-06 11:19:40


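I am trying to scrape data from the ScienceDirect website. To automate the scraping, I build a list of XPaths and loop over them to visit the journal issues one after another. When I run the loop, I cannot access the remaining elements once the first journal issue has been visited. This approach worked for me on another website, but it does not work on this one.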

I would also like to know whether there is a better way to access these elements than this approach.

#Importing libraries
 import requests
 import os
 import json
 from selenium import webdriver
 import pandas as pd
 from bs4 import BeautifulSoup  
 import time
 from time import sleep

 from selenium.webdriver.common.by import By
 from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.support import expected_conditions as EC

 #initializing the chromewebdriver
 driver=webdriver.Chrome(executable_path=r"C:/selenium/chromedriver.exe")

 #website to be accessed
 driver.get("https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues")

 #generating the list of xpaths to be accessed one after the other
 issues=[]
 for i in range(0,20):
     docs=(str(i))
     for j in range(1,7):
         sets=(str(j))
         con=("//*[@id=")+('"')+("0-accordion-panel-")+(docs)+('"')+("]/section/div[")+(sets)+("]/a")
         issues.append(con)

 #looping to access one issue after the other
 for i in issues:
     try:
         hat=driver.find_element_by_xpath(i)
         hat.click()
         sleep(4)
         driver.back()
     except:
         print("no more issues",i)
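
The XPath list built by the nested loop above can also be written with an f-string, which avoids the manual quote concatenation. This is only a minimal sketch: the function name `build_issue_xpaths` and its parameters are made up for illustration, and the XPath pattern is copied unchanged from the code above.

```python
def build_issue_xpaths(panels=20, links_per_panel=6):
    """Return XPaths in the same order as the nested loop above."""
    return [
        f'//*[@id="0-accordion-panel-{panel}"]/section/div[{link}]/a'
        for panel in range(panels)
        for link in range(1, links_per_panel + 1)
    ]


issues = build_issue_xpaths()
print(len(issues))
print(issues[0])
```

The strings are identical to the ones the original loop produces, so this changes readability only and does not by itself change which elements can be found.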



1 Answer
User
#1 · Posted 2024-10-06 11:19:40

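To scrape data from the ScienceDirect page https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues, you can take the following steps:

  • First, open all of the accordions.

  • Then open each issue in an adjacent tab using Ctrl + click().

  • Next, switch to the newly opened tab and scrape the required content.

  • Code block: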

      from selenium import webdriver
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      from selenium.webdriver.common.action_chains import ActionChains
      from selenium.webdriver.common.keys import Keys

      options = webdriver.ChromeOptions()
      options.add_argument("start-maximized")
      options.add_experimental_option("excludeSwitches", ["enable-automation"])
      options.add_experimental_option('useAutomationExtension', False)
      # executable_path is the Selenium 3 form; Selenium 4 takes the driver path through a Service object instead
      driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get('https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues')
      # Expand every accordion so that all issue links are rendered
      accordions = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.accordion-panel.js-accordion-panel>button.accordion-panel-title>span")))
      for accordion in accordions:
          ActionChains(driver).move_to_element(accordion).click(accordion).perform()
      # Collect the issue links once the panels are open
      issues = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.anchor.js-issue-item-link.text-m span.anchor-text")))
      # Remember the handle of the tab that shows the issue list
      windows_before = driver.current_window_handle
      for issue in issues:
          # Ctrl+click opens the issue in a new tab, leaving the list page untouched
          ActionChains(driver).key_down(Keys.CONTROL).click(issue).key_up(Keys.CONTROL).perform()
          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
          windows_after = driver.window_handles
          new_window = [x for x in windows_after if x != windows_before][0]
          # switch_to.window replaces the deprecated switch_to_window
          driver.switch_to.window(new_window)
          WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a#journal-title>span")))
          print(WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//h2"))).get_attribute("innerHTML"))
          # Close the issue tab and return to the list tab
          driver.close()
          driver.switch_to.window(windows_before)
      driver.quit()
    
  • Console output:

      Institutions, Governance and Finance in a Globally Connected Environment
      Volume 58
      Corporate Governance in Multinational Enterprises
      .
      .
      .
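
On the question of whether there is a simpler way: one option, which is not part of the code above, is to read the `href` of every issue link first and then load each URL with `driver.get()`, so that no element reference has to survive a click or `driver.back()`. The following is a minimal sketch under assumptions. It reuses the CSS selectors from the code block above (with the inner `span` dropped from the link selector so that `href` is read from the `<a>` itself), the helper names `unique_hrefs` and `scrape_issue_titles` are hypothetical, chromedriver is assumed to be discoverable without an explicit path, and the sketch has not been run against the live site.

```python
def unique_hrefs(links):
    """Return the non-empty href of each link element, in order, without duplicates."""
    seen = set()
    urls = []
    for link in links:
        href = link.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            urls.append(href)
    return urls


def scrape_issue_titles(start_url):
    """Visit every issue URL found on start_url and return the text of its first <h2>."""
    # Selenium is imported here so that unique_hrefs can be used without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumption: chromedriver can be found automatically
    try:
        driver.get(start_url)
        wait = WebDriverWait(driver, 30)
        # Expand the accordions first (selector copied from the answer's code).
        accordions = wait.until(EC.visibility_of_all_elements_located(
            (By.CSS_SELECTOR,
             "li.accordion-panel.js-accordion-panel>button.accordion-panel-title>span")))
        for accordion in accordions:
            accordion.click()
        # Read all URLs up front; plain strings cannot go stale after navigation.
        links = wait.until(EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, "a.anchor.js-issue-item-link.text-m")))
        titles = []
        for url in unique_hrefs(links):
            driver.get(url)
            heading = wait.until(
                EC.visibility_of_element_located((By.XPATH, "//h2")))
            titles.append(heading.text)
        return titles
    finally:
        driver.quit()
```

Calling `scrape_issue_titles("https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues")` would then return the headings as a list. Compared with Ctrl+click, this keeps everything in a single tab, at the cost of assuming that each issue link exposes a usable `href`.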
    
