Trying to download PDFs from a website using an HTML table loaded into a DataFrame?

import pandas as pd

dfs = pd.read_html('https://www.asx.com.au/asx/statistics/prevBusDayAnns.do')
for df in dfs:
    df.loc[df['ASX Code'] == 'SPL', 'Match'] = "1"
    df.loc[df['ASX Code'] != 'SPL', 'Match'] = "0"
    print(df)

  ASX Code                Date  Price sens.                                           Headline Match
0      SPL  15/04/2020 7:25 PM          NaN  SPL7013 shows significant activity against cor...     1
1      LSH  15/04/2020 7:19 PM          NaN   Change of Director's Interest Notice 2 pages...     0
2      PSQ  15/04/2020 7:14 PM          NaN   PSQ Implements Dividend Reinvestment Plan 25 ...     0
3      TGN  15/04/2020 7:11 PM          NaN    March Quarterly Report and Appendix 5B 24 pa...     0
4      GRR  15/04/2020 6:49 PM          NaN   Change of Director's Interest Notice 3 pages...     0
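By default pd.read_html keeps only the text of each cell, so the announcement link itself is not in the DataFrame shown above. As a rough sketch of how the href for a matching row could be recovered, the following uses only the standard library's html.parser on a made-up HTML snippet. The sample markup, the /example/... paths, and the names RowLinkParser and links_for_code are all hypothetical; the real page's structure may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Made-up sample markup; the structure of the real page is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>ASX Code</th><th>Date</th><th>Price sens.</th><th>Headline</th></tr>
  <tr><td>SPL</td><td>15/04/2020 7:25 PM</td><td></td>
      <td><a href="/example/announcement-1.pdf">SPL7013 shows significant activity</a></td></tr>
  <tr><td>LSH</td><td>15/04/2020 7:19 PM</td><td></td>
      <td><a href="/example/announcement-2.pdf">Change of Director's Interest Notice</a></td></tr>
</table>
"""


class RowLinkParser(HTMLParser):
    """Collect (first-cell text, first href) for every data row of a table."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows as (code, href or None)
        self._cells = None    # cell texts of the row currently being parsed
        self._href = None     # first link seen in the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_cell = True
        elif tag == "a" and self._in_cell and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells:
            # Header rows contain only <th> cells, so they are skipped here.
            self.rows.append((self._cells[0].strip(), self._href))
            self._cells = None


def links_for_code(html, code, base_url):
    """Return absolute links from rows whose first cell equals `code`."""
    parser = RowLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for c, href in parser.rows if c == code and href]


BASE_URL = "https://www.asx.com.au/asx/statistics/prevBusDayAnns.do"
print(links_for_code(SAMPLE_HTML, "SPL", BASE_URL))
```

Whether a link found this way can be fetched directly depends on the site; the answer below drives a real browser instead, because the page shows an agreement step before serving the PDF.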

1 Answer

User

#1 · Posted 2024-09-28 13:37:35

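You need to do the following:

Set a Chrome option so that PDFs are downloaded automatically instead of being opened in the browser.

Use WebDriverWait with element_to_be_clickable() to wait for the announcement link, then click it.

Use WebDriverWait to wait for the new window, switch to it, and click "Agree and proceed".

After that click, the PDF is downloaded automatically to the default download folder.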
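If the default download folder is not where the files should end up, the Chrome preferences from the first step can be extended. This is a minimal sketch, assuming the commonly used Chrome preference keys download.default_directory and download.prompt_for_download; only plugins.always_open_pdf_externally comes from this answer, and the helper name build_pdf_download_prefs and the "downloads" folder are hypothetical.

```python
from pathlib import Path


def build_pdf_download_prefs(download_dir):
    """Return Chrome prefs that save PDFs into download_dir instead of previewing them."""
    target = Path(download_dir).expanduser().resolve()
    return {
        # From the answer: download PDFs rather than opening the built-in viewer.
        "plugins.always_open_pdf_externally": True,
        # Assumption: send downloads to a custom folder (Chrome expects an absolute path).
        "download.default_directory": str(target),
        # Assumption: suppress the "Save as" dialog.
        "download.prompt_for_download": False,
    }


prefs = build_pdf_download_prefs("downloads")
print(prefs)

# With Selenium installed, the dict would be passed to Chrome like this:
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("prefs", prefs)
```

Keeping the preferences in a plain dict makes them easy to inspect before a browser is started.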

Code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Download PDFs instead of opening them in Chrome's built-in viewer.
chrome_options = webdriver.ChromeOptions()
prefs = {"plugins.always_open_pdf_externally": True}
chrome_options.add_experimental_option("prefs", prefs)

# Selenium 4 style: the driver path goes through Service and the options through options=.
driver = webdriver.Chrome(service=Service("path/to/chromedriver"), options=chrome_options)
driver.get("https://www.asx.com.au/asx/statistics/prevBusDayAnns.do")

# Click the announcement link in the row whose ASX Code cell is 'SPL'.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//table//tr//td[text()='SPL']/following-sibling::td[3]/a"))).click()

# The link opens a second window; wait for it, then switch to it.
WebDriverWait(driver, 15).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[-1])

# Accept the terms; the PDF download starts after this click.
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, "//input[@value='Agree and proceed']"))).click()
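The final click only starts the download, so a script that quits the browser right away may leave an incomplete file. One possible way to wait for completion is to poll the download folder until a PDF is present and no partial file remains. The sketch below assumes that Chrome marks in-progress downloads with a .crdownload suffix; the helper name wait_for_pdf and its parameters are hypothetical.

```python
import time
from pathlib import Path


def wait_for_pdf(download_dir, before=(), timeout=30.0, poll=0.5):
    """Return the newest finished .pdf in download_dir, or None on timeout.

    `before` holds file names that already existed and should be ignored.
    """
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: Chrome keeps partial downloads as *.crdownload files.
        partial = list(folder.glob("*.crdownload"))
        new_pdfs = [p for p in folder.glob("*.pdf") if p.name not in before]
        if new_pdfs and not partial:
            return max(new_pdfs, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Possible usage after the last click (folder name is an assumption):
#   existing = {p.name for p in Path("downloads").glob("*.pdf")}
#   ... perform the clicks ...
#   pdf_path = wait_for_pdf("downloads", before=existing)
```

Polling the file system avoids relying on browser internals, at the cost of needing to know which folder Chrome is writing to.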

