Trying to download PDFs from a website using an HTML table in a DataFrame?

Posted 2024-09-28 13:37:35


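I have scraped the HTML table from https://www.asx.com.au/asx/statistics/prevBusDayAnns.do and put it into a pandas DataFrame. I have also added another column to the DataFrame called "Match", which shows a 1 if "ASX Code" == "SPL". If you look at the website, you will see that each headline is the title of a PDF file that can be downloaded. I want to download the file whenever the "Match" column == 1. Is this possible, perhaps with Selenium?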

My code:

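import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
dfs = pd.read_html('https://www.asx.com.au/asx/statistics/prevBusDayAnns.do')
for df in dfs:
    # "1" where the ASX code is SPL, "0" everywhere else
    df.loc[df['ASX Code'] == 'SPL', 'Match'] = "1"
    df.loc[df['ASX Code'] != 'SPL', 'Match'] = "0"
    print(df)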

Resulting DataFrame:

    ASX Code                 Date  Price sens.                                           Headline Match
0        SPL  15/04/2020  7:25 PM          NaN  SPL7013 shows significant activity against cor...     1
1        LSH  15/04/2020  7:19 PM          NaN  Change of Director's Interest Notice  2  pages...     0
2        PSQ  15/04/2020  7:14 PM          NaN  PSQ Implements Dividend Reinvestment Plan  25 ...     0
3        TGN  15/04/2020  7:11 PM          NaN  March Quarterly Report and Appendix 5B  24  pa...     0
4        GRR  15/04/2020  6:49 PM          NaN  Change of Director's Interest Notice  3  pages...     0
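pd.read_html keeps only the text of each cell, so the link behind each headline is lost before the Match column is even computed. Below is a minimal stdlib-only sketch of keeping that link, assuming the table's columns are ASX Code, Date, Price sens., Headline (as in the DataFrame above) and that each headline cell contains an <a href> link. AnnouncementTableParser and matching_links are hypothetical names for illustration. If you prefer to stay in pandas, version 1.5 and later accept read_html(..., extract_links="body"), which returns (text, href) tuples for body cells.

```python
from html.parser import HTMLParser


class AnnouncementTableParser(HTMLParser):
    """Collect every table row as (cell texts, first link href in the row)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows: (list of cell texts, href or None)
        self._cells = None    # cell texts of the row currently being parsed
        self._href = None     # first href seen in the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href, self._in_cell = [], None, False
        elif tag in ("td", "th") and self._cells is not None:
            self._cells.append("")
            self._in_cell = True
        elif tag == "a" and self._cells is not None and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Collapse runs of whitespace inside each cell
            self.rows.append(([" ".join(c.split()) for c in self._cells], self._href))
            self._cells, self._in_cell = None, False

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data


def matching_links(html, code="SPL"):
    """Return (headline, href) for rows whose ASX Code cell equals `code`.

    Assumes the column order ASX Code, Date, Price sens., Headline.
    """
    parser = AnnouncementTableParser()
    parser.feed(html)
    parser.close()
    return [(cells[3], href) for cells, href in parser.rows
            if len(cells) > 3 and cells[0] == code and href]
```

Relative hrefs can be turned into full URLs with urllib.parse.urljoin against the page URL. According to the answer below, the link opens a terms page that has to be accepted first, so requesting the href directly may not return the PDF itself.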

1 answer

Answer #1 · posted 2024-09-28 13:37:35

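You need to do the following:

1. Set a Chrome option so that PDFs are downloaded automatically instead of being opened in the browser.
2. Use WebDriverWait with element_to_be_clickable() to wait for the headline link, then click it.
3. Use WebDriverWait with number_of_windows_to_be(2) to wait for the new window, switch to that window, and then click Agree and proceed.

Once you click Agree and proceed, the PDF is downloaded automatically to the default download folder.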
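The Chrome option mentioned above (plugins.always_open_pdf_externally) only makes Chrome download PDFs rather than display them; the files still land in Chrome's default download folder. If you want them in a specific folder without a save dialog, a common approach is to set two more Chrome preferences. This is a minimal sketch; build_chrome_prefs is a hypothetical helper name and the folder name pdfs is only an example.

```python
import os


def build_chrome_prefs(download_dir):
    """Chrome profile preferences that save PDFs straight into download_dir."""
    return {
        # Download PDFs instead of opening them in Chrome's built-in viewer
        "plugins.always_open_pdf_externally": True,
        # Save into this folder instead of the user's default Downloads folder
        # (converted to an absolute path to be safe)
        "download.default_directory": os.path.abspath(download_dir),
        # Do not ask where to save each file
        "download.prompt_for_download": False,
    }
```

To use it, pass chromeOptions.add_experimental_option("prefs", build_chrome_prefs("pdfs")) in place of the plain prefs dict in the code below.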

Code:

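from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Make Chrome download PDFs instead of opening them in its built-in viewer
chromeOptions = webdriver.ChromeOptions()
prefs = {"plugins.always_open_pdf_externally": True}
chromeOptions.add_experimental_option("prefs", prefs)

# Recent Selenium 4 releases no longer accept executable_path or chrome_options;
# pass a Service and options= instead (from 4.6 on, Service can be omitted
# and Selenium Manager locates chromedriver itself)
driver = webdriver.Chrome(service=Service("path/to/chromedriver"), options=chromeOptions)
driver.get("https://www.asx.com.au/asx/statistics/prevBusDayAnns.do")

# Click the Headline link in the (first) row whose ASX Code cell is 'SPL'
WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
    (By.XPATH, "//table//tr//td[text()='SPL']/following-sibling::td[3]/a"))).click()

# The link opens a second window; wait for it and switch to it
WebDriverWait(driver, 15).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[-1])

# Accept the terms; the PDF then downloads to Chrome's download folder
WebDriverWait(driver, 15).until(EC.element_to_be_clickable(
    (By.XPATH, "//input[@value='Agree and proceed']"))).click()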
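Clicking Agree and proceed only starts the download; if the script calls driver.quit() straight away, the file may be left incomplete. One way to wait, assuming downloads go to a known folder that is empty beforehand and that Chrome marks in-progress downloads with a .crdownload suffix, is to poll that folder. wait_for_pdf is a hypothetical helper, not part of Selenium.

```python
import os
import time


def wait_for_pdf(download_dir, timeout=30.0, poll=0.5):
    """Return the path of a finished PDF in download_dir, or raise TimeoutError.

    A download counts as finished once a .pdf file exists and no
    .crdownload (partial) file is left in the folder.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        pdfs = [n for n in names if n.lower().endswith(".pdf")]
        in_progress = any(n.endswith(".crdownload") for n in names)
        if pdfs and not in_progress:
            # If several PDFs are present, return the most recently modified one
            return max((os.path.join(download_dir, n) for n in pdfs),
                       key=os.path.getmtime)
        time.sleep(poll)
    raise TimeoutError(f"no finished PDF in {download_dir} after {timeout} s")
```

After the final click, something like pdf_path = wait_for_pdf("pdfs") followed by driver.quit() keeps the browser open until the file is complete.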

