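I have scraped the HTML table from https://www.asx.com.au/asx/statistics/prevBusDayAnns.do and loaded it into a pandas DataFrame. I also added a column named "Match" to the DataFrame, which shows 1 when "ASX Code" is "SPL". If you look at the website, you will see that each headline is a link to a downloadable PDF file. I want to download the file whenever the "Match" column is 1. Is this possible?
My code: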
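import pandas as pd

# read_html returns a list with one DataFrame per <table> on the page
dfs = pd.read_html('https://www.asx.com.au/asx/statistics/prevBusDayAnns.do')
for df in dfs:
    # Flag rows for the code of interest: "1" for SPL, "0" otherwise
    df.loc[df['ASX Code'] == 'SPL', 'Match'] = "1"
    df.loc[df['ASX Code'] != 'SPL', 'Match'] = "0"
    print(df)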
DataFrame:
   ASX Code  Date                Price sens.  Headline                                           Match
0  SPL       15/04/2020 7:25 PM  NaN          SPL7013 shows significant activity against cor...  1
1  LSH       15/04/2020 7:19 PM  NaN          Change of Director's Interest Notice 2 pages...    0
2  PSQ       15/04/2020 7:14 PM  NaN          PSQ Implements Dividend Reinvestment Plan 25 ...   0
3  TGN       15/04/2020 7:11 PM  NaN          March Quarterly Report and Appendix 5B 24 pa...    0
4  GRR       15/04/2020 6:49 PM  NaN          Change of Director's Interest Notice 3 pages...    0
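You need to do two things:

1. Set a Chrome option so that PDFs are downloaded automatically.
2. Use WebDriverWait to wait for element_to_be_clickable() on the headline link and click it, then use WebDriverWait to wait for the new window, switch to that window, and click "Agreed and proceed".

After that click, the PDF is downloaded automatically to the default download folder.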
Code:
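One way the two steps above could look in code is sketched below. It is an illustration under stated assumptions, not necessarily the answerer's original listing: it assumes Selenium 4 with Chrome, the helper names (build_chrome_prefs, wait_for_downloads, download_announcements) are hypothetical, and both XPath locators are guesses about the page markup that should be checked against the live site.

```python
import time
from pathlib import Path


def build_chrome_prefs(download_dir):
    """Chrome preferences that save PDFs to disk instead of opening the viewer."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
    }


def wait_for_downloads(download_dir, timeout=60, grace=2):
    """Poll until Chrome's partial-download files (*.crdownload) are gone."""
    time.sleep(grace)  # crude pause so a just-clicked download can start
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not list(Path(download_dir).glob("*.crdownload")):
            return True
        time.sleep(1)
    return False


def download_announcements(code="SPL", download_dir="downloads", timeout=20):
    # Selenium is imported here so the helpers above work without it installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    Path(download_dir).mkdir(parents=True, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_chrome_prefs(download_dir))
    driver = webdriver.Chrome(options=options)
    wait = WebDriverWait(driver, timeout)
    try:
        driver.get("https://www.asx.com.au/asx/statistics/prevBusDayAnns.do")
        main_window = driver.current_window_handle
        # ASSUMPTION: each announcement is a table row with the ASX code in a cell
        row_links = (By.XPATH, f"//tr[td[normalize-space()='{code}']]//a")
        wait.until(EC.element_to_be_clickable(row_links))
        for link in driver.find_elements(*row_links):
            handles_before = driver.window_handles
            link.click()
            wait.until(EC.new_window_is_opened(handles_before))
            new_window = [h for h in driver.window_handles if h not in handles_before][0]
            driver.switch_to.window(new_window)
            try:
                # ASSUMPTION: the terms page uses an <input> whose value contains "proceed"
                agree = (By.XPATH, "//input[contains(@value, 'proceed')]")
                wait.until(EC.element_to_be_clickable(agree)).click()
            except TimeoutException:
                pass  # no terms page appeared; the PDF may already be downloading
            driver.switch_to.window(main_window)
        wait_for_downloads(download_dir)
    finally:
        driver.quit()
```

Calling download_announcements("SPL") would open Chrome and attempt to save each SPL announcement into the downloads folder. In this sketch the row filter is done by the XPath rather than by the "Match" column, since the DataFrame built with read_html keeps only the headline text and not the link targets.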
Browser snapshot: