Scraping and downloading an online file with Python

Published 2024-06-28 14:25:11


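I am trying to get data from the downloadable file on this page:

https://www.abcbourse.com/download/libelles

I need to tick the SBF120 checkbox and then click Télécharger.

This is my code, but I don't know which parameters I should add: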

import requests

url = "https://www.abcbourse.com/download/libelles"
params = {}

r = requests.get(url, params=params)
data = r.json()

3 answers

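If you are willing to use Selenium, here is a simple guide:
🔰 Quick install of selenium: pip3 install selenium
🔰 Install the Mozilla Firefox webdriver (geckodriver) for your OS

Once the steps above are done, the following code should run without problems:
(It saves the ".csv" file to the Downloads folder; ✅ tested on Mac)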

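import time
from selenium import webdriver
from selenium.webdriver.common.by import By

if __name__ == "__main__":

    # Firefox preferences: save CSV downloads without showing a dialog
    options = webdriver.FirefoxOptions()
    options.set_preference('browser.download.folderList', 1)  # 1 = default Downloads folder
    options.set_preference('browser.download.manager.showWhenStarting', False)
    options.set_preference('browser.helperApps.alwaysAsk.force', False)
    options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

    browser = webdriver.Firefox(options=options)
    browser.maximize_window()
    browser.get("https://www.abcbourse.com/download/libelles")
    time.sleep(8)  # crude wait for the page and the cookie banner to load

    # Selenium 4 API: find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector
    # dismiss the cookie banner (this class name is site-specific and may change)
    browser.find_element(By.CSS_SELECTOR, 'button.sd-cmp-1rLJX').click()
    # tick the SBF120 checkbox
    browser.find_element(By.CSS_SELECTOR, 'input[value="xsbf120p"]').click()
    # click the submit button
    browser.find_element(By.CSS_SELECTOR, 'button.btn_abc.ml20').click()
    time.sleep(5)  # closing immediately can interrupt the download
    browser.quit()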
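
A fixed sleep before closing the browser is a guess. As a rough sketch, the script could instead poll the download folder until the file is there. The helper name `wait_for_download`, its parameters, and the reliance on Firefox's temporary ".part" files are assumptions for illustration, not part of the answer above:

```python
import time
from pathlib import Path


def wait_for_download(folder, pattern="*.csv", timeout=30.0, poll=0.5, ignore=()):
    """Return the newest finished file in `folder` that matches `pattern`.

    Assumption: a download is finished once a matching file exists and no
    Firefox ".part" temporary file is left in the folder. Paths listed in
    `ignore` (files that were already there) are skipped.
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    folder = Path(folder)
    ignore = {Path(p) for p in ignore}
    deadline = time.monotonic() + timeout
    while True:
        in_progress = any(folder.glob("*.part"))
        finished = [p for p in folder.glob(pattern)
                    if p.is_file() and p not in ignore]
        if finished and not in_progress:
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {pattern} download in {folder}")
        time.sleep(poll)
```

A possible use, assuming the browser saves to `Path.home() / "Downloads"`: take `list(folder.glob("*.csv"))` before clicking the submit button, pass it as `ignore`, and call `wait_for_download(folder, ignore=...)` before closing the browser.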

Try using the requests module to download the required file from the site:

import requests
from bs4 import BeautifulSoup

link = 'https://www.abcbourse.com/download/libelles'

# form fields sent when the SBF120 checkbox is ticked
payload = {
    'cbox': 'xsbf120p',
    'cbPlace': 'false'
}

with requests.Session() as s:
    s.headers['user-agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    # load the page first to obtain the session cookies and the hidden form token
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    payload['__RequestVerificationToken'] = soup.select_one("input[name='__RequestVerificationToken']")['value']
    # post the form in the same session and save the response body as the CSV file
    with open("libelles.csv", "wb") as f:
        f.write(s.post(link, data=payload).content)
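
The token step is the part most likely to break if the page changes, so it can help to isolate it. Below is a minimal sketch that builds the same payload from the page HTML using only the standard library's `html.parser` in place of BeautifulSoup. The names `build_payload` and `_TokenParser` are hypothetical, and the field names are simply the ones used in the code above:

```python
from html.parser import HTMLParser


class _TokenParser(HTMLParser):
    """Collects the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "input" and self.token is None
                and attrs.get("name") == self.field_name):
            self.token = attrs.get("value")


def build_payload(html, checkbox="xsbf120p"):
    """Return the form data for the POST, or raise ValueError if no token is found."""
    parser = _TokenParser("__RequestVerificationToken")
    parser.feed(html)
    if not parser.token:
        raise ValueError("__RequestVerificationToken not found in the page")
    return {
        "cbox": checkbox,
        "cbPlace": "false",
        "__RequestVerificationToken": parser.token,
    }
```

With this helper, the session code would call `s.post(link, data=build_payload(res.text))`, and a missing token produces a clear error instead of a `TypeError` on `None`.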

An alternative to the POST request (which requires the token), using selenium

A very basic example

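from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Selenium 4 API: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe'))
url = 'https://www.abcbourse.com/download/libelles'
driver.get(url)
time.sleep(5)  # crude wait for the page and the cookie popup to load
# get rid of the cookie popup
driver.find_element(By.CSS_SELECTOR, 'button.sd-cmp-1rLJX').click()
# click the checkbox
driver.find_element(By.CSS_SELECTOR, 'input[value="xsbf120p"]').click()
# click submit button
driver.find_element(By.CSS_SELECTOR, 'button.btn_abc.ml20').click()

time.sleep(5)  # closing immediately can interrupt the download
driver.quit()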
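
Whichever approach is used, the result is a CSV file and not JSON, so the `r.json()` call in the question would fail on it. A minimal sketch for reading the saved file with the standard `csv` module follows. The function name `read_rows` is hypothetical, and the `;` delimiter and UTF-8 encoding are assumptions that should be checked against the actual file:

```python
import csv


def read_rows(path, delimiter=";", encoding="utf-8"):
    """Read a delimited text file into a list of rows, skipping blank lines.

    The default delimiter and encoding are assumptions, not facts about
    the file served by the site.
    """
    with open(path, newline="", encoding=encoding) as f:
        return [row for row in csv.reader(f, delimiter=delimiter) if row]
```

For example, `rows = read_rows("libelles.csv")` would give one list of strings per line of the file saved by the requests answer.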
