Downloading files from a password-protected website using Python

Posted 2024-09-28 21:42:22


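I have always relied on your large community to solve my basic problems, but now I am facing one I cannot resolve myself. I have been assigned a task very similar to Andrew's request: moving from manual to automatic downloads. I have to write a script that downloads data from EUMETSAT by supplying authentication. Please find my attempt below.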

import requests
from lxml import html

# EUMETSAT URL for authentication

url_EUMETSAT = 'http://oiswww.eumetsat.org/SDDI/webapps/publicdcp/logon.jsp'
username = '<USER>'
password = '<PASS>'

# Authentication attempt

EUMETSAT_request = requests.Session()
EUMETSAT_result = EUMETSAT_request.get(url_EUMETSAT)
EUMETSAT_login = {
    "username" : username,
    "password" : password
}

CONNEXION_result = EUMETSAT_request.post(url_EUMETSAT, data = EUMETSAT_login)
print(CONNEXION_result.status_code)
# 200 only means the HTTP request succeeded; it does not prove the login was accepted

# Download of one data file

url_EUMETSAT_DATABASE = 'http://oiswww.eumetsat.org/SDDI/webapps/publicdcp/mainMenuAction.do?action=DCP_ADMIN'
DATA_BASE = EUMETSAT_request.get(url_EUMETSAT_DATABASE)

url_file1 = 'http://oiswww.eumetsat.org/SDDI/webapps/publicdcp/dcpAdmin.do?action=ACTION_DOWNLOAD&id=1212D0C2'
DATA_FILE1 = EUMETSAT_request.get(url_file1, headers=dict(referer=url_file1))

# Write the content to data.txt

filename = 'data.txt'
data = DATA_FILE1.content
with open(filename,'wb') as open_file:
    open_file.write(data)
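
One assumption in the attempt above is that the login form really expects fields named "username" and "password" and that it posts back to logon.jsp. That may not hold, so a way to check is to list the forms found in the page returned by the first GET. The following is only a sketch using the standard library's html.parser; the sample page, its action URL, and its field names are made up for illustration and are not taken from the EUMETSAT site.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL and the input names of each <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form: {"action": ..., "fields": {...}}
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Keep default values: hidden inputs often carry tokens the server expects back.
            self._current["fields"][attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(page_html):
    """Return a list of forms with their action and input fields."""
    parser = FormFieldParser()
    parser.feed(page_html)
    return parser.forms


# Hypothetical login page, for illustration only.
sample_page = """
<form action="doLogin.do" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <input type="hidden" name="token" value="abc123">
</form>
"""
forms = list_forms(sample_page)
print(forms)
```

With the session from the script, something like list_forms(EUMETSAT_result.text) would show which field names and which action URL the real login form uses, so the POST data and target could be adjusted accordingly.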

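With my script, I expected data.txt to contain my data, just as if I had downloaded it manually. However, when I open it, I find HTML code that begins with the header below.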

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
    <head>  
        <!-- MUST include prior other includes-->
<script language="javascript" type="text/javascript">

  /**
  * Environment parameters (MUST BE DEFINED!)
  */
  var EUM_SNIPPETS_CFG = new Array();
  /*Commonly changed params*/
  EUM_SNIPPETS_CFG['titleHigh']       = "Title (not yet customized)";
  EUM_SNIPPETS_CFG['titleSub']        = "Tagline (not yet customized)";

  EUM_SNIPPETS_CFG['displaySearch']        = true;
  EUM_SNIPPETS_CFG['useLocalAssetsPath']   = false;//for isolated assets only
  EUM_SNIPPETS_CFG['searchOpensNewWindow'] = false;
  EUM_SNIPPETS_CFG['pathWebsite']          = "http://www.eumetsat.int/";
  EUM_SNIPPETS_CFG['pathSearch']           = "http://search.eumetsat.int/search";
  EUM_SNIPPETS_CFG['externalAssetsDomain'] = "http://dev75.eumetsat.int";//no slash at the end

  //for localized version only. If absolute path set, this will be overridden to applicable CMS urls
  EUM_SNIPPETS_CFG['path_images']     = "images";//path to image assets
  EUM_SNIPPETS_CFG['path_css']        = "css";//path to CSS assets
  EUM_SNIPPETS_CFG['path_javascript'] = "javascript";//path to JS assets

</script>

        <script language="JavaScript" type="text/javascript">
  /**
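
Getting an HTML page like this instead of the data often suggests that the server did not treat the session as logged in and returned a regular web page, although that is a guess here. A small check before writing the file can make this visible early. The sketch below assumes the real data file is not itself HTML; the helper name looks_like_html is hypothetical.

```python
def looks_like_html(content_type, body):
    """Return True if a response appears to be an HTML page rather than a data file.

    content_type: value of the Content-Type header (may be None).
    body: the raw response body as bytes.
    """
    if content_type and "html" in content_type.lower():
        return True
    # Fall back to sniffing the first bytes of the body.
    head = body[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Illustrative values only.
html_body = b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<html>'
print(looks_like_html(None, html_body))
print(looks_like_html("application/octet-stream", b"some raw data"))
```

In the script this could be called as looks_like_html(DATA_FILE1.headers.get("Content-Type"), DATA_FILE1.content), raising an error or printing a warning instead of writing data.txt when it returns True.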

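Secondly, when I download the file manually, the file name follows a pattern such as station, date, hour, or similar. As the variable url_file1 shows, the file name is not part of the URL. Could you enlighten me? I think I am missing something somewhere.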
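
On the file name: when a browser saves a download under a name that is not in the URL, the name usually comes from the Content-Disposition response header. Whether this server sends that header is an assumption, so the sketch below falls back to a default name when it is missing. The helper name and the sample header value are hypothetical.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="data.txt"):
    """Extract the file name from a Content-Disposition header value.

    Returns `default` when the header is missing or carries no file name.
    """
    if not header_value:
        return default
    # Let the standard library parse the header parameters.
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    if not name:
        return default
    # Drop any directory part so the server cannot choose where the file lands.
    return os.path.basename(name) or default


# Made-up header value, for illustration only.
sample_header = 'attachment; filename="station_20240928_2100.txt"'
print(filename_from_content_disposition(sample_header))
```

In the script, filename = filename_from_content_disposition(DATA_FILE1.headers.get("Content-Disposition")) would replace the hard-coded 'data.txt' once the download itself returns the data.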

