Python Requests raises ProxyError HTTPSConnectionPool ("Cannot connect to proxy", "No such file or directory") when downloading a file from a URL with a query string

Posted 2024-09-30 01:24:43


I am trying to download an Excel file from the RKI website in Germany using Python 3.8 and the Requests package, but I get the following error:

ProxyError: HTTPSConnectionPool(host='www.rki.de', port=443): Max retries exceeded with url: /DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx?__blob=publicationFile (Caused by ProxyError('Cannot connect to proxy.', FileNotFoundError(2, 'No such file or directory')))

The link is correct and works when clicked in a browser.

My code is:

import requests
resp = requests.get(r"https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx",
                    verify=False,
                    params = {"__blob": "publicationFile"})

The robots.txt at rki.de looks like this:

User-agent: *
Disallow: /SharedDocs/Personen/
Disallow: /SharedDocs/Kontaktdaten/
Disallow: /SharedDocs/Kontaktformulare/
Disallow: /SiteGlobals/
Disallow: /DE/Service/
Disallow: /EN/Service/
Allow: /SiteGlobals/Functions/JavaScript/
Allow: /SiteGlobals/StyleBundles/
Allow: /SiteGlobals/Frontend/
Crawl-delay: 10

I have no experience with this kind of thing. Maybe it is simply not possible.


2 answers

This works for me, so it may help:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'TE': 'Trailers',
}

params = (
    ('__blob', 'publicationFile'),
)

response = requests.get(
        'https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx',
        headers=headers,
        params=params
)

Additionally, to save the file:

with open('file.xlsx', 'wb') as f:
    f.write(response.content)
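
Before writing the bytes to disk, it can help to confirm that the server returned a workbook and not an HTML error page. The following is a minimal sketch, not part of the original answer. The helper names `looks_like_xlsx` and `save_if_xlsx` are hypothetical, and the check relies only on the fact that .xlsx files are ZIP archives, which begin with the bytes `PK\x03\x04`.

```python
from pathlib import Path

# Local file header signature that starts every non-empty ZIP archive;
# .xlsx workbooks are ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the payload starts with the ZIP signature."""
    return data.startswith(ZIP_MAGIC)


def save_if_xlsx(data: bytes, path: str) -> bool:
    """Write data to path only if it looks like a workbook.

    Returns True when the file was written, False otherwise.
    """
    if not looks_like_xlsx(data):
        return False
    Path(path).write_bytes(data)
    return True
```

With the response from the snippet above, this could be called as `save_if_xlsx(response.content, 'file.xlsx')`. Calling `response.raise_for_status()` first would surface HTTP errors before anything is written.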

Using a proxy:

Whether authentication is required depends on the proxy server.

proxies = { 
    'https' : 'https://user:password@proxyip:port' 
}

response = requests.get(
        'https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx',
        headers=headers,
        params=params,
        proxies=proxies
)
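
The error text `Cannot connect to proxy` suggests that Requests is trying to use a proxy even though the question's code never passes one. One possible cause, assumed here and not confirmed anywhere in this thread, is a proxy setting inherited from environment variables such as `HTTPS_PROXY`, which Requests reads by default. Below is a minimal sketch for inspecting those variables and, if desired, ignoring them. `find_proxy_env` and `get_ignoring_env_proxies` are hypothetical helper names.

```python
import os

# Environment variable names (compared case-insensitively) that
# commonly carry proxy settings.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def find_proxy_env(environ=None):
    """Return the proxy-related variables found in the environment."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.lower() in PROXY_VARS}


def get_ignoring_env_proxies(url, params=None):
    """Fetch url while ignoring proxy settings from the environment."""
    import requests  # imported lazily so the helper above works without it

    with requests.Session() as session:
        # trust_env=False makes the session skip environment-based
        # configuration, including proxy variables.
        session.trust_env = False
        return session.get(url, params=params, timeout=30)
```

If `find_proxy_env()` returns an unexpected or stale entry, correcting or unsetting that variable is usually the cleaner fix. Note that `trust_env = False` also turns off other environment-based settings such as `.netrc` credentials.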

Try this code:

import requests
# Get a copy of the default headers that requests would use
headers = requests.utils.default_headers()
resp = requests.get(r"https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx",
                    params = {"__blob": "publicationFile"}, headers=headers)
with open('demo.xlsx','wb') as f:
    f.write(resp.content)
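
The `params` argument is what turns the base URL into the full download link with its query string. As a rough illustration using only the standard library (it mirrors the result and is not how Requests works internally), the hypothetical helper `with_query` below builds the same URL that appears in the error message.

```python
from urllib.parse import urlencode


def with_query(base_url: str, params: dict) -> str:
    """Append URL-encoded params to base_url as a query string."""
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


BASE = (
    "https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/"
    "Daten/Fallzahlen_Kum_Tab.xlsx"
)

# Prints the base URL followed by ?__blob=publicationFile
print(with_query(BASE, {"__blob": "publicationFile"}))
```

After a real request, `resp.url` holds the final URL that Requests used, which can be compared with the link that works in the browser.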
