Downloading .xls files raises "urllib.error.HTTPError: HTTP Error 404: Not Found"

Posted 2024-06-26 10:18:48


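I am trying to use BeautifulSoup to scrape .xls tables that can be downloaded from Xcel Energy's website (https://www.xcelenergy.com/working_with_us/municipalities/community_energy_reports).

This function collects the URL links to the tables and tries to download them: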

import os
import re
import urllib.request

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.xcelenergy.com/working_with_us/municipalities/community_energy_reports'
dir = 'C:/Users/aobrien/PycharmProjects/xceldatascraper/'

def scraper(page):
    tld = r'https://www.xcelenergy.com'
    # Fetch the report index page (certificate verification is disabled here).
    pageobj = requests.get(page, verify=False)
    sp = bs(pageobj.content, 'html.parser')
    xlst, fnms = [], []
    # Collect every link whose href points into /staticfiles/.
    links = [a['href'] for a in sp.find_all('a', attrs={'href': re.compile("/staticfiles/")})]
    for a in links:
        if a.endswith('.xls'):
            furl = tld + str(a)  # the href is site-relative, so prepend the host
            xlst.append(furl)
            fnms.append(a.split('/')[4])  # index 4 of the split href is the file name
    naur = zip(fnms, xlst)
    if not os.path.exists(dir + 'tables'):
        os.makedirs(dir + 'tables')
    for name, url in naur:
        print(url)
        res = urllib.request.urlopen(url)  # this call raises the HTTPError 404
        with open(dir + 'tables/' + name, 'wb') as xls:
            xls.write(res.read())

scraper(url)
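
As a side note on the URL and file-name construction in the loop above, here is a minimal sketch using only the standard library. The helper name `build_target` is hypothetical, and the sample href is inferred from the URL that the script prints, so treat both as assumptions. `urljoin` and `posixpath.basename` avoid relying on a fixed path depth, but they do not by themselves change the 404 behavior.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

BASE = "https://www.xcelenergy.com"

def build_target(href, base=BASE):
    """Return (file_name, absolute_url) for a site-relative href (hypothetical helper)."""
    # urljoin resolves the href against the base instead of concatenating strings.
    absolute = urljoin(base, href)
    # Take the last path segment, whatever the directory depth is.
    file_name = posixpath.basename(urlsplit(absolute).path)
    return file_name, absolute

# Sample href inferred from the printed URL in the question.
sample_href = "/staticfiles/xe-responsive/WorkingWith Us/MI-City-Forest-Lake-2016.xls"
print(build_target(sample_href))
```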

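The script fails when urllib.request.urlopen(url) tries to access the file, raising "urllib.error.HTTPError: HTTP Error 404: Not Found". The print(url) statement prints the URL the script constructed (https://www.xcelenergy.com/staticfiles/xe-responsive/WorkingWith Us/MI-City-Forest-Lake-2016.xls), and pasting that URL into a browser by hand downloads the file.

What am I missing?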
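
One plausible cause, offered as an assumption rather than a confirmed diagnosis: the printed URL contains a literal space ("WorkingWith Us"). Browsers percent-encode such characters before sending the request, while urllib.request.urlopen does not, so the server may be receiving a different path from the script. The sketch below percent-encodes the path with urllib.parse.quote; the helper name `encode_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_url(raw_url):
    """Percent-encode unsafe characters (such as spaces) in the path of raw_url."""
    parts = urlsplit(raw_url)
    # '/' is kept as the path separator; '%' is kept so that a path that is
    # already percent-encoded is not encoded a second time.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment))

# URL exactly as printed by the script in the question.
raw = "https://www.xcelenergy.com/staticfiles/xe-responsive/WorkingWith Us/MI-City-Forest-Lake-2016.xls"
print(encode_url(raw))
```

If this is indeed the problem, calling urllib.request.urlopen(encode_url(url)) inside the download loop would be the smallest change to try. Downloading with requests.get(url), which the script already imports, is another option to test.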

