How to download a file after scraping its href when the href has no http or https

<a href="/census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059" title="Canada, provinces and territories – File format CSV" class="btn btn-default btn-block"></a>

for i, r in enumerate(rows):
    href = r.find('a', href=True)
    remote_file = requests.get(href['href'])
    with open(href['href'], 'wb') as f:
        for chunk in remote_file.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

2 Answers

User

#1 · edited 2024-09-30 06:33:08

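href['href'] returns /census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059, which is a root-relative path with no scheme or host. You must prepend https://www12.statcan.gc.ca to it before passing it to requests.get: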

csv_url = "https://www12.statcan.gc.ca" + href['href']
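String concatenation works here because the href starts with a slash. As a slightly more general sketch, the standard library's urljoin resolves an href against a base URL and leaves hrefs that are already absolute untouched. BASE_URL and the helper name absolute_url are assumptions made for illustration, not part of the original code.

```python
from urllib.parse import urljoin

# Assumption: the scraped page is served from this host.
BASE_URL = "https://www12.statcan.gc.ca"


def absolute_url(href, base=BASE_URL):
    """Resolve a relative or root-relative href against the base URL.

    An href that already carries a scheme and host is returned unchanged.
    """
    return urljoin(base, href)


href = (
    "/census-recensement/2016/dp-pd/prof/details/download-telecharger"
    "/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059"
)
csv_url = absolute_url(href)
print(csv_url)
```

In the loop from the question, this would replace the bare href['href'] passed to requests.get.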

To save the file, try the following:

import requests

# Replace csv_url with your scraped link
csv_url = 'https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059'

req = requests.get(csv_url)
req.raise_for_status()  # stop on an HTTP error instead of saving an error page

# Write the response body to disk; the with block closes the file automatically
with open('downloaded.csv', 'wb') as csv_file:
    csv_file.write(req.content)
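The code in the question also passes the href to open(), which would fail because the href contains slashes and a query string. When downloading several files in a loop, each one needs its own local name. The following is a minimal sketch that builds a name from the FILETYPE and GEONO query parameters; the helper name filename_from_url and the census_<GEONO>.<filetype> naming scheme are hypothetical choices, not something the site defines.

```python
from urllib.parse import parse_qs, urlsplit


def filename_from_url(url, default="downloaded.csv"):
    """Build a local file name from the FILETYPE and GEONO query parameters.

    Falls back to `default` when either parameter is missing or contains
    characters that are not letters or digits.
    """
    params = parse_qs(urlsplit(url).query)
    geono = params.get("GEONO", [""])[0]
    filetype = params.get("FILETYPE", [""])[0]
    if not (geono.isalnum() and filetype.isalnum()):
        return default
    return f"census_{geono}.{filetype.lower()}"


csv_url = (
    "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details"
    "/download-telecharger/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059"
)
local_name = filename_from_url(csv_url)
print(local_name)
```

The result can then be used in place of the fixed 'downloaded.csv' name, for example open(filename_from_url(csv_url), 'wb').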

User

#2 · edited 2024-09-30 06:33:08

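To solve your problem, prepend 'https://www12.statcan.gc.ca' to the href; this gives you a valid link to the file. For the file in the example, the valid link is 'https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059'. If a website has a clickable button that downloads a file, you can right-click it, choose Copy Link Location, paste the link into Notepad or another text editor, and compare it with the href to find the prefix that the href is missing.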
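Reading the prefix off a copied link can also be done in code. As a small sketch, urlsplit separates the scheme and host from the rest of the link; the helper name url_prefix is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def url_prefix(full_url):
    """Return the scheme and host of a full URL, without any path or query."""
    parts = urlsplit(full_url)
    return f"{parts.scheme}://{parts.netloc}"


# The link as copied from the browser's "Copy Link Location" menu entry
copied_link = (
    "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details"
    "/download-telecharger/comp/GetFile.cfm?Lang=E&FILETYPE=CSV&GEONO=059"
)
prefix = url_prefix(copied_link)
print(prefix)
```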
