Downloading multiple files with Python

import urllib
from urllib.request import urlopen, urlretrieve, quote
from bs4 import BeautifulSoup

url = 'http://www.chessgames.com/perl/chesscollection?cid=1014492'
u = urlopen(url)
html = u.read().decode('utf-8')
soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all('a'):
    urlopen('http://chessgames.com'+link.get('href'))

1 answer

User

#1 · Posted 2024-10-01 02:32:39

There is no short answer to your question. I will show you a complete solution and comment on the code.

First, import the necessary modules:

from bs4 import BeautifulSoup
import requests
import re

Next, fetch the index page and create a BeautifulSoup object:

req = requests.get("http://www.chessgames.com/perl/chesscollection?cid=1014492")
soup = BeautifulSoup(req.text, "lxml")

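I strongly recommend the lxml parser (a separate package that must be installed) over the built-in html.parser. After that, build the list of links to the individual games: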

pages = soup.find_all('a', href=re.compile(r'.*chessgame\?.*'))
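To see what this pattern keeps and what it discards, here is a small stdlib-only sketch. The href values are hypothetical stand-ins for what the index page might contain, and the list comprehension only imitates the filtering: BeautifulSoup tests a compiled pattern against each attribute value with its search() method.

```python
import re

# Same pattern as above, written as a raw string so "\?" is a literal question mark.
GAME_LINK = re.compile(r'.*chessgame\?.*')

# Hypothetical href values, standing in for links found on the index page.
hrefs = [
    '/perl/chessgame?gid=1000001',
    '/perl/chessplayer?pid=10000',
    '/perl/chesscollection?cid=1014492',
    '/index.html',
]

# Keep only the hrefs the pattern matches anywhere in the string.
game_links = [h for h in hrefs if GAME_LINK.search(h)]
print(game_links)
```

Only the first href survives, because it is the only one that contains "chessgame?".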

The pattern selects the links whose href contains "chessgame?". Now write a function that downloads a file for you:

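def download_file(url):
    # Use the last path segment, without the query string, as the file name.
    path = url.split('/')[-1].split('?')[0]
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            # Write the response in chunks so large files are not held in memory.
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)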
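The file name in the function above comes from plain string splitting, which gives an empty name for a URL that ends in "/" and a wrong one when the query string itself contains a "/". A possible alternative, sketched here with the standard library only, parses the URL first. The helper name filename_from_url, its default argument, and the example.com URLs are assumptions made for illustration, not part of the original answer.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, default='download.bin'):
    """Return the last path segment of url, ignoring the query string."""
    # urlsplit separates the path from the query, so "?" and anything
    # after it never ends up in the file name.
    name = posixpath.basename(urlsplit(url).path)
    # Fall back to a fixed name when the path ends with "/".
    return name or default


url = 'http://www.example.com/files/game_0001.pgn?gid=1'  # hypothetical URL
print(filename_from_url(url))
print(filename_from_url('http://www.example.com/'))
```

Inside download_file, the line that sets path could then call this helper instead of chaining split().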

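The last bit of magic is to repeat all the previous steps for each game page, to get the link that is handed to the file downloader: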

host = 'http://www.chessgames.com'
for page in pages:
    # Open each game page and look for its download link.
    url = host + page.get('href')
    req = requests.get(url)
    soup = BeautifulSoup(req.text, "lxml")
    file_link = soup.find('a', string=re.compile('.*download.*'))
    if file_link is None:
        continue  # no download link on this page, skip it
    file_url = host + file_link.get('href')
    download_file(file_url)

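(First find the link whose text contains "download", then build the full URL by joining the host name and the path, and finally download the file.)

I hope you can use this code without any corrections!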
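Joining the host name and the path by string concatenation works as long as every href starts with "/". If some links turn out to be absolute or relative, urllib.parse.urljoin from the standard library may be a safer way to build the full URL. The sketch below is not the original answer's method, and the page URL and href values in it are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical game page that the hrefs were found on.
page_url = 'http://www.chessgames.com/perl/chessgame?gid=1000001'

# An href that starts with "/" replaces the whole path of the page URL.
from_root = urljoin(page_url, '/pgn/example.pgn?gid=1000001')

# An href that is already a full URL is returned unchanged.
from_absolute = urljoin(page_url, 'http://www.chessgames.com/pgn/example.pgn')

# A relative href is resolved against the directory of the page URL.
from_relative = urljoin(page_url, 'example.pgn')

print(from_root)
print(from_absolute)
print(from_relative)
```

In the loop, host + page.get('href') could be replaced by urljoin(host, page.get('href')), and the download link could be joined against the game page URL in the same way.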
