A simple, fast, and reliable Coursera crawling & downloading tool
TODO
- [x] Lectures (videos, subtitles, slides)
- [x] Readings
- [ ] Quizzes
- [ ] Jupyter Notebooks
Installation
Python 3.x is required. Installing this tool in a virtual environment is recommended.
$ pip install dl_coursera
$ dl_coursera --version
How to use
$ python ..\dl_coursera_run.py --help
usage: dl_coursera_run.py [-h] [--version] [--email EMAIL]
                          [--password PASSWORD] [--cookies COOKIES] --slug
                          SLUG [--isSpec] [--n-worker N_WORKER]
                          [--outdir OUTDIR] --how
                          {builtin,curl,aria2,aria2-rpc,uget}
                          [--generate-input-file]
                          [--aria2-rpc-url ARIA2_RPC_URL]
                          [--aria2-rpc-secret ARIA2_RPC_SECRET]
A simple, fast, and reliable Coursera crawling & downloading tool
optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --email EMAIL
  --password PASSWORD
  --cookies COOKIES     path of the file which contains cookies in the Mozilla
                        `cookies.txt` file format
  --slug SLUG           slug of a course or a specialization (with @--isSpec)
  --isSpec              indicate that @slug is slug of a specialization
  --n-worker N_WORKER   the number of threads used to crawl webpages. Default:
                        4. NOTE: if errors show up during crawling, try
                        decreasing this value
  --outdir OUTDIR       the directory to save files to. Default: `.'
  --how {builtin,curl,aria2,aria2-rpc,uget}
                        how to download files. builtin (NOT recommended): use
                        the builtin downloader. curl: invoke the `curl' tool
                        or generate an "input file" for that tool (with
                        @--generate-input-file). aria2: invoke the `aria2c'
                        tool or generate an "input file" for that tool (with
                        @--generate-input-file). aria2-rpc (HIGHLY
                        recommended): add downloading tasks to aria2 through
                        its XML-RPC interface. uget (recommended): add
                        downloading tasks to the uGet Download Manager
  --generate-input-file
                        when @--how is curl/aria2, indicate that to generate
                        an "input file" for that tool, rather than to invoke
                        it
  --aria2-rpc-url ARIA2_RPC_URL
                        url of the aria2 XML-RPC interface. Default:
                        `http://localhost:6800/rpc'
  --aria2-rpc-secret ARIA2_RPC_SECRET
                        authorization token of the aria2 XML-RPC interface
If the command succeeds, you shall see `Done :-)'. If some UNEXPECTED errors
occur, try deleting everything generated by this tool in @outdir, and then run
the command again. For more information, visit
`https://github.com/feng-lei/dl_coursera'.
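How to find the slug of a course/specialization
Navigate to the homepage of the course/specialization; its slug appears in the browser's address bar.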
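As a rough illustration, the slug is usually the path segment that follows `learn/` (for a course) or `specializations/` (for a specialization) in the URL. The sketch below pulls it out of a pasted URL. The helper name `slug_from_url` and the two URL layouts are assumptions based on how Coursera URLs commonly look; they are not something dl_coursera itself provides.

```python
from urllib.parse import urlparse


def slug_from_url(url):
    """Return (slug, is_spec) for a Coursera course or specialization URL.

    Assumed layouts (not taken from dl_coursera):
      https://www.coursera.org/learn/<slug>            -> course
      https://www.coursera.org/specializations/<slug>  -> specialization
    """
    # Split the URL path into non-empty segments, ignoring query and fragment.
    parts = [p for p in urlparse(url).path.split("/") if p]
    for marker, is_spec in (("learn", False), ("specializations", True)):
        if marker in parts:
            idx = parts.index(marker)
            if idx + 1 < len(parts):
                return parts[idx + 1], is_spec
    raise ValueError(f"no slug found in {url!r}")
```

When `is_spec` comes back `True`, the slug would be passed together with `--isSpec`.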
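How to get the cookies.txt file
Log in to Coursera, then use a browser extension to export the cookies as cookies.txt.
Chrome
You can use the cookies.txt extension.
Firefox
You can use the Export Cookies extension.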
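Before starting a long crawl, it can help to confirm that the exported file really is in the Mozilla `cookies.txt` format and contains cookies for Coursera. The following is a minimal sketch using Python's standard `http.cookiejar.MozillaCookieJar`; the helper name `coursera_cookie_names` and the `coursera.org` domain filter are illustrative assumptions, not part of dl_coursera.

```python
import http.cookiejar


def coursera_cookie_names(path, domain_suffix="coursera.org"):
    """Load a Mozilla-format cookies.txt and list the cookie names for a domain.

    Raises http.cookiejar.LoadError if the file is not in the expected format.
    """
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session cookies and expired cookies so the listing shows everything
    # the browser extension exported.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in jar if c.domain.endswith(domain_suffix))
```

An empty result suggests the export was made while logged out, or from a different site.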
Examples
(2019/08/30) Since the login API of Coursera changed, you should no longer use --password; use --cookies instead. More specifically, use
$ dl_coursera --cookies path/of/cookies.txt ......
rather than
$ dl_coursera --email XXXXXX --password XXXXXX ......
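For scripted use, the cookie-based invocation can be assembled from the flags listed in the help text. This is only a sketch: the function name `build_command` and its defaults are hypothetical, and it merely builds the argument list rather than running anything.

```python
import shlex


def build_command(slug, cookies, outdir=".", how="aria2-rpc",
                  is_spec=False, generate_input_file=False):
    """Build a dl_coursera argument list that authenticates with --cookies."""
    cmd = ["dl_coursera", "--cookies", cookies, "--slug", slug,
           "--outdir", outdir, "--how", how]
    if is_spec:
        cmd.append("--isSpec")
    if generate_input_file:
        # Per the help text, input files only apply to curl and aria2.
        if how not in ("curl", "aria2"):
            raise ValueError("--generate-input-file requires --how curl or aria2")
        cmd.append("--generate-input-file")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_command("parallel-programming-in-java",
                                   "path/of/cookies.txt", outdir="ppij")))
```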
Use the built-in downloader
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how builtin
Use curl
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how curl
or
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how curl --generate-input-file
$ curl --config ppij/parallel-programming-in-java.download.curl_input_file.txt
Use aria2
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how aria2
or
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how aria2 --generate-input-file
$ aria2c --input-file ppij/parallel-programming-in-java.download.aria2_input_file.txt
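Judging from the two examples above, the generated input file appears to be named `<slug>.download.<tool>_input_file.txt` inside the output directory. The helper below encodes that naming pattern; both the pattern and the name `input_file_path` are inferred from these examples and are not documented behavior of dl_coursera.

```python
from pathlib import PurePosixPath


def input_file_path(outdir, slug, tool):
    """Guess the path of the generated input file (pattern inferred from the examples)."""
    if tool not in ("curl", "aria2"):
        raise ValueError("input files are only generated for curl and aria2")
    return str(PurePosixPath(outdir) / f"{slug}.download.{tool}_input_file.txt")
```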
Add tasks to aria2 through its XML-RPC interface
Start aria2 with its XML-RPC interface enabled:
$ aria2c --enable-rpc
Then run the following command in another terminal:
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how aria2-rpc
NOTE: Using an aria2 GUI such as webui-aria2 is highly recommended.
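To check that the aria2 XML-RPC endpoint is reachable before handing it download tasks, one option is to call aria2's `aria2.getVersion` method with Python's standard `xmlrpc.client`. This is a minimal sketch under stated assumptions: the helper names `auth_params` and `aria2_version` are hypothetical, the default URL is the one shown in the help text, and the `token:` prefix is aria2's convention for passing the RPC secret.

```python
import xmlrpc.client


def auth_params(secret=None):
    """Return the leading RPC parameters: a 'token:<secret>' entry if a secret is set."""
    return [f"token:{secret}"] if secret else []


def aria2_version(url="http://localhost:6800/rpc", secret=None):
    """Ask a running aria2 instance for its version string over XML-RPC."""
    proxy = xmlrpc.client.ServerProxy(url)
    # Raises ConnectionRefusedError (or another OSError) if aria2 is not listening.
    return proxy.aria2.getVersion(*auth_params(secret))["version"]
```

If `aria2_version()` raises a connection error, aria2 is most likely not running with `--enable-rpc`, or it is listening on a different URL than the one passed via `--aria2-rpc-url`.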
Add tasks to the uGet Download Manager
Start uGet:
$ uget # on Windows
$ uget-gtk & # on Linux
Then run the following command:
$ dl_coursera --email XXXXXX --password XXXXXX --slug parallel-programming-in-java --outdir ppij --how uget