Download all documents linked from a web page.
downlink
A Python library and command line tool for scraping (and downloading) links on a web page.
Library
- linkscraper.py
  - LinkScraper - class for scraping links from a page
- document_linkscraper.py
  - DocumentLinkScraper - subclass of LinkScraper
    - class for scraping "document links", i.e. links that end in a given file extension such as ".pdf"
- __init__.py
  - imports the library classes for cleaner importing
- main
  - main() - entry point for the command line tool
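To make the split between the base class and the document subclass concrete, here is a minimal sketch of the idea using only the standard library's html.parser. It is not downlink's source: the names LinkCollector, DocumentLinkCollector, scrape() and keep() are hypothetical, and the sketch assumes links come from <a href> tags and that relative links should be resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for LinkScraper's role: collect every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "A" and "a" both match here.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "a.pdf" against the page URL.
            self.links.append(urljoin(self.base_url, href))

    def keep(self, url):
        # The base class keeps every link; subclasses narrow this down.
        return True

    def scrape(self, html):
        self.feed(html)
        self.close()
        return [url for url in self.links if self.keep(url)]


class DocumentLinkCollector(LinkCollector):
    """Hypothetical stand-in for DocumentLinkScraper: keep links ending in one extension."""

    def __init__(self, base_url, ext=".pdf"):
        super().__init__(base_url)
        self.ext = ext.lower()

    def keep(self, url):
        # Check only the path, so a query string like "?v=2" does not hide the extension.
        return urlparse(url).path.lower().endswith(self.ext)


page = '<a href="a.pdf">A</a> <a href="/b.html">B</a> <a href="https://x.org/c.PDF?v=2">C</a>'
print(DocumentLinkCollector("https://example.com/docs/").scrape(page))
# ['https://example.com/docs/a.pdf', 'https://x.org/c.PDF?v=2']
```

Overriding a single keep() hook is one way to let a subclass narrow the base class's results without re-implementing the parsing. Each instance is meant for one page, since links accumulate across feed() calls.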
Command line tool
Basic usage:
$ downlink "https://www.ct.gov/doh/cwp/view.asp?a=4513&q=530462" output
The command above downloads all PDF documents linked from that page into a folder called "output", which must already exist and be writable.
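The README only requires that the output folder exist and be writable; how downlink names and saves each file is not stated. The sketch below shows one plausible approach with the standard library, assuming each file is named after the last segment of its URL path. The helper names check_output_dir, target_path and download_all are hypothetical, and in this sketch two links with the same file name would overwrite each other.

```python
import os
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def check_output_dir(path):
    """Fail early if the output folder is missing or not writable."""
    out = Path(path)
    if not out.is_dir():
        raise NotADirectoryError(f"output folder does not exist: {out}")
    if not os.access(out, os.W_OK):
        raise PermissionError(f"output folder is not writable: {out}")
    return out


def target_path(url, out_dir):
    """Name the local file after the last segment of the URL path (an assumption)."""
    # Decode %20 etc. first, then take the final segment so "%2F" cannot create subfolders.
    name = Path(unquote(urlparse(url).path)).name
    if name in ("", ".."):
        name = "index"
    return Path(out_dir) / name


def download_all(urls, out_dir):
    """Download each URL into out_dir and return the paths written."""
    out = check_output_dir(out_dir)
    saved = []
    for url in urls:
        dest = target_path(url, out)
        # Stream the response body into the file instead of reading it all into memory.
        with urlopen(url) as response, open(dest, "wb") as fh:
            shutil.copyfileobj(response, fh)
        saved.append(dest)
    return saved
```

Checking the folder before any request is made means a typo in the output path fails immediately rather than after the page has been scraped.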
To download files with a different extension, use the --ext option.
For more usage details, run: downlink --help
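The tool's own --help output is the authority on its options. As an illustration of how an interface like the one above can be declared, here is an argparse sketch that mirrors the documented usage: a positional URL, a positional output folder, and an --ext option. The .pdf default is inferred from the README saying that basic usage downloads PDFs; the dot normalization and the names build_parser and parse_cli are assumptions, not downlink's code.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage shown above."""
    parser = argparse.ArgumentParser(
        prog="downlink",
        description="Download all documents linked from a web page.",
    )
    parser.add_argument("url", help="page whose links should be scraped")
    parser.add_argument("output", help="existing, writable folder for the downloads")
    parser.add_argument("--ext", default=".pdf", help="file extension to download (default: .pdf)")
    return parser


def parse_cli(argv=None):
    """Return (url, output, ext), accepting the extension with or without a leading dot."""
    args = build_parser().parse_args(argv)
    ext = args.ext if args.ext.startswith(".") else "." + args.ext
    return args.url, args.output, ext.lower()


print(parse_cli(["https://example.com/page", "output", "--ext", "docx"]))
# ('https://example.com/page', 'output', '.docx')
```

argparse generates the --help text from the help= strings, so the usage message stays in sync with the declared arguments.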