An interface for accessing Common Crawl data
Detailed description of the Python project pycommoncrawl
Crawler
A Python interface to Common Crawl.
Installation
pip3 install pycommoncrawl
Usage
from pycommoncrawl.common_crawl_data_accessor import CommonCrawlDataAccessor

common_crawl_data_accessor = CommonCrawlDataAccessor()

# Iterate line by line over the raw "WAT" resource data
for line in common_crawl_data_accessor.get_raw_resource_data("WAT"):
    print(line)

# Iterate per WARC block and print each block's Content-Length header
for warc in common_crawl_data_accessor.get_raw_resource_data_per_warc("WAT"):
    print(warc["Content-Length"])
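
The per-block iteration above can feed simple aggregations. The following is a minimal sketch, not part of pycommoncrawl: it assumes each yielded block behaves like a mapping of header names to values, as the warc["Content-Length"] lookup in the usage example suggests. The helper name total_content_length and the sample_blocks data are hypothetical, used here so the snippet runs without network access.

```python
from typing import Iterable, Mapping


def total_content_length(blocks: Iterable[Mapping[str, object]]) -> int:
    """Sum the Content-Length header over WARC-style header mappings.

    Hypothetical helper: assumes each block is dict-like and that
    Content-Length, when present, is an int or a numeric string.
    """
    total = 0
    for block in blocks:
        value = block.get("Content-Length")
        if value is None:
            continue  # skip blocks that carry no Content-Length header
        total += int(value)
    return total


# Stand-in data for illustration only. With the library installed, one
# could instead pass the iterator returned by
# get_raw_resource_data_per_warc("WAT"), assuming it yields such mappings.
sample_blocks = [
    {"WARC-Type": "metadata", "Content-Length": "1024"},
    {"WARC-Type": "metadata", "Content-Length": "2048"},
    {"WARC-Type": "warcinfo"},
]

print(total_content_length(sample_blocks))
```

Because the helper accepts any iterable, it consumes blocks one at a time and does not need the whole resource in memory.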