scrapy-feed-storage-internetarchive Python package - PyPI

Scrapy item feed storage backend for archive.org

Detailed description of the scrapy-feed-storage-internetarchive Python project

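Feed storage

This is a storage backend for Scrapy item feeds. When a scrape job ends, it uploads the feed file to archive.org.

It is meant to make it easy to archive data in the Internet Archive when you are authorized to distribute it, e.g. when archiving public data.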

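Usage

Install the custom storage backend

Register the backend under FEED_STORAGES in your Scrapy settings. We recommend the URI scheme internetarchive: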

FEED_STORAGES = {
    "internetarchive": "feedstorage_internetarchive.storages.InternetArchiveStorage",
}
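Scrapy chooses the storage class from the scheme part of each feed URI, so the key registered in FEED_STORAGES has to match the prefix used in the feed URIs. The lookup can be sketched with the standard library alone. The helper name storage_class_path is hypothetical, and this is an illustration of the idea rather than Scrapy's own code:

```python
from urllib.parse import urlparse

FEED_STORAGES = {
    "internetarchive": "feedstorage_internetarchive.storages.InternetArchiveStorage",
}


def storage_class_path(feed_uri):
    """Return the dotted path registered for the URI's scheme (hypothetical helper)."""
    scheme = urlparse(feed_uri).scheme
    return FEED_STORAGES[scheme]


print(storage_class_path("internetarchive://key:secret@archive.org/example.csv"))
```

A feed URI whose scheme is not registered raises KeyError in this sketch, which is a quick way to spot a mismatch between the two settings.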

Configure Internet Archive metadata templates

Metadata values can be specified using the settings key FEED_STORAGE_INTERNETARCHIVE.
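Scrapy feed URIs use printf-style placeholders such as %(name)s and %(time)s, and the same style of substitution can fill a metadata template. The sketch below only illustrates that substitution. The template dictionary, the parameter values, and the helper render_metadata are all hypothetical, and it does not show the real layout of the FEED_STORAGE_INTERNETARCHIVE setting:

```python
# Hypothetical metadata templates; NOT the library's actual settings schema.
metadata_templates = {
    "title": "Scrape %(name)s %(time)s",
    "mediatype": "data",
}

# Hypothetical parameter values, e.g. the spider name and the job timestamp.
params = {"name": "myspider", "time": "2020-05-01T12-00-00"}


def render_metadata(templates, params):
    """Fill each printf-style template with the given parameters."""
    return {key: value % params for key, value in templates.items()}


print(render_metadata(metadata_templates, params))
```

Values without placeholders pass through unchanged, because %-formatting with a dictionary ignores keys the template does not use.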

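Configure the storage for feeds

Use the feed exporter configuration with the URI scheme under which you installed the backend.

The Internet Archive feed URI should have the hostname archive.org, with your Internet Archive S3 API access key and secret in the username and password positions.

Only one level of path is allowed. It is used as the filename and is converted to the item's unique identifier, which means it must be unique across the whole Internet Archive. Including the scrape job timestamp in that path helps ensure uniqueness.

Extra parameters can be supplied as query string parameters, which are then templated into metadata values.

For example: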

FEEDS = {
    "internetarchive://YourIAS3AccessKey:YourIAS3APISecretKey@archive.org/south-africa-%(name)s-%(time)s.csv?time=%(time)s&name=%(name)s&filetype=csv": {
        "format": "csv",
    },
    "internetarchive://YourIAS3AccessKey:YourIAS3APISecretKey@archive.org/south-africa-%(name)s-%(time)s.jsonlines?time=%(time)s&name=%(name)s&filetype=jsonlines": {
        "format": "jsonlines",
    },
}
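To see which part of such a URI plays which role, it can be taken apart with urllib.parse. This is a sketch of the URI layout described above, not the backend's own parsing code. The spider name and timestamp are made-up values standing in for what Scrapy substitutes for %(name)s and %(time)s:

```python
from urllib.parse import urlparse, parse_qs

# Placeholder credentials; name and time are hypothetical substituted values.
uri = (
    "internetarchive://YourIAS3AccessKey:YourIAS3APISecretKey@archive.org"
    "/south-africa-myspider-2020-05-01T12-00-00.csv"
    "?time=2020-05-01T12-00-00&name=myspider&filetype=csv"
)

parts = urlparse(uri)
filename = parts.path.lstrip("/")

# Only a single path level is allowed, so the filename must not contain "/".
if "/" in filename:
    raise ValueError("only one level of path is allowed")

# parse_qs returns a list per key; keep the first value of each parameter.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.username)  # access key position
print(parts.password)  # secret position
print(parts.hostname)
print(filename)
print(params)
```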

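You probably do not want to put your credentials in the project settings module, because they are easy to discover once that module is added to source control. Try to set them in the environment where the spider runs instead.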
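One way to keep the keys out of the settings file is to read them from environment variables and build the feed URIs at import time. This is a minimal sketch: the variable names IAS3_ACCESS_KEY and IAS3_SECRET_KEY are assumptions, not names the package defines, and the fallback placeholders are there only so the snippet runs anywhere:

```python
import os

# Hypothetical environment variable names; the fallbacks are placeholders.
access_key = os.environ.get("IAS3_ACCESS_KEY", "YourIAS3AccessKey")
secret_key = os.environ.get("IAS3_SECRET_KEY", "YourIAS3APISecretKey")

base = f"internetarchive://{access_key}:{secret_key}@archive.org"
query = "time=%(time)s&name=%(name)s"

# The %(name)s and %(time)s placeholders are left intact for Scrapy to fill.
FEEDS = {
    f"{base}/south-africa-%(name)s-%(time)s.csv?{query}&filetype=csv": {
        "format": "csv",
    },
    f"{base}/south-africa-%(name)s-%(time)s.jsonlines?{query}&filetype=jsonlines": {
        "format": "jsonlines",
    },
}
```

Because the whole URI is later %-formatted, a literal percent sign in a credential would need to be doubled; this sketch assumes the keys contain none.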

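Scrapinghub

You can set the FEEDS key in Scrapinghub by providing the dictionary of values as JSON on a single line in the spider's raw settings. For the example above, you would add the following line to the Scrapinghub raw settings: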

FEEDS = {"internetarchive://YourIAS3AccessKey:YourIAS3APISecretKey@archive.org/south-africa-%(name)s-%(time)s.csv?time=%(time)s&name=%(name)s": {"format": "csv"}, "internetarchive://YourIAS3AccessKey:YourIAS3APISecretKey@archive.org/south-africa-%(name)s-%(time)s.jsonlines?time=%(time)s&name=%(name)s": { "format": "jsonlines" }}

After saving, you should see it parsed into keys and values in the standard settings pane.
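Writing that single line by hand is error-prone, so it can be generated from the Python dictionary with json.dumps, which emits one line by default. A small sketch, with the same placeholder credentials as above:

```python
import json

base = "internetarchive://YourIAS3AccessKey:YourIAS3APISecretKey@archive.org"
query = "time=%(time)s&name=%(name)s"

feeds = {
    f"{base}/south-africa-%(name)s-%(time)s.csv?{query}": {"format": "csv"},
    f"{base}/south-africa-%(name)s-%(time)s.jsonlines?{query}": {"format": "jsonlines"},
}

# json.dumps without an indent argument produces a single-line JSON document.
line = "FEEDS = " + json.dumps(feeds)
print(line)
```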


scrapy-feed-storage-internetarchive 0.0.1
