Redis-based components for Scrapy.

Detailed description of the scrapy-redis Python project


Scrapy-Redis


Redis-based components for Scrapy.

  • Free software: MIT license
  • Documentation: https://scrapy-redis.readthedocs.org
  • Python versions: 2.7, 3.4+

Features

  • Distributed crawling/scraping

    You can start multiple spider instances that share a single redis queue.
    Best suited for broad multi-domain crawls.

  • Distributed post-processing

    Scraped items get pushed into a redis queue, meaning that you can start as
    many post-processing processes as needed, all sharing the items queue.

  • Scrapy plug-and-play components

    Scheduler + Duplication Filter, Item Pipeline, Base Spiders.
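The distributed post-processing feature above can be sketched with a minimal consumer. This is an illustrative sketch, not part of scrapy-redis itself; the key name `myspider:items` and the `decode_item`/`consume` helpers are assumptions, following the default `REDIS_ITEMS_KEY` pattern and the fact that the default pipeline serializer emits JSON.

```python
import json

# Matches the default REDIS_ITEMS_KEY pattern, '%(spider)s:items',
# for a spider named 'myspider'.
ITEMS_KEY = 'myspider:items'

def decode_item(raw):
    """Items are stored as JSON by the default ScrapyJSONEncoder."""
    return json.loads(raw)

def consume(r, key=ITEMS_KEY):
    """Block until an item is available in redis, then return it decoded."""
    _, raw = r.blpop(key)  # blpop returns a (key, value) pair
    return decode_item(raw)

if __name__ == "__main__":
    # Hypothetical standalone consumer; any number of these can run in
    # parallel against the same queue.
    import redis
    r = redis.StrictRedis(host='localhost', port=6379)
    while True:
        print(consume(r))
```

Because `blpop` atomically pops one element, multiple consumers never see the same item twice.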

Requirements

  • Python 2.7, 3.4 or 3.5
  • Redis >= 2.8
  • Scrapy >= 1.0
  • redis-py >= 2.10

Usage

Use the following settings in your project:

```python
# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Ensure all spiders share same duplicates filter through redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Default requests serializer is pickle, but it can be changed to any module
# with loads and dumps functions. Note that pickle is not compatible between
# python versions.
# Caveat: In python 3.x, the serializer must return strings keys and support
# bytes as values. Because of this reason the json or msgpack module will not
# work by default. In python 2.x there is no such issue and you can use
# 'json' or 'msgpack' as serializers.
#SCHEDULER_SERIALIZER = "scrapy_redis.picklecompat"

# Don't cleanup redis queues, allows to pause/resume crawls.
#SCHEDULER_PERSIST = True

# Schedule requests using a priority queue. (default)
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.PriorityQueue'

# Alternative queues.
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.FifoQueue'
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.LifoQueue'

# Max idle time to prevent the spider from being closed when distributed crawling.
# This only works if queue class is SpiderQueue or SpiderStack,
# and may also block the same time when your spider start at the first time
# (because the queue is empty).
#SCHEDULER_IDLE_BEFORE_CLOSE = 10

# Store scraped item in redis for post-processing.
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 300,
}

# The item pipeline serializes and stores the items in this redis key.
#REDIS_ITEMS_KEY = '%(spider)s:items'

# The items serializer is by default ScrapyJSONEncoder. You can use any
# importable path to a callable object.
#REDIS_ITEMS_SERIALIZER = 'json.dumps'

# Specify the host and port to use when connecting to Redis (optional).
#REDIS_HOST = 'localhost'
#REDIS_PORT = 6379

# Specify the full Redis URL for connecting (optional).
# If set, this takes precedence over the REDIS_HOST and REDIS_PORT settings.
#REDIS_URL = 'redis://user:pass@hostname:9001'

# Custom redis client parameters (i.e.: socket timeout, etc.)
#REDIS_PARAMS = {}
# Use custom redis client class.
#REDIS_PARAMS['redis_cls'] = 'myproject.RedisClient'

# If True, it uses redis' ``spop`` operation. This could be useful if you
# want to avoid duplicates in your start urls list. In this case, urls must
# be added via ``sadd`` command or you will get a type error from redis.
#REDIS_START_URLS_AS_SET = False

# Default start urls key for RedisSpider and RedisCrawlSpider.
#REDIS_START_URLS_KEY = '%(name)s:start_urls'

# Use other encoding than utf-8 for redis.
#REDIS_ENCODING = 'latin1'
```
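The serializer caveat noted in the settings above can be demonstrated directly: on Python 3 the default pickle serializer round-trips the bytes values found in encoded requests, while json does not. The `request_dict` below is a simplified stand-in for a real encoded Scrapy request, not the actual structure scrapy-redis stores.

```python
import json
import pickle

# A Scrapy request, once encoded, contains bytes values (e.g. the body).
request_dict = {"url": "http://example.com", "method": "GET", "body": b""}

# pickle (the default serializer) handles bytes values fine:
data = pickle.dumps(request_dict)
assert pickle.loads(data) == request_dict

# json does not: bytes values raise a TypeError on Python 3.
try:
    json.dumps(request_dict)
    json_ok = True
except TypeError:
    json_ok = False
assert not json_ok
```

This is why `REDIS_SERIALIZER` alternatives such as `json` or `msgpack` work on Python 2 but not, by default, on Python 3.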
Note: Version 0.3 changed the requests serialization from marshal to cPickle,
therefore persisted requests made with a previous version will not work with
version 0.3.
