Scrapy output testing framework
scrapy-test
scrapy-test is a validation/testing framework for verifying scrapy results. It can test both scrapy crawl output and stats output.
See the example project: a Hacker News crawler with a full test suite.
Philosophy and Architecture
scrapy-test tries to mirror `scrapy.Item` definitions, but instead of defining fields it defines a test for every field. A test is a callable that returns a failure message when some condition is not met.
Example item spec:
```python
class MyItem(Item):
    name = Field()
    url = Field()


class TestMyItem(ItemSpec):
    item_cls = MyItem

    # define tests
    name_test = Match('some-regex-pattern')
    url_test = lambda v: 'bad url' if 'cat' in v else ''

    # define coverage
    url_cov = 100  # 100% - every item should have a url field
```
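The `_cov` coverage attribute above can be read as "this share of scraped items must carry the field". A minimal sketch of such a check (a hypothetical helper, not scrapy-test's actual implementation):

```python
# Hypothetical sketch of a field-coverage check: the percentage of items
# carrying a non-empty value for the field must reach the declared threshold.
def coverage_ok(items, field, threshold):
    covered = sum(1 for item in items if item.get(field))
    return covered / len(items) * 100 >= threshold

items = [{'url': 'http://a'}, {'url': 'http://b'}, {'name': 'no url'}]
print(coverage_ok(items, 'url', 100))  # False - only 2 of 3 items have a url
```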
scrapy-test also supports stats output validation. When scrapy finishes crawling it outputs various stats, such as error counts. A `StatsSpec` can be defined to validate these stats:
```python
class MyStats(StatsSpec):
    # one or multiple spiders
    spider_cls = MySpider1, MySpider2

    # stat_name_pattern: tests
    validate = {
        'item_scraped_count': MoreThan(1),
        r'downloader/response_status_count/50\d': LessThan(1),
    }
    # required stat keys
    required = ['stat_pattern.+']
```
Finally, scrapy-test determines failure by checking whether the stat and item specs returned any messages, and exits with code 1 for failure or 0 for success.
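That decision rule is simple to picture (a hypothetical sketch of the aggregation, not scrapy-test's internals):

```python
# Any collected failure message means the run fails (exit 1);
# no messages means the run passes (exit 0).
def exit_code(messages):
    return 1 if messages else 0

print(exit_code([]))                    # 0 - all specs passed
print(exit_code(['Invalid url: foo']))  # 1 - at least one failure
```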
Usage
Setup
A `test.py` module should be created in the spider directory, e.g.:

```
scrapy-test-example/
├── example
│   ├── __init__.py
│   └── test.py
└── scrapy.cfg
```
Add the test file configuration to `scrapy.cfg`:

```
[settings]
default = example.settings

[test]
root = example.test
```
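`scrapy.cfg` is a standard INI file, so the fragment above can be parsed with Python's stdlib `configparser` (shown purely to illustrate the format; this is not necessarily how scrapy-test itself loads the `[test]` section):

```python
import configparser

# Parse the scrapy.cfg fragment; configparser handles the INI format.
cfg = configparser.ConfigParser()
cfg.read_string("""
[settings]
default = example.settings

[test]
root = example.test
""")

# The dotted path pointing at the test module
print(cfg['test']['root'])  # example.test
```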
Define an `ItemSpec` for item field validation:

```python
from scrapytest.tests import Match, Equal, Type, MoreThan, Map, Len, Required


class TestPost(ItemSpec):
    # defining item that is being covered
    item_cls = PostItem

    # defining field tests
    title_test = Match('.{5,}')
    points_test = Type(int), MoreThan(0)
    author_test = Type(str), Match('.{3}')
    comments_test = Type(list), Required()

    # also supports methods!
    def url_test(self, value: str):
        if not value.startswith('http'):
            return f'Invalid url: {value}'
        return ''
```
`ItemSpec` classes should contain attributes that end with `_test`. These attributes are callables (functions, methods, etc.) that return a message when a failure is encountered; see the `url_test` method above.
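To make that contract concrete, here are hand-rolled stand-ins for the `Match` and `MoreThan` helpers used above (a sketch only; the real `scrapytest.tests` implementations may differ):

```python
import re

# Each helper is a callable that returns '' on pass
# or a human-readable failure message on fail.
class Match:
    def __init__(self, pattern):
        self.pattern = pattern

    def __call__(self, value):
        if re.match(self.pattern, str(value)):
            return ''
        return f'{value!r} does not match {self.pattern!r}'

class MoreThan:
    def __init__(self, limit):
        self.limit = limit

    def __call__(self, value):
        return '' if value > self.limit else f'{value} is not more than {self.limit}'

print(repr(Match('.{5,}')('hello world')))  # '' - passes
print(MoreThan(0)(-3))                      # -3 is not more than 0
```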
Define a `StatsSpec` for crawl stats validation:

```python
class TestStats(StatsSpec):
    # stat pattern: test functions
    validate = {
        # these are the defaults
        'log_count/ERROR$': LessThan(1),
        'item_scraped_count': MoreThan(1),
        'finish_reason': Match('finished'),
    }
    # these stats should be required
    required = ['some_cool_stat']
```
`StatsSpec` should contain a `validate` attribute holding a `pattern: tests` dictionary.
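As an illustration, applying such a `pattern: tests` dictionary to a stats dump could look like this (an assumed sketch, not scrapy-test's implementation):

```python
import re

# Apply each pattern's test to every matching stat key
# and collect any failure messages.
def validate_stats(stats, validate):
    messages = []
    for pattern, test in validate.items():
        for key, value in stats.items():
            if re.match(pattern, key):
                msg = test(value)
                if msg:
                    messages.append(f'{key}: {msg}')
    return messages

stats = {'log_count/ERROR': 3, 'item_scraped_count': 120}
checks = {'log_count/ERROR$': lambda v: '' if v < 1 else f'{v} errors logged'}
print(validate_stats(stats, checks))  # ['log_count/ERROR: 3 errors logged']
```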
Define a test `Spider` class:

```python
from project.spiders import HackernewsSpider


class TestHackernewsSpider(HackernewsSpider):
    test_urls = [
        "https://news.ycombinator.com/item?id=19187417",
    ]

    def start_requests(self):
        for url in self.test_urls:
            yield Request(url, self.parse_submission)
```
This test spider should extend your production spider and crawl only the listed urls without doing any discovery. Alternatively, you can skip extending anything and run tests against a live crawl.
Running
```
$ scrapy-test --help
Usage: scrapy-test [OPTIONS] [SPIDER_NAME]

  run scrapy-test tests and output messages and appropriate exit code (1
  for failed, 0 for passed)

Options:
  --cache  enable HTTPCACHE_ENABLED setting for this run
  --help   Show this message and exit.
```
To run the tests, use the cli command:

```
$ scrapy-test <spider_name>
```

The spider name can be omitted to run tests for all spiders.
Notifications
scrapy-test supports notification hooks for when tests fail or succeed:
```
--notify-on-error TEXT    send notification on failure, choice from:
                          ['slack']
--notify-on-all TEXT      send notification on failure or success, choice
                          from: ['slack']
--notify-on-success TEXT  send notification on success, choice from:
                          ['slack']
```
Right now scrapy-test provides these notifications:
* Slack - to configure slack notifications, follow the slack [incoming webhooks](https://slack.com/apps/A0F7XDUAZ-incoming-webhooks) app and supply these settings in `scrapy.cfg`:

```
slack_url = https://hooks.slack.com/services/AAA/BBB/CCC
# where the message goes to
slack_channel = #cats
# bot's name
slack_username = bender
# bot's avatar
slack_icon_emoji = :bender:
# maintainer will be mentioned on error
slack_maintainer = @bernard
```