How to set an upper bound for a Scrapy spider

Published 2024-09-30 08:30:57


I want to limit the number of items found on each page.

I found this documentation, which seems to match what I need:

class scrapy.contracts.default.ReturnsContract

This contract (@returns) sets lower and upper bounds for the items and 
requests returned by the spider. The upper bound is optional:

@returns item(s)|request(s) [min [max]]

But I don't know how to use this class. In my spider, I tried adding

ReturnsContract.__setattr__("max",10)

But it doesn't work. Am I missing something?


1 Answer

Spider Contracts are for testing purposes, not for controlling data-extraction logic:

Testing spiders can get particularly annoying and while nothing prevents you from writing unit tests the task gets cumbersome quickly. Scrapy offers an integrated way of testing your spiders by the means of contracts.

This allows you to test each callback of your spider by hardcoding a sample url and check various constraints for how the callback processes the response. Each contract is prefixed with an @ and included in the docstring.
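In other words, a contract such as @returns is written in the callback's docstring and only checked when you run `scrapy check` — it is never set programmatically with `__setattr__`. As a rough illustration (the URL and callback name are hypothetical, and the small parser below is a simplified stand-in for what Scrapy's ReturnsContract does when it reads the docstring):

```python
# Simplified illustration of how an "@returns" contract line is read.
# In a real Scrapy spider, `scrapy check` would evaluate the docstring
# below: "@returns items 0 10" asserts the callback yields between
# 0 and 10 items during the contract test -- it does NOT cap extraction.

def parse(self, response):
    """Parse a category page.

    @url http://example.com/category
    @returns items 0 10
    """

def returns_bounds(callback):
    """Extract (min, max) from an @returns line, mimicking what
    Scrapy's ReturnsContract parses out of the docstring."""
    for line in (callback.__doc__ or "").splitlines():
        parts = line.strip().split()
        if parts and parts[0] == "@returns":
            # parts: ["@returns", "items", "0", "10"]; max is optional
            lo = int(parts[2]) if len(parts) > 2 else 0
            hi = int(parts[3]) if len(parts) > 3 else float("inf")
            return lo, hi
    return None

# returns_bounds(parse) -> (0, 10)
```

This is why assigning to ReturnsContract has no effect on a running spider: the bounds live in each callback's docstring and are only consulted by the contract-testing machinery.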

For your purposes, you can simply set an upper bound in your extraction logic, for example:

response.xpath('//my/xpath').extract()[:10]
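To show the slicing approach in isolation, a minimal sketch — a plain list stands in for what `response.xpath('//my/xpath').extract()` would return, so the snippet runs without Scrapy installed:

```python
MAX_ITEMS = 10  # upper bound on items taken per page

def extract_limited(values, limit=MAX_ITEMS):
    """Keep at most `limit` extracted values -- the same [:10] slice
    the answer applies to response.xpath(...).extract()."""
    return values[:limit]

# Stand-in for the list a real response.xpath('//my/xpath').extract()
# would return on a page with more than 10 matches.
extracted = [f"item-{i}" for i in range(25)]
limited = extract_limited(extracted)
# len(limited) -> 10
```

Slicing never raises if the page yields fewer matches than the limit; a page with only 3 matches simply returns all 3.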
