Scrapy rules not fetching all specified links

Posted 2024-10-03 06:32:34

I want to crawl all links in the following formats:

http://example.com/index.php/comments/XXXXX
http://example.com/XXX1/index.php/comments/XXXXX
http://example.com/XXX2/index.php/comments/XXXX
http://example.com/XXX3/index.php/comments/XXXX

I defined the following rule:

start_urls = ['http://example.com/']

rules = [Rule(SgmlLinkExtractor(allow=[r'\w+/index.php/comments/\w+']), callback='parse_blogpost', follow=True)]

However, the crawler seems to visit only links like http://example.com/index.php/comments/XXXXX, and not links like http://example.com/XXX1/index.php/comments/XXXXX.
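
One way to narrow this down is to test the pattern on its own, outside the spider. The sketch below uses only the standard `re` module and assumes the link extractor applies each `allow` pattern to the absolute URL with search semantics (a match anywhere in the string); the `matches` helper and the `URLS` list are hypothetical names introduced for illustration, with URLs copied from the formats above.

```python
import re

# Pattern copied verbatim from the rule in the question.
ALLOW = re.compile(r'\w+/index.php/comments/\w+')

# Sample URLs in the formats listed in the question (XXXXX is a placeholder).
URLS = [
    'http://example.com/index.php/comments/XXXXX',
    'http://example.com/XXX1/index.php/comments/XXXXX',
    'http://example.com/XXX2/index.php/comments/XXXX',
    'http://example.com/XXX3/index.php/comments/XXXX',
]


def matches(url, pattern=ALLOW):
    """Return True if the pattern is found anywhere in the URL."""
    # Assumption: search semantics, i.e. the pattern need not start at
    # the beginning of the URL.
    return pattern.search(url) is not None


# Map each URL to whether the pattern accepts it, then print the result.
results = {url: matches(url) for url in URLS}
for url, ok in results.items():
    print(url, ok)
```

If every URL reports `True`, the regular expression is probably not what filters out the `XXX1`-style links, and it may be worth checking whether the pages the crawler actually visits contain links to those URLs at all.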

Any help would be greatly appreciated!

0 answers

No answers yet.