How to make Scrapy faster

from scrapy.spiders import XMLFeedSpider
from learning.items import TestItem

class MySpider(XMLFeedSpider):
    name = 'testing'
    allowed_domains = ['www.cityblueshop.com']
    start_urls = ['https://www.cityblueshop.com/sitemap_products_1.xml']

    namespaces = [('n', 'http://www.sitemaps.org/schemas/sitemap/0.9')]
    itertag = 'n:url'
    iterator = 'xml'

    def parse_node(self, response, node):
        item = TestItem()
        item['url'] = node.xpath('.//n:loc/text()').extract()
        return item

2 answers

User

#1 · Edited 2024-10-05 14:26:25

I tested the following spider locally:

from scrapy.spiders import XMLFeedSpider

class MySpider(XMLFeedSpider):
    name = 'testing'
    allowed_domains = ['www.cityblueshop.com']
    start_urls = ['https://www.cityblueshop.com/sitemap_products_1.xml']

    namespaces = [('n', 'http://www.sitemaps.org/schemas/sitemap/0.9')]
    itertag = 'n:url'
    iterator = 'xml'


    def parse_node(self, response, node):
        yield {'url': node.xpath('.//n:loc/text()').get()}

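It ran in under 3 seconds, including Scrapy core startup and everything else.

Make sure the time is not being spent elsewhere, for example in the learning module from which you import your item subclass.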
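One rough way to check whether an import is the slow part is to time it in isolation. The sketch below is only an illustration: the helper name time_import is hypothetical, and "json" stands in for the learning.items module from the question, which is not available here.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name.

    Only the first import in a process does real work; a module that is
    already loaded comes back from the import cache almost instantly.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# "json" is a stand-in; replace it with "learning.items" inside the project.
elapsed = time_import("json")
print(f"import took {elapsed:.3f}s")
```

If the import turns out to take a noticeable share of the run time, the slowdown is in that module rather than in the spider itself.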

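User

#2 · Edited 2024-10-05 14:26:25

Try increasing the concurrent requests, concurrent requests per domain, and concurrent requests per IP settings, e.g.: https://doc.scrapy.org/en/latest/topics/settings.html#concurrent-requests-per-domain. Keep in mind, though, that along with higher speed this can also lower the success rate: many 429 responses, bans, and so on.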
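As a minimal sketch, these settings could go in the project's settings.py. The raised values below are illustrative assumptions, not recommendations. The defaults noted in the comments are the ones documented by Scrapy.

```python
# settings.py fragment (values chosen for illustration only)

# Global cap on requests in flight across all domains (Scrapy default: 16).
CONCURRENT_REQUESTS = 32

# Cap on simultaneous requests to any single domain (Scrapy default: 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 16

# Cap per IP address (Scrapy default: 0, meaning disabled). When this is
# non-zero, the per-domain cap is ignored and the limit applies per IP.
CONCURRENT_REQUESTS_PER_IP = 0
```

Since the spider in the question only fetches a single sitemap URL, raising these values would mainly help once it starts following many product pages.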
