How to run a spider on multiple URLs while concatenating a string to the URLs

Posted 2024-09-30 10:27:39


I want a spider to run on multiple URLs. However, I want to take input from the user, concatenate it onto my original URLs, and then have the spider crawl them. Here is what I did for one of the URLs:

import scrapy


class ProductsSpider(scrapy.Spider):
    name = "gaming"

    def start_requests(self):
        # Ask the user for a search term and append it to the base URL
        product = input("Enter the item you are looking for: ")
        yield scrapy.Request(
            url=f'https://www.czone.com.pk/search.aspx?kw={product}',
            callback=self.parse
        )

    def parse(self, response):
        pass
The code above works perfectly well for a single URL. One way to handle multiple URLs is to pass a list as the start URLs, but even then the spider returns an error:

[scrapy.core.engine] ERROR: Error while obtaining start requests
ValueError: Missing scheme in request url: h

Please help.
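
For reference, this particular error usually means start_urls was assigned a bare string instead of a list: Scrapy iterates over start_urls, so a string gets iterated character by character, and the first "URL" it sees is just 'h'. A minimal reproduction, with hypothetical spider names:

import scrapy


class BrokenSpider(scrapy.Spider):
    name = "broken"
    # BUG: a bare string -- Scrapy iterates it character by character,
    # so the first "URL" it tries to fetch is just 'h'
    start_urls = 'https://www.czone.com.pk/search.aspx?kw=laptop'

    def parse(self, response):
        pass


class FixedSpider(scrapy.Spider):
    name = "fixed"
    # FIX: wrap the URL(s) in a list
    start_urls = ['https://www.czone.com.pk/search.aspx?kw=laptop']

    def parse(self, response):
        pass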


Tags: user, name, self, url, parse, def, error, product
2 Answers

Check this code:

import scrapy


class ProductsSpider(scrapy.Spider):
    name = "gaming"

    def __init__(self, product='', **kwargs):
        # 'product' arrives from the command line via -a product=...
        self.start_urls = [
            f'https://www.czone.com.pk/search.aspx?kw={product}',
            f'https://pcfanatics.pk/search?type=product&q={product}',
            f'https://gtstore.pk/searchresults.php?inputString={product}',
        ]
        super().__init__(**kwargs)

    def start_requests(self):
        for s_url in self.start_urls:
            yield scrapy.Request(
                url=s_url,
                callback=self.parse,
            )

    def parse(self, response):
        print(self.name)
        # ... do parse things ...
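
One refinement worth adding (my suggestion, not part of the answer above): if the search term can contain spaces or other special characters, it should be URL-encoded before being interpolated into the query string. A minimal sketch using the standard library:

from urllib.parse import quote_plus

product = quote_plus('gaming laptop')   # -> 'gaming+laptop'
url = f'https://www.czone.com.pk/search.aspx?kw={product}'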

The correct way to get input into a Scrapy spider is to pass it at runtime with the -a option. For example, to run this spider you would use one of:

scrapy crawl gaming -a product='foo'

scrapy runspider <spider_filename> -a product='foo'
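
If you need several search terms in one run, one pattern (my sketch, not part of the answer above; the spider name is hypothetical) is to accept a comma-separated value via -a and split it in __init__:

import scrapy


class MultiProductsSpider(scrapy.Spider):
    name = "gaming_multi"

    def __init__(self, products='', **kwargs):
        # e.g. scrapy crawl gaming_multi -a products='laptop,desktop,cameras'
        terms = [p.strip() for p in products.split(',') if p.strip()]
        self.start_urls = [
            f'https://www.czone.com.pk/search.aspx?kw={term}'
            for term in terms
        ]
        super().__init__(**kwargs)

    def parse(self, response):
        pass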

The URL error is probably due to bad formatting; using

            f'https://www.czone.com.pk/search.aspx?kw={product}',
            f'https://pcfanatics.pk/search?type=product&q={product}',
            f'https://gtstore.pk/searchresults.php?inputString={product}',

gives me no problems at all.

Based on your question, the solution is as follows.

My code:

import scrapy


class ProductsSpider(scrapy.Spider):
    name = "games"

    # Prompt the user for three search terms; the string passed to
    # input() is displayed as the prompt
    product = input("laptop")
    product2 = input("desktop")
    product3 = input("cameras")

    def start_requests(self):
        urls = [
            f'https://www.czone.com.pk/search.aspx?kw={self.product}',
            f'https://www.czone.com.pk/search.aspx?kw={self.product2}',
            f'https://www.czone.com.pk/search.aspx?kw={self.product3}',
        ]
        for url in urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse
            )

    def parse(self, response):
        pass
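
One caveat worth noting (my observation, not part of the answer): because the input() calls sit in the class body, they run as soon as the module is imported, before the crawl even starts. If that is undesirable, the prompts can be moved into start_requests; a sketch with a hypothetical spider name:

import scrapy


class PromptedProductsSpider(scrapy.Spider):
    name = "games_prompted"

    def start_requests(self):
        # Prompt only when the crawl actually starts, not at import time
        for prompt in ("laptop", "desktop", "cameras"):
            product = input(prompt)
            yield scrapy.Request(
                url=f'https://www.czone.com.pk/search.aspx?kw={product}',
                callback=self.parse
            )

    def parse(self, response):
        pass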
    

And the same thing with an alternative approach:

Code:

import scrapy


class ProductsSpider(scrapy.Spider):
    name = "games2"
    # Note: the list passed to input() is only displayed as the prompt;
    # input() still returns the single line the user types
    product = input(["laptop", "desktop", "cameras"])

    def start_requests(self):
        yield scrapy.Request(
            url=f'https://www.czone.com.pk/search.aspx?kw={self.product}',
            callback=self.parse
        )

    def parse(self, response):
        pass
    

Output:

laptop
desktop
cameras
['laptop', 'desktop', 'cameras']

2021-08-12 16:53:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.czone.com.pk/search.aspx?kw=> (referer: None)
2021-08-12 16:53:39 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-12 16:53:39 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 312,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 19982,
 'downloader/response_count': 1,
 'downloader/response_status_count/200
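
Note that kw= in the crawled URL is empty: input() uses its argument only as a prompt (here it prints the list), and returns whatever line the user actually types, which in this run was nothing. A quick illustration in the interpreter:

>>> product = input(["laptop", "desktop", "cameras"])
['laptop', 'desktop', 'cameras']
>>> product   # the user just pressed Enter
''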
