Scrapy spider shows an error when crawling

Published 2024-05-09 06:08:05


I am trying to crawl coupons from the Cuponation website, but when I run the spider it shows an error. Please help. Thanks.

import scrapy
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class CuponationSpider(scrapy.spider):
    name = "cupo"
    allowed_domains = ["cuponation.in"]
    start_urls = ["https://www.cuponation.in/firstcry-coupon#voucher"]

    def parse(self, response):
        all_items = []
        divs_action = response.xpath('//div[@class="action"]')
        for div_action in divs_action:
            item = VoucherItem()
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
            all_items.append(item)





>**Output** ERROR

    File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>
    2017-07-25 16:36:59 [boto] ERROR: Unable to read instance data, giving up

1 answer

Answered 2024-05-09 06:08:05

Comment: ... tell me where the error is in what I'm doing

  1. Remove all of the import lines and use just one:

    import scrapy

  2. Your class should inherit like this:

    class CuponationSpider(scrapy.Spider):

  3. You have changed the name and the start URL; use:

    name = "cuponation"
    allowed_domains = ['cuponation.in']
    start_urls = ['https://www.cuponation.in/firstcry-coupon']

  4. You are using Python 2.7. Sorry, I cannot run Scrapy with 2.7, so that may be the difference.
     The error "Unable to read instance data, giving up" indicates that no data was received from the given URL. Perhaps you have been blacklisted.

Comment: URL is cuponation.in/firstcry-coupon#voucher

That is the same page; the `#voucher` fragment does not trigger a new request.
All of this can be simplified to:

all_items = []

def parse(self, response):
    # Get all DIV with class="action"
    divs_action = response.xpath('//div[@class="action"]')

    for div_action in divs_action:
        item = VoucherItem()

        # Get SPAN from DIV with Attribute data-voucher-id
        span0 = div_action.xpath('./span[@data-voucher-id]')[0]

        # Copy Attribute voucher_id
        item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]

        # Find SPAN class="code-field" inside span0 and copy Text
        item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]

        all_items.append(item)
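The extraction logic above can be checked offline against a saved HTML fragment. A minimal sketch using the standard library's `xml.etree.ElementTree` (the sample markup and coupon codes below are hypothetical, mirroring the structure the XPath expressions expect):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the page structure the spider targets
html = """
<html><body>
  <div class="action">
    <span data-voucher-id="868600">
      <span class="code-field">SAVE20</span>
    </span>
  </div>
  <div class="action">
    <span data-voucher-id="31793">
      <span class="code-field">FIRST50</span>
    </span>
  </div>
</body></html>
"""

all_items = []
root = ET.fromstring(html)

# Same traversal as the spider: every DIV with class="action" ...
for div_action in root.iterfind(".//div[@class='action']"):
    # ... then the SPAN carrying the data-voucher-id attribute
    span0 = div_action.find("./span[@data-voucher-id]")
    all_items.append({
        'voucher_id': span0.get('data-voucher-id'),
        'code': span0.find("./span[@class='code-field']").text,
    })

print(all_items)
# → [{'voucher_id': '868600', 'code': 'SAVE20'}, {'voucher_id': '31793', 'code': 'FIRST50'}]
```

`ElementTree` only supports a subset of XPath, but attribute predicates like `[@class='action']` and `[@data-voucher-id]` are enough to mirror the selectors used here.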

Output:

#CouponSpider.start_requests:https://www.cuponation.in/firstcry-coupon
#CouponSpider.parse()
#CouponSpider.divs_action:List[13] of <Element div at 0xf6b1c20c>
{'voucher_id': '868600', 'code': '*******'}
{'voucher_id': '31793', 'code': '*******'}
{'voucher_id': '832408', 'code': '*******'}
{'voucher_id': '819903', 'code': '*******'}
{'voucher_id': '808774', 'code': '*******'}
{'voucher_id': '32274', 'code': '*******'}
{'voucher_id': '32102', 'code': '*******'}
{'voucher_id': '844247', 'code': '*******'}
{'voucher_id': '843513', 'code': '*******'}
{'voucher_id': '848151', 'code': '*******'}
{'voucher_id': '845248', 'code': '*******'}
{'voucher_id': '869101', 'code': '*******'}
{'voucher_id': '869328', 'code': '*******'}            
