Scrapy spider shows an error when crawling

Published 2024-05-09 06:08:05


I am trying to crawl coupons from the Cuponation website, but when I run the spider it shows an error. Please help. Thanks.

import scrapy
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class CuponationSpider(scrapy.spider):
    name = "cupo"
    allowed_domains = ["cuponation.in"]
    start_urls = ["https://www.cuponation.in/firstcry-coupon#voucher"]

    def parse(self, response):
        all_items = []
        divs_action = response.xpath('//div[@class="action"]')
        for div_action in divs_action:
            item = VoucherItem()
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
            all_items.append(item)





>**Output** ERROR

    File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>
    2017-07-25 16:36:59 [boto] ERROR: Unable to read instance data, giving up

1 answer

Answered 2024-05-09 06:08:05

Comment: ... tell me where the error is in what I'm doing

  1. Remove all of the import lines and use just one:

    import scrapy

  2. Your class should inherit like this:

    class CuponationSpider(scrapy.Spider):

  3. You have changed the name and the start URL; use:

    name = "cuponation"
    allowed_domains = ['cuponation.in']
    start_urls = ['https://www.cuponation.in/firstcry-coupon']

  4. You are using Python 2.7. Sorry, I cannot run Scrapy with 2.7, so that may be the difference.
     The error "Unable to read instance data, giving up" indicates that no data was received from the given URL. Perhaps you have been blacklisted.

Comment: URL is cuponation.in/firstcry-coupon#voucher

That is the same page; the `#voucher` fragment does not trigger a new request.
All of this can be simplified to:

all_items = []

def parse(self, response):
    # Get all DIV with class="action"
    divs_action = response.xpath('//div[@class="action"]')

    for div_action in divs_action:
        item = VoucherItem()

        # Get SPAN from DIV with Attribute data-voucher-id
        span0 = div_action.xpath('./span[@data-voucher-id]')[0]

        # Copy Attribute voucher_id
        item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]

        # Find SPAN class="code-field" inside span0 and copy Text
        item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]

        all_items.append(item)
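The extraction logic above can be checked offline against a saved HTML fragment. A minimal sketch using the standard library's `xml.etree.ElementTree` (the sample markup and coupon codes below are hypothetical, mirroring the structure the XPath expressions expect):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the page structure the spider targets
html = """
<html><body>
  <div class="action">
    <span data-voucher-id="868600">
      <span class="code-field">SAVE20</span>
    </span>
  </div>
  <div class="action">
    <span data-voucher-id="31793">
      <span class="code-field">FIRST50</span>
    </span>
  </div>
</body></html>
"""

all_items = []
root = ET.fromstring(html)

# Same traversal as the spider: every DIV with class="action" ...
for div_action in root.iterfind(".//div[@class='action']"):
    # ... then the SPAN carrying the data-voucher-id attribute
    span0 = div_action.find("./span[@data-voucher-id]")
    all_items.append({
        'voucher_id': span0.get('data-voucher-id'),
        'code': span0.find("./span[@class='code-field']").text,
    })

print(all_items)
# → [{'voucher_id': '868600', 'code': 'SAVE20'}, {'voucher_id': '31793', 'code': 'FIRST50'}]
```

`ElementTree` only supports a subset of XPath, but attribute predicates like `[@class='action']` and `[@data-voucher-id]` are enough to mirror the selectors used here.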

Output:

#CouponSpider.start_requests:https://www.cuponation.in/firstcry-coupon
#CouponSpider.parse()
#CouponSpider.divs_action:List[13] of <Element div at 0xf6b1c20c>
{'voucher_id': '868600', 'code': '*******'}
{'voucher_id': '31793', 'code': '*******'}
{'voucher_id': '832408', 'code': '*******'}
{'voucher_id': '819903', 'code': '*******'}
{'voucher_id': '808774', 'code': '*******'}
{'voucher_id': '32274', 'code': '*******'}
{'voucher_id': '32102', 'code': '*******'}
{'voucher_id': '844247', 'code': '*******'}
{'voucher_id': '843513', 'code': '*******'}
{'voucher_id': '848151', 'code': '*******'}
{'voucher_id': '845248', 'code': '*******'}
{'voucher_id': '869101', 'code': '*******'}
{'voucher_id': '869328', 'code': '*******'}            
