Python, Scrapy, Pipeline: function "process_item" is not being called

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re

class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'

    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item

import csv

class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv','wb'))
        self.locationdump.writerow(['Address','Pin','Phone','Fax','Email'])

    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        else:
            pass
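Separately from the main problem, the pipeline above never closes its CSV file and does not return the item, so any later pipeline would receive None. A minimal Python 3 sketch of the same pipeline, assuming Scrapy's standard `open_spider`/`close_spider` hooks; the `path` constructor argument is a hypothetical addition so the output location can be changed:

```python
import csv
import os


class ComoShamPipeline:
    """Python 3 sketch of the question's pipeline (path argument is hypothetical)."""

    def __init__(self, path="./scraped data/ComoSham/ComoshamLocation.csv"):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Create the output folder if the path has one.
        folder = os.path.dirname(self.path)
        if folder:
            os.makedirs(folder, exist_ok=True)
        # newline="" is what the csv module expects for files in Python 3.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["Address", "Pin", "Phone", "Fax", "Email"])

    def close_spider(self, spider):
        # Flush and release the file handle when the crawl ends.
        self.file.close()

    def process_item(self, item, spider):
        if item.get("category") == "location":
            self.writer.writerow(
                [item["address"], item["pin"], item["phone"], item["fax"], item["email"]]
            )
        # Return the item so any later pipelines still receive it.
        return item
```

Note that this only tidies the pipeline; process_item is still never called unless the spider actually yields the item.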

3 answers

Anonymous user

Answer 1 · edited 2024-10-01 11:33:50

In addition to the answers above:

1. Remember to add the following line to settings.py:

ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}

2. Yield the items when your spider runs:

yield my_item
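Filled in for the question's project, the setting might look like the sketch below. It assumes the pipeline class lives in activityadvisor/pipelines.py (the project name is inferred from the `activityadvisor.items` imports). The integer only controls ordering: items pass through pipelines from lower to higher values, customarily in the 0-1000 range.

```python
# activityadvisor/settings.py (excerpt)
# Assumption: ComoShamPipeline is defined in activityadvisor/pipelines.py.
ITEM_PIPELINES = {
    # Lower numbers run first when several pipelines are enabled.
    "activityadvisor.pipelines.ComoShamPipeline": 300,
}
```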


Answer 2

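Use ITEM_PIPELINES in settings.py:

ITEM_PIPELINES = ['project_name.pipelines.pipeline_class']

This list form is the older syntax; current Scrapy documentation uses a dict that maps each pipeline class path to an order number, as in the answer above.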


Answer 3

Your problem is that you never actually yield the item. parse_location returns the item, but parse never yields it.

The solution is to replace:

self.parse_location(response)

with

yield self.parse_location(response)

More specifically, process_item is never called if no item is yielded.
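To see why, here is a simplified model of how the crawler handles a callback's return value. It is not Scrapy's real code, and every name in it (`iterate_callback_output`, `RecordingPipeline`, `parse_broken`, `parse_fixed`, `run`) is hypothetical. Only what the callback returns or yields is handed to the pipeline; a helper's return value that parse discards is simply lost.

```python
def iterate_callback_output(result):
    """Simplified model of how a callback's return value is turned into items."""
    if result is None:
        return []           # callback returned nothing: no items at all
    if isinstance(result, dict):
        return [result]     # a single returned item is wrapped in a list
    return list(result)     # a generator (a callback that uses yield) is drained


class RecordingPipeline:
    """Stand-in pipeline that remembers which items reached process_item."""

    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        self.seen.append(item)
        return item


def parse_location(response):
    # Hypothetical helper: builds one item from the "response".
    return {"category": "location", "address": response}


def parse_broken(response):
    # Like the question: the helper's item is discarded, so this returns None.
    parse_location(response)


def parse_fixed(response):
    # Like the answer: the helper's item is yielded back to the caller.
    yield parse_location(response)


def run(callback, response):
    pipeline = RecordingPipeline()
    for item in iterate_callback_output(callback(response)):
        pipeline.process_item(item, spider=None)
    return pipeline.seen


print(run(parse_broken, "1 Example Rd"))  # []
print(run(parse_fixed, "1 Example Rd"))   # [{'category': 'location', 'address': '1 Example Rd'}]
```

Since a Scrapy callback may also return a single item, `return self.parse_location(response)` should work too, as long as parse does not also contain a yield.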
