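My spider has two for loops, one for images and one for room data. Each works when run on its own, but when both are in the spider, whichever comes first determines the result: I get either the image URLs or the room data correctly, never both. I have tried changing where I yield and have read the documentation on running multiple spiders, but I just want to know what I am doing wrong.
Here is my code. I am very new to Scrapy and have only just learned about Item Loaders for formatting data, so I am not using them yet.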
items.py
import scrapy


class ResortItem(scrapy.Item):
    # images
    images = scrapy.Field()
    image_urls = scrapy.Field()
    # room details and amenities
    room_title = scrapy.Field()
    square_feet = scrapy.Field()
    kitchen = scrapy.Field()
    num_baths = scrapy.Field()
    max_guests = scrapy.Field()
    beds = scrapy.Field()
    washer_dryer = scrapy.Field()
    room_amenities = scrapy.Field()
scraper.py
import scrapy
from items import ResortItem


class ScraperSpider(scrapy.Spider):
    name = 'scraper'
    allowed_domains = ['domains']
    start_urls = [
        'urls'
    ]

    def parse(self, response):
        item = ResortItem()
        unit_img_path = units_img.xpath(unit_image_selector).getall()
        url_list = imgs_path + unit_img_path
        image_urls = [
            "url" + x for x in url_list]
        item['image_urls'] = image_urls
        yield item
        # gets and sets the room_title to an item
        room_title = units.xpath(room_nameSelector).get().strip()
        item['room_title'] = room_title
        beds = units.xpath(bedSelector).getall()
        item['beds'] = beds
        num_baths = units.xpath(bathsSelector).get().strip()
        item['num_baths'] = num_baths
        # gets the square feet and sets it to an item
        square_feet = units.xpath(sqftSelector).get().strip()
        item['square_feet'] = square_feet
        room_amenities = units.xpath(room_amenitiesSelector).getall()
        # Pulls Washer/Dryer amenity if available
        washer_amenity = 'Washer'
        washer_dryer = list(
            filter(lambda x: washer_amenity in x, room_amenities))
        # Extracts the washer/dryer room_amenities list
        # setting room_amenities item
        room_amenities = [
            x for x in room_amenities if not x.startswith('Washer')]
        item['room_amenities'] = room_amenities
        # formatting Kitchen data
        # setting kitchens item
        kitchen = units.xpath(kitchenSelector).get().strip()
        item['kitchen'] = kitchen
        yield item
Move this
inside the second loop.
Delete the first loop and the unnecessary variables.
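One plausible reading of this advice is to build a fresh item for each unit, fill in the image URLs and the room fields in the same loop iteration, and yield only once the item is complete. The sketch below illustrates that pattern under several assumptions: plain dicts stand in for both the Scrapy selector nodes and ResortItem so it runs without Scrapy installed, the UNITS data and BASE_URL are made up for illustration, and it assumes the images and the room details can be reached from the same unit node, which the question does not show.

```python
# Hypothetical stand-in data. In the real spider each unit would be a
# Selector node from response.xpath(...); dicts keep the sketch runnable.
UNITS = [
    {
        "title": " Studio ",
        "images": ["/img/studio-1.jpg", "/img/studio-2.jpg"],
        "amenities": ["Washer/Dryer", "Balcony"],
    },
    {
        "title": " One Bedroom ",
        "images": ["/img/one-bed-1.jpg"],
        "amenities": ["Fireplace"],
    },
]

# Hypothetical prefix; the question only shows the placeholder "url".
BASE_URL = "https://example.com"


def parse_units(units):
    """Yield one complete item per unit, with images and room data together."""
    for unit in units:
        # A fresh item per iteration, so nothing leaks between units.
        item = {}

        # Image fields, formerly handled in the first loop.
        item["image_urls"] = [BASE_URL + path for path in unit["images"]]

        # Room fields, formerly handled in the second loop.
        item["room_title"] = unit["title"].strip()
        amenities = unit["amenities"]
        item["washer_dryer"] = [a for a in amenities if "Washer" in a]
        item["room_amenities"] = [
            a for a in amenities if not a.startswith("Washer")
        ]

        # Single yield, after every field has been filled in.
        yield item


items = list(parse_units(UNITS))
for item in items:
    print(item)
```

In the actual spider the same shape would apply inside parse: create ResortItem() at the top of the loop body, assign item['image_urls'] together with the room fields, and keep a single yield item at the end of the loop body.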