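Hi, I'm new to Scrapy and I'm trying to scrape an ASP.NET site. I've identified the form parameters that are sent when the form is posted and used them in my code. However, while data is scraped from the first page, nothing is scraped after that, even though the spider reports that the other pages were crawled successfully. I've been trying to figure out why it isn't working. `clean_parsed_string` and `get_parsed_string` are my own helper functions for extracting string elements, and I've tested them on other sites.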
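The question does not show these two helpers. For readers who want to run the code, here is a hypothetical sketch, assuming `get_parsed_string` returns the first string matched by an XPath under a node (or an empty string when nothing matches) and `clean_parsed_string` collapses whitespace; the poster's real functions may behave differently. `.extract()` is the Scrapy `SelectorList` method that returns the matched strings as a list.

```python
# Hypothetical versions of the poster's helpers; the real ones are not shown.


def get_parsed_string(node, xpath):
    """Return the first string matched by `xpath` under `node`, or '' if none."""
    matches = node.xpath(xpath).extract()
    return matches[0] if matches else ""


def clean_parsed_string(value):
    """Collapse runs of whitespace (spaces, tabs, newlines) and trim both ends."""
    return " ".join(value.split()) if value else ""
```

With these semantics, a hotel node whose markup does not match an XPath yields an item with an empty field instead of raising `IndexError`.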
from scrapy.http import FormRequest
from scrapy.selector import Selector


def parse(self, response):
    sel = Selector(response)
    # Each hotel on the results page is wrapped in its own result div
    snodes = sel.xpath('//div[@id="hotel_result_hotel_item"]')
    for snode in snodes:
        # Hotel_Items is the project's own Item class (not shown)
        hotel_item = Hotel_Items()
        # All lookups below are relative to the current hotel node, `snode`
        hotel_item['name'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/table[@class="widthfull"]//a[@class="hot_name"]/text()'))
        hotel_item['address'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/table[@class="widthfull"]//span[@class="fontsmalli"]/text()'))
        hotel_item['stars'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/table[@class="widthfull"]//div[@class="mbluebold col_hotelinfo_name"]/input/@class'))
        # Up to four room rows per hotel: column 1 holds the room type, column 5 the nightly rate
        hotel_item['room1'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[1]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room1_price_USD'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[1]/td[5]/p[@class="ratepernight"]/span/text()'))
        hotel_item['room2'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[2]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room2_price_USD'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[2]/td[5]/p[@class="ratepernight"]/span/text()'))
        hotel_item['room3'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[3]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room3_price_USD'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[3]/td[5]/p[@class="ratepernight"]/span/text()'))
        hotel_item['room4'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[4]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room4_price_USD'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[4]/td[5]/p[@class="ratepernight"]/span/text()'))
        yield hotel_item
    # ASP.NET keeps page state in the hidden __VIEWSTATE field; it is echoed back on the postback
    viewstate = sel.xpath('//input[@name="__VIEWSTATE"]/@value').extract()[0]
    # Post the form back to the server, simulating a click on the footer "Next" link
    yield FormRequest.from_response(
        response,
        formdata={
            'ctl00$scriptmanager1': 'ctl00$ContentMain$upResultFooter|ctl00$ContentMain$lbtnFooterNext',
            'ctl00_scriptmanager1_HiddenField': '',
            '__EVENTTARGET': 'ctl00$ContentMain$lbtnFooterNext',
            '__EVENTARGUMENT': '',
            '__LASTFOCUS': '',
            '__VIEWSTATE': viewstate,
            '__SCROLLPOSITIONX': '0',
            '__SCROLLPOSITIONY': '0',
            'ctl00$Googlesearch$txtSearch': '',
            'ctl00$ddlCurrency$hidCurrencyChange': 'USD',
            'ctl00$ContentMain$hdfMinPrice': '',
            'ctl00$ContentMain$hdfMaxPrice': '',
            'ctl00$ContentMain$ddlSort': '1',
            'ctl00$ContentMain$hidMenu': '0',
            'ctl00$ContentMain$hidSubMenu': '',
            'ctl00$ContentMain$DestinationSearchBox1$arrivaldate': '06/23/2014',
            'ctl00$ContentMain$DestinationSearchBox1$departdate': '06/25/2014',
            'ctl00$ContentMain$DestinationSearchBox1$controlmode': '1',
            'ctl00$ContentMain$DestinationSearchBox1$jsRooms': '0',
            'ctl00$ContentMain$DestinationSearchBox1$jsAdults': '0',
            'ctl00$ContentMain$DestinationSearchBox1$jsChildren': '0',
            'ctl00$ContentMain$DestinationSearchBox1$SearchHotel': 'no',
            'ctl00$ContentMain$DestinationSearchBox1$ErrorCharLengthMessage': 'Please enter at least the first two letters of the name you are looking for.',
            'ctl00$ContentMain$DestinationSearchBox1$TextError': 'Please enter the name of a Country, City, Airport, Area, Landmark or Hotel to proceed.',
            'ctl00$ContentMain$DestinationSearchBox1$TextSearch1$tmptextDefault': 'Country, City, Airport, Area, Landmark',
            'ctl00$ContentMain$DestinationSearchBox1$TextSearch1$txtSearch': 'Colombo',
            'ctl00$ContentMain$DestinationSearchBox1$ddlDistance': '1',
            'ddlCheckInDay': '23',
            'ddlCheckInMonthYear': '6,2014',
            'datepickerarrival': '',
            'ddlCheckOutDay': '25',
            'ddlCheckOutMonthYear': '6,2014',
            'ctl00$ContentMain$DestinationSearchBox1$ddlNights': '2',
            'datepickerdepart': '',
            'ctl00$ContentMain$DestinationSearchBox1$ddlRoom': '1',
            'ctl00$ContentMain$DestinationSearchBox1$ddlAdult': '2',
            'ctl00$ContentMain$DestinationSearchBox1$ddlChildren': '0',
            'ctl00$ContentMain$txtHotelName': '',
            'ctl00$ContentMain$hidHotelList2603': '',
            'ctl00$ContentMain$HotelFilterStarRating$HiddenFilterStatus': '',
            'ctl00$ContentMain$HotelFilterFacilities$HiddenFilterStatus': '',
            'ctl00$ContentMain$HotelFilterAccommodationType$HiddenFilterStatus': '',
            'ctl00$ContentMain$HotelFilterArea$HiddenFilterStatus': '',
            'ctl00$ContentMain$HotelFilterChainAndBrand$HiddenFilterStatus': '',
            # '__ASYNCPOST': 'true'
        },
        callback=self.parse,
        clickdata=None)
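The site may return a 200 OK status even if the form data in your POST is wrong. Try using scrapy shell and submitting a FormRequest with the formdata you've built, to see what the site actually returns.

I'd suggest using something like the following, to avoid typing out every form field by hand and to avoid the mistakes that can creep in: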
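A minimal sketch of that idea, using only the standard library's `html.parser`: read every field the page's form would submit, then override just the two fields ASP.NET uses to decide which control triggered the postback. `FormFieldCollector`, `collect_form_fields` and `build_postback` are hypothetical names, the sample markup is invented, and this is not necessarily the snippet the answer had in mind.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: gathers the name/value pairs a browser would submit.

    Assumes the page has a single <form> (typical of ASP.NET WebForms pages),
    so every field on the page is collected.
    """

    # Button-like inputs are skipped: including one in the POST tells
    # ASP.NET that this button was clicked, which changes the postback.
    SKIP_TYPES = {"submit", "button", "image", "reset", "file"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._select_name = None
        self._select_first = None
        self._select_chosen = False
        self._textarea_name = None
        self._textarea_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "input" and name:
            input_type = (attrs.get("type") or "text").lower()
            if input_type in self.SKIP_TYPES:
                return
            # Unchecked checkboxes and radio buttons are not submitted by browsers
            if input_type in ("checkbox", "radio") and "checked" not in attrs:
                return
            self.fields[name] = attrs.get("value") or ""
        elif tag == "select" and name:
            self._select_name = name
            self._select_first = None
            self._select_chosen = False
        elif tag == "option" and self._select_name is not None:
            # Assumes each <option> carries an explicit value attribute
            value = attrs.get("value") or ""
            if self._select_first is None:
                self._select_first = value
            if "selected" in attrs:
                self.fields[self._select_name] = value
                self._select_chosen = True
        elif tag == "textarea" and name:
            self._textarea_name = name
            self._textarea_text = []

    def handle_data(self, data):
        if self._textarea_name is not None:
            self._textarea_text.append(data)

    def handle_endtag(self, tag):
        if tag == "select" and self._select_name is not None:
            # With no option marked selected, browsers submit the first one
            if not self._select_chosen and self._select_first is not None:
                self.fields[self._select_name] = self._select_first
            self._select_name = None
        elif tag == "textarea" and self._textarea_name is not None:
            self.fields[self._textarea_name] = "".join(self._textarea_text)
            self._textarea_name = None


def collect_form_fields(html):
    """Return {field name: value} for every submittable field in `html`."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


def build_postback(fields, event_target, event_argument=""):
    """Copy the page's fields and set the two fields ASP.NET uses to route a postback."""
    data = dict(fields)
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data


if __name__ == "__main__":
    # Invented page fragment; the field names are borrowed from the question
    page = (
        '<input type="hidden" name="__VIEWSTATE" value="abc123" />'
        '<input type="hidden" name="__EVENTTARGET" value="" />'
        '<select name="ctl00$ContentMain$ddlSort">'
        '<option value="0">Name</option><option value="1" selected="selected">Price</option>'
        '</select>'
        '<input type="submit" name="btnSearch" value="Search" />'
    )
    print(build_postback(collect_form_fields(page), "ctl00$ContentMain$lbtnFooterNext"))
```

In a spider, the returned dict could be passed as `formdata` to `FormRequest(response.url, formdata=data, callback=self.parse)`. Scrapy's own `FormRequest.from_response` already pre-populates the fields found in the page's `<form>` in a similar way, so with it only the changed fields such as `__EVENTTARGET` need to be given; passing `dont_click=True` stops it from also submitting the form's first clickable control. To see what the server actually returns, the request can be built in `scrapy shell` and inspected with `fetch(request)` followed by `view(response)`.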