Callback function not working properly in Scrapy

Posted 2024-10-01 09:18:52


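Hi, I am new to Scrapy and I am trying to scrape an ASP.NET site. I have identified the parameters that are sent when the form is posted and used them in my code. However, data is only scraped from the first page. After that, nothing is scraped, even though the spider reports that the other pages were crawled successfully. I have been trying to figure out why it does not work. 'clean_parsed_string' and 'get_parsed_string' are my own functions for extracting string elements, and they have been tested on other sites.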

from scrapy.http import FormRequest
from scrapy.selector import Selector

def parse(self, response):
    sel = Selector(response)
    snodes = sel.xpath('//div[@id="hotel_result_hotel_item"]')

    for snode in snodes:
        hotel_item = Hotel_Items()
        # Each XPath below is evaluated relative to the current result node (snode).
        hotel_item['name'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/table[@class="widthfull"]//a[@class="hot_name"]/text()'))
        hotel_item['address'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/table[@class="widthfull"]//span[@class="fontsmalli"]/text()'))
        hotel_item['stars'] = clean_parsed_string(get_parsed_string(snode, 'div[@class=""]/table[@class="widthfull"]//div[@class="mbluebold col_hotelinfo_name"]/input/@class'))
        hotel_item['room1'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[1]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room1_price_USD'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[1]/td[5]/p[@class="ratepernight"]/span/text()'))
        hotel_item['room2'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[2]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room2_price_USD'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[2]/td[5]/p[@class="ratepernight"]/span/text()'))
        hotel_item['room3'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[3]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room3_price_USD'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[3]/td[5]/p[@class="ratepernight"]/span/text()'))
        hotel_item['room4'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[4]/td[1]/p[@class="roomtype"]/span/text()'))
        hotel_item['room4_price_USD'] = clean_parsed_string(get_parsed_string(snode,'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[4]/td[5]/p[@class="ratepernight"]/span/text()'))
        yield hotel_item


    viewstate = sel.xpath('//input[@name="__VIEWSTATE"]/@value').extract()[0]
    yield FormRequest.from_response(response,formdata={'ctl00$scriptmanager1':'ctl00$ContentMain$upResultFooter|ctl00$ContentMain$lbtnFooterNext',
                'ctl00_scriptmanager1_HiddenField':'',
                '__EVENTTARGET':'ctl00$ContentMain$lbtnFooterNext',
                '__EVENTARGUMENT':'',
                '__LASTFOCUS':'',
                '__VIEWSTATE': viewstate,
                '__SCROLLPOSITIONX':'0',
                '__SCROLLPOSITIONY':'0',
                'ctl00$Googlesearch$txtSearch':'',
                'ctl00$ddlCurrency$hidCurrencyChange':'USD',
                'ctl00$ContentMain$hdfMinPrice':'',
                'ctl00$ContentMain$hdfMaxPrice':'',
                'ctl00$ContentMain$ddlSort':'1',    
                'ctl00$ContentMain$hidMenu':'0',
                'ctl00$ContentMain$hidSubMenu':'',
                'ctl00$ContentMain$DestinationSearchBox1$arrivaldate':'06/23/2014',
                'ctl00$ContentMain$DestinationSearchBox1$departdate':'06/25/2014',
                'ctl00$ContentMain$DestinationSearchBox1$controlmode':'1',
                'ctl00$ContentMain$DestinationSearchBox1$jsRooms':'0',  
                'ctl00$ContentMain$DestinationSearchBox1$jsAdults':'0',
                'ctl00$ContentMain$DestinationSearchBox1$jsChildren':'0',
                'ctl00$ContentMain$DestinationSearchBox1$SearchHotel':'no',
                'ctl00$ContentMain$DestinationSearchBox1$ErrorCharLengthMessage':'Please enter at least the first two letters of the name you are looking for.',
                'ctl00$ContentMain$DestinationSearchBox1$TextError':'Please enter the name of a Country, City, Airport, Area, Landmark or Hotel to proceed.',
                'ctl00$ContentMain$DestinationSearchBox1$TextSearch1$tmptextDefault':'Country, City, Airport, Area, Landmark',
                'ctl00$ContentMain$DestinationSearchBox1$TextSearch1$txtSearch':'Colombo',
                'ctl00$ContentMain$DestinationSearchBox1$ddlDistance':'1',
                'ddlCheckInDay':'23',
                'ddlCheckInMonthYear':'6,2014',
                'datepickerarrival':'',
                'ddlCheckOutDay':'25',
                'ddlCheckOutMonthYear':'6,2014',
                'ctl00$ContentMain$DestinationSearchBox1$ddlNights':'2',
                'datepickerdepart':'',
                'ctl00$ContentMain$DestinationSearchBox1$ddlRoom':'1',
                'ctl00$ContentMain$DestinationSearchBox1$ddlAdult':'2',
                'ctl00$ContentMain$DestinationSearchBox1$ddlChildren':'0',
                'ctl00$ContentMain$txtHotelName':'',
                'ctl00$ContentMain$hidHotelList2603':'',
                'ctl00$ContentMain$HotelFilterStarRating$HiddenFilterStatus':'',
                'ctl00$ContentMain$HotelFilterFacilities$HiddenFilterStatus':'',
                'ctl00$ContentMain$HotelFilterAccommodationType$HiddenFilterStatus':'',
                'ctl00$ContentMain$HotelFilterArea$HiddenFilterStatus':'',
                'ctl00$ContentMain$HotelFilterChainAndBrand$HiddenFilterStatus':'',
                #'__ASYNCPOST':'true'
                },
            callback=self.parse,clickdata=None)

1 Answer
User
#1 · Posted 2024-10-01 09:18:52

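The site may be returning a 200 OK status even when the form fields you post are wrong. Try using scrapy shell to submit a FormRequest with the formdata you built, and look at what the site actually returns.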

I suggest using something like this, so you do not have to type every field by hand and risk making mistakes:

formdata = {}

for hid in sel.xpath('//input[@type="hidden" and @value and @name]'):
    formdata[hid.xpath('@name').extract()[0]] = hid.xpath('@value').extract()[0]
