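I am trying to scrape details from a hotel listing website (this site). When we click the Next button to go to the next page, the URL stays the same, and inspecting the page shows that the site sends an XHR request. I tried using Selenium WebDriver with Python; below is my code: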
from time import sleep

import scrapy
from selenium import webdriver
from scrapy.selector import Selector
from scrapy.http import Request
from selenium.common.exceptions import NoSuchElementException


class DineoutRestaurantSpider(scrapy.Spider):
    name = 'dineout_restaurant'
    allowed_domains = ['dineout.co.in/bangalore-restaurants?search_str=']
    start_urls = ['http://dineout.co.in/bangalore-restaurants?search_str=']

    def start_requests(self):
        # Open the listing page in a Selenium-driven Chrome window.
        self.driver = webdriver.Chrome('/Users/macbookpro/Downloads/chromedriver')
        self.driver.get('https://www.dineout.co.in/bangalore-restaurants?search_str=')
        url = 'https://www.dineout.co.in/bangalore-restaurants?search_str='
        yield Request(url, callback=self.parse)  # the request in question
        self.logger.info('Empty message')

        # Click "Next" up to three times, yielding a request after each click.
        for i in range(1, 4):
            try:
                next_page = self.driver.find_element_by_xpath('//a[text()="Next "]')
                sleep(11)
                self.logger.info('Sleeping for 11 seconds.')
                next_page.click()
                # The URL is the same on every page.
                url = 'https://www.dineout.co.in/bangalore-restaurants?search_str='
                yield Request(url, callback=self.parse)
            except NoSuchElementException:
                self.logger.info('No more pages to load.')
                self.driver.quit()
                break

    def parse(self, response):
        self.logger.info('Entered parse method')
        # Each restaurant card on the listing page.
        restaurants = response.xpath('//*[@class="cardBg"]')
        for restaurant in restaurants:
            name = restaurant.xpath('.//*[@class="titleDiv"]/h4/a/text()').extract_first()
            location = restaurant.xpath('.//*[@class="location"]/a/text()').extract()
            rating = restaurant.xpath('.//*[@class="rating rating-5"]/a/span/text()').extract_first()
            yield {
                'Name': name,
                'Location': location,
                'Rating': rating,
            }
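In the code above, the yielded Request does not seem to reach the parse function. Am I missing something? I don't get any errors, but the scraped output contains only page 1, even though the browser is iterating through the pages.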
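One likely cause, offered as a hedged note rather than a confirmed diagnosis: by default Scrapy's scheduler drops a request it considers a duplicate of one it has already seen, unless the request is created with dont_filter=True, and it does so without raising an error. Every request in the spider uses the same URL, so only the first would be downloaded. The sketch below is a toy, standard-library model of that behaviour. ToyScheduler and its enqueue method are hypothetical names invented for illustration and are not Scrapy's actual implementation.

```python
# Toy model of a duplicate-request filter; NOT Scrapy's actual implementation.
# ToyScheduler and enqueue() are hypothetical names used only for illustration.
class ToyScheduler:
    """Queues URLs, silently dropping ones already seen unless told not to."""

    def __init__(self):
        self.seen = set()    # URLs that have already been accepted once
        self.queue = []      # URLs that would actually be downloaded

    def enqueue(self, url, dont_filter=False):
        """Return True if the URL was queued, False if it was dropped."""
        if not dont_filter and url in self.seen:
            return False     # dropped without raising an error
        self.seen.add(url)
        self.queue.append(url)
        return True


URL = 'https://www.dineout.co.in/bangalore-restaurants?search_str='

scheduler = ToyScheduler()
# One initial request plus three "next page" requests, all with the same URL.
results = [scheduler.enqueue(URL) for _ in range(4)]
print(results)                # [True, False, False, False]
print(len(scheduler.queue))   # 1

# With the filter bypassed, every request is queued.
unfiltered = ToyScheduler()
bypassed = [unfiltered.enqueue(URL, dont_filter=True) for _ in range(4)]
print(len(unfiltered.queue))  # 4
```

Even with dont_filter=True, Scrapy downloads the URL itself rather than reading what the Selenium browser is showing, so each response would probably still be page 1. One option to try is to build the selector from the browser's current HTML after each click, for example Selector(text=self.driver.page_source), and run the same XPath expressions on that.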