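Flipkart.com shows only 15 to 20 results on the first page and loads more as you scroll. Scrapy successfully extracts the results from the first page, but not the ones that load afterwards. I tried to do this with Selenium, but without success. Here is my code: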
    from scrapy.spider import Spider
    from scrapy.selector import Selector
    from flipkart.items import FlipkartItem
    from scrapy.spider import BaseSpider
    from selenium import webdriver

    class FlipkartSpider(BaseSpider):
        name = "flip1"
        allowed_domains = ["flipkart.com"]
        start_urls = [
            "http://www.flipkart.com/beauty-and-personal-care/personal-care-appliances/hair-dryers/pr?sid=t06,79s,mh8&otracker=nmenu_sub_electronics_0_Hair%20Dryers"
        ]

        def __init__(self):
            self.driver = webdriver.Firefox()

        def parse(self, response):
            sel = Selector(response)
            self.driver.get(response.url)
            while True:
                next = self.driver.find_element_by_xpath('//div[@id="show-more-results"]')
                try:
                    sites = sel.select('//div[@class="gd-col gu12 browse-product fk-inf-scroll-item"] | //div[@class="pu-details lastUnit"]')
                    for site in sites:
                        item = FlipkartItem()
                        item['title'] = site.select('div//a[@class="lu-title"]/text() | div[1]/a/text()').extract()
                        item['price'] = site.select('div//div[@class="pu-price"]/div/text() | div//div[@class="pu-final"]/span/text()').extract()
                        yield item
                    next.wait_for_page_to_load("30")
                except:
                    break
            self.driver.close()
My items.py is:

[code not preserved]

The output I get contains only 15 items:
[{"price": ["Rs. 599"], "title": ["\n Citron Elegant 1400 W HD001 Hair Dryer (Pink)\n "]},
{"price": ["Rs. 799"], "title": ["\n Citron Vogue 1800 W HD002 Hair Dryer (White)\n "]},
{"price": ["Rs. 645"], "title": ["\n Philips HP8100/00 Hair Dryer (Blue)\n "]},
{"price": ["Rs. 944"], "title": ["\n Philips HP8111/00 Hair Dryer\n "]},
{"price": ["Rs. 171"], "title": ["\n Nova Professional With 2 Speed NV-1290 Hair Dryer (Pink...\n "]},
{"price": ["Rs. 175"], "title": ["\n Nova NHD 2840 Hair Dryer\n "]},
{"price": ["Rs. 775"], "title": ["\n Philips HP 8112 Hair Dryer\n "]},
{"price": ["Rs. 1,925"], "title": ["\n Philips HP8643/00 Miss Fresher's Pack Hair Straightener...\n "]},
{"price": ["Rs. 144"], "title": ["\n Nova Foldable N-658 Hair Dryer (White, Pink)\n "]},
{"price": ["Rs. 1,055"], "title": ["\n Philips HP8100/46 Hair Dryer\n "]},
{"price": ["Rs. 849"], "title": ["\n Panasonic EH-ND12-P62B Hair Dryer (Pink)\n "]},
{"price": ["Rs. 760"], "title": ["\n Panasonic EH-ND11 Hair Dryer (White)\n "]},
{"price": ["Rs. 1,049"], "title": ["\n Panasonic EH-ND13-V Hair Dryer (Violet)\n "]},
{"price": ["Rs. 1,554"], "title": ["\n Philips 1600 W HP4940 Hair Dryer (White & Light Pink)\n "]},
{"price": ["Rs. 2,008"], "title": ["\n Philips Kerashine HP8216/00 Hair Dryer\n "]}]
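
You can scroll down the page using JavaScript.
The following code scrolls the page by 10000 in both the x and y directions. Because 10000 is a large number, it takes you to the bottom of the page. Once you reach the bottom, Flipkart fires an AJAX request to load more items.
I'm not sure how to do this with Scrapy alone, but it is easy with Selenium.
Here is the code: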
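A minimal sketch of this approach in Python, not the answerer's original snippet. It assumes `driver` is a Selenium WebDriver (any object with an `execute_script` method works); the helper name `scroll_to_load_more` and the `rounds` and `pause` parameters are hypothetical, chosen for illustration. It scrolls by 10000 pixels on both axes, waits briefly, and stops once the page height no longer grows.

```python
import time


def scroll_to_load_more(driver, rounds=5, pause=2.0):
    """Scroll down repeatedly so the page can lazy-load more results.

    `driver` is assumed to expose Selenium's execute_script().
    `rounds` and `pause` are illustrative defaults, not tuned values.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    scrolls = 0
    for _ in range(rounds):
        # Scroll by 10000 px in both x and y, which lands at the page bottom.
        driver.execute_script("window.scrollBy(10000, 10000);")
        scrolls += 1
        time.sleep(pause)  # crude wait for the AJAX request to finish
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # the page did not grow, so assume nothing more was loaded
        last_height = new_height
    return scrolls
```

In the spider above, one would probably call this right after `self.driver.get(response.url)` and then parse `self.driver.page_source` instead of the original `response`, because the Scrapy response only contains the first batch of results.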
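
You have to force the webdriver to load more results. To interact with the additional results, the webdriver needs to scroll the page until those elements appear.
The code for scrolling is:
To decide where to scroll to, find an element in the lower part of the page (for example, the footer) and keep scrolling to it. To get the element's coordinates, you can use the WebElement's location property.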
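As a rough illustration of scrolling to an element by its coordinates, the sketch below reads the element's `location` (which Selenium's Python bindings return as a dict with `x` and `y` keys) and passes it to `window.scrollTo`. The helper name `scroll_to_element` is hypothetical, and `driver` and `element` are assumed to be a Selenium WebDriver and WebElement.

```python
def scroll_to_element(driver, element):
    """Scroll the window to the top-left corner of `element`.

    `element.location` is assumed to be a dict such as {'x': 0, 'y': 1234},
    as returned by Selenium's WebElement.location property.
    Returns the (x, y) pair that was scrolled to.
    """
    location = element.location
    x, y = location["x"], location["y"]
    # arguments[0] and arguments[1] are filled from the extra parameters.
    driver.execute_script("window.scrollTo(arguments[0], arguments[1]);", x, y)
    return x, y
```

A caller would first locate something near the bottom of the page, for example a footer element, and call `scroll_to_element(driver, footer)` repeatedly until no new results appear. Which element to use depends on the page's markup and is left as an assumption here.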