I want to scrape all the exhibitor pages from:
https://greenbuildexpo.com/Attendee/Expohall/Exhibitors
Scrapy alone doesn't load the content (it's rendered by JavaScript), so what I'm doing now is loading the page with Selenium and searching for the links with Scrapy:
from selenium import webdriver
from scrapy.http import TextResponse

url = 'https://greenbuildexpo.com/Attendee/Expohall/Exhibitors'
driver_1 = webdriver.Firefox()
driver_1.get(url)
content = driver_1.page_source
# Wrap the rendered page source so Scrapy selectors can be used on it
response = TextResponse(url=url, body=content, encoding='utf-8')
print len(set(response.xpath('//*[contains(@href, "Attendee/")]/@href').extract()))
The site doesn't seem to make any new request when the "Next" button is pressed, so I was hoping to get all the links at once, but I only get 43 links with that code. There should be around 500.

Now I'm trying to crawl through the pages by pressing the "Next" button:
^{pr2}$
but I get this error:
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"pagingNormalView\"]/ul/li[15]"}
Stacktrace:
You don't need selenium for this. There is an XHR request that returns all the exhibitors; simulate it, and you can demo it from the Scrapy shell.