I want to scrape all the exhibitor pages from:
https://greenbuildexpo.com/Attendee/Expohall/Exhibitors
Scrapy alone doesn't load the content (it's rendered by JavaScript), so what I'm doing now is loading the page with Selenium and searching for the links with Scrapy:
from selenium import webdriver
from scrapy.http import TextResponse

url = 'https://greenbuildexpo.com/Attendee/Expohall/Exhibitors'
driver_1 = webdriver.Firefox()
driver_1.get(url)
content = driver_1.page_source
# Wrap the rendered page source so Scrapy selectors can be used on it
response = TextResponse(url=url, body=content, encoding='utf-8')
print len(set(response.xpath('//*[contains(@href, "Attendee/")]/@href').extract()))
The site doesn't seem to make any new request when the "Next" button is pressed, so I was hoping to get all the links at once, but I only get 43 links with that code. There should be around 500.

Now I'm trying to crawl through the pages by pressing the "Next" button:
^{pr2}$
but I get this error:
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"pagingNormalView\"]/ul/li[15]"}
Stacktrace:
You don't need selenium for this. There is an XHR request that returns all the exhibitors; simulate it, and you can demo it from the Scrapy shell.