Posted 2024-10-01 09:40:32
User
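I need to scrape a page that relies on JavaScript, which is why I am using Selenium. The problem is that Selenium on its own cannot extract the data I need.
I would like to try HtmlXPathSelector to extract the data.
How can I pass the HTML rendered by Selenium to HtmlXPathSelector?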
Try creating the Response manually:
from scrapy.http import TextResponse
from scrapy.selector import HtmlXPathSelector

# Legacy Scrapy API: HtmlXPathSelector was later superseded by Selector
body = '''<html></html>'''
response = TextResponse(url='', body=body, encoding='utf-8')
hxs = HtmlXPathSelector(response)
hxs.select("/html")
Building the response manually from Selenium:
import time

from scrapy.spider import BaseSpider
from scrapy.http import TextResponse
from scrapy.selector import HtmlXPathSelector
from selenium import selenium  # Selenium RC client (legacy API)


class DemoSpider(BaseSpider):
    name = "Demo"
    allowed_domains = ['www.example.com']  # domain names only, no URL scheme
    start_urls = ["http://www.example.com/demo"]

    def __init__(self):
        BaseSpider.__init__(self)
        self.selenium = selenium("127.0.0.1", 4444, "*chrome", self.start_urls[0])
        self.selenium.start()

    def __del__(self):
        self.selenium.stop()

    def parse(self, response):
        sel = self.selenium
        sel.open(response.url)
        time.sleep(2.0)  # wait for JavaScript execution

        # Build the response object from the HTML that Selenium rendered
        body = sel.get_html_source()
        sel_response = TextResponse(url=response.url, body=body, encoding='utf-8')
        hxs = HtmlXPathSelector(sel_response)
        hxs.select("//table").extract()
Here is my solution: just create the HtmlXPathSelector directly from the Selenium page source:
hxs = HtmlXPathSelector(text=sel.page_source)