Selenium: headless vs. non-headless

driver = getHeadlessDriver()
feedbacks = driver.find_elements_by_xpath(
    "//div[contains(@class, 'LiveFeedbackSectionViewController__LiveFeedbackStatusItem-sc-1ahetk9-4 cUJPkM')]")
for feedback in feedbacks:
    print(feedback.text)

3 answers

User

#1 · Edited 2024-09-27 07:35:43

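I think I have found a possible answer to this problem.

When Selenium drives a headless browser, it runs faster than it does with a regular (non-headless) browser. As a result, the Python program may run ahead of the page, before the DOM has fully loaded.

In other words, a call that looks up a web element may come back empty (find_elements returns an empty list, and find_element raises NoSuchElementException) because the element had not been loaded yet when the call was made.

To fix this, we can use the implicitly_wait method that Selenium provides. For example,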

from selenium import webdriver

driver = webdriver.Chrome()
driver.implicitly_wait(3)  # timeout in seconds

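This tells the driver to keep retrying for up to the number of seconds passed to implicitly_wait whenever it looks up an element that is not in the DOM yet, which gives the page time to load.

I have tested my function in headless mode with this approach, and it now seems to work. If there are other solutions, please feel free to comment.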
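As one other possible approach, here is a minimal sketch of the polling idea behind Selenium's waits, written in plain Python so it runs without a browser. The helper name `wait_until` and its parameters are hypothetical, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    `wait_until` is a hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # pause briefly before checking again
```

With a real driver this could be called as `wait_until(lambda: driver.find_elements(By.XPATH, xpath))`, where `By` comes from `selenium.webdriver.common.by`. Selenium ships its own version of this pattern as `WebDriverWait` together with `expected_conditions`, which is probably preferable to a hand-written loop.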

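User

#2 · Edited 2024-09-27 07:35:43

If the site you are scraping has dynamic elements rendered by JavaScript, you may need Xvfb, which lets a regular (non-headless) browser run on a machine with no display.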

sudo apt-get install -y xvfb

"Xvfb or X virtual framebuffer is a display server implementing the X11 display server protocol. In contrast to other display servers, Xvfb performs all graphical operations in virtual memory without showing any screen output."

In Python, there are two wrappers for Xvfb:

1. xvfbwrapper

pip install xvfbwrapper

Then add this to your Python file:

from xvfbwrapper import Xvfb

display = Xvfb()
display.start()

2. pyvirtualdisplay

pip install PyVirtualDisplay

Then in your code:

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1024, 768))
display.start()

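User

#3 · Edited 2024-09-27 07:35:43

I can usually work around this problem with time.sleep(10). However, there is one particular site that I could not handle with either time.sleep(10) or driver.implicitly_wait(10).

I think that site has a system that checks the browser's user agent.

To try to get around it, I set a user agent on the headless window, and it worked.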

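from selenium.webdriver.edge.options import Options  # assumes Selenium 4

options_edge = Options()
browser_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.30'
# Send this user agent instead of the headless default
options_edge.add_argument(f'user-agent={browser_agent}')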

You can look up your own user agent at: https://whatmyuseragent.com/ (not affiliated)
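To illustrate why overriding the user agent might help, here is a small sketch of the kind of check a site could apply. It assumes that a headless Chromium-based browser reports a "Headless" token (for example "HeadlessChrome") in its default user agent; the function names and the first sample string are hypothetical, and a real site's detection logic may differ.

```python
def looks_headless(user_agent: str) -> bool:
    """Heuristic a site might use: flag user agents containing 'headless'."""
    return "headless" in user_agent.lower()


def user_agent_argument(user_agent: str) -> str:
    """Build the browser argument that overrides the reported user agent."""
    return f"user-agent={user_agent}"


# Assumed shape of a default headless user agent (illustrative only).
default_headless = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/95.0.4638.54 Safari/537.36"
)
# The desktop Edge user agent used in the answer above.
desktop_edge = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.30"
)

print(looks_headless(default_headless))
print(looks_headless(desktop_edge))
print(user_agent_argument(desktop_edge))
```

The string returned by `user_agent_argument` is what gets passed to `add_argument` on the options object.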
