Scrolling an infinite page with Selenium


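I am trying to scrape this (infinite) page (www.mydealz.de), but I cannot get my webdriver to scroll down the page. I am using Python (3.5), Selenium (3.6), and PhantomJS. I have tried several approaches, but the webdriver simply does not scroll; it only gives me the first page.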

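First approach (the usual scrolling method):

```
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Jump to the bottom, then give the page a moment to load more items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # height unchanged: nothing new was loaded
    last_height = new_height
```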

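Second approach (just pressing the down-arrow key a few times and releasing it; I also tried waiting between presses):

```
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

ActionChains(driver).key_down(Keys.ARROW_DOWN).perform()
ActionChains(driver).key_up(Keys.ARROW_DOWN).perform()
```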

Third approach (find the last element in the scrolling list and scroll it into view to force scrolling):

```
posts = driver.find_elements_by_css_selector("div.threadGrid")
driver.execute_script("arguments[0].scrollIntoView();", posts[-1])
```

Nothing has worked so far. Does anyone know whether there is another approach, or where I went wrong?

3 answers

# Answer 1

I can see 1853 pages on the site mentioned above, so you can loop until you reach the last page. The sleep time has to be longer than usual so that each page has time to load. Try at least 3 seconds; the larger the value, the lower the chance that data fails to load.

```
import time

number_of_scroll = 1857

while number_of_scroll > 0:
    # Scroll to the bottom and wait for the next batch to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    number_of_scroll = number_of_scroll - 1
```

# Answer 2

To scroll through the page until the URL is mydealz.de/?page=3, you can use the following code block:

```
from selenium import webdriver

driver = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
driver.set_window_size(1400, 1000)
driver.get("https://www.mydealz.de")
# Keep scrolling to the bottom until the URL contains "3"
while "3" not in driver.current_url:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
print(driver.current_url)
driver.quit()
```

Console output:

```
https://www.mydealz.de/?page=3
```

# Answer 3

A simpler approach than using Selenium/PhantomJS is to mimic what the browser does. If you open the Network tab in Chrome's developer tools, you will see that the browser requests https://www.mydealz.de/?page=2&ajax=true to implement the endless scrolling. Copying that request as curl and stripping it down to the minimum that still works gives:

```
curl 'https://www.mydealz.de/?page=2&ajax=true' -H 'x-requested-with: XMLHttpRequest'
```

Translated into a Python script:

```
import json, requests

url = 'http://www.mydealz.de/'
headers = {'x-requested-with': 'XMLHttpRequest'}
for page in range(10):
    params = dict(page=page, ajax='true')
    resp = requests.get(url=url, params=params, headers=headers)
    data = json.loads(resp.text)
    html = data['data']['content']
    # do something with html, maybe parse it with beautifulsoup
```

Besides being simpler code, it will also be faster.
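The script in this answer leaves the parsing step open. As a rough sketch of that step, the snippet below pulls link targets out of one response body using only the standard library's `html.parser` (swapped in here for BeautifulSoup so it has no extra dependency). It assumes the JSON layout the script relies on (`data`, then `content`); the `LinkCollector` and `extract_links` names, the sample response body and the `/deals/example-1` path are made up for illustration and are not real output from the site.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags found in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(response_text):
    """Return the links inside the HTML fragment of one AJAX response.

    Assumes the layout {"data": {"content": "<html fragment>"}}.
    """
    payload = json.loads(response_text)
    fragment = payload["data"]["content"]
    collector = LinkCollector()
    collector.feed(fragment)
    collector.close()
    return collector.links


# Hypothetical response body for illustration only (not real site output).
sample = json.dumps(
    {"data": {"content": '<article><a href="/deals/example-1">Deal</a></article>'}}
)
print(extract_links(sample))
```

In the loop from the answer, `extract_links(resp.text)` would take the place of the `json.loads` and `data['data']['content']` lines. Which tags and attributes are worth collecting depends on the markup the site actually returns.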
