How do I extract data from these JavaScript tables using Selenium and Python?

Posted 2024-04-27 09:40:14


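I'm very new to Python, JavaScript, and web scraping. I'm trying to write code that writes all of the data from tables like the ones on this page to a CSV file. The page is "https://www.mcmaster.com/cam-lock-fittings/material~aluminum/".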


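I started by looking for the data in the HTML, but then realized that the site uses JavaScript. I then tried Selenium, but couldn't find anywhere in the JavaScript code that contains the actual data shown in these tables. I wrote this code to see whether I could find the displayed data anywhere, but I couldn't: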

from urllib.request import urlopen
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'


options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path='C:/Users/Brian Knoll/Desktop/chromedriver.exe', options=options)

driver.get(url)
html = driver.execute_script("return document.documentElement.outerHTML")
driver.close()

filename = "McMaster Text.txt"
fo = open(filename, "w")
fo.write(html)
fo.close()

I'm sure there's an obvious answer that I'm just not seeing. Any help would be greatly appreciated. Thanks, everyone!


1 answer
User
#1 · Posted 2024-04-27 09:40:14

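I think you need to wait until the table you're looking for has loaded.
To do that, add the following line, which waits up to 10 seconds for the table container to appear before you start scraping the data: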

fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))
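
For intuition: an explicit wait like this polls a condition until it returns something truthy or the timeout runs out, so it returns as soon as the element shows up instead of always sleeping the full 10 seconds. Below is a rough standard-library sketch of that idea. The name `wait_until` and its parameters are made up for illustration, and this is not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper for illustration only. Returns the truthy value,
    or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Condition met: return immediately, without waiting out the timeout.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With Selenium, the condition is the `EC.presence_of_element_located(...)` object, and the value returned by `until` is the located element.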

Here is the full code:

import os

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# executable_path is the Selenium 3 form; Selenium 4 passes the driver path via a Service object instead.
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), options=options)

driver.get(url)
# Wait up to 10 seconds for the table container to be present in the DOM.
fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))

# Grab the rendered page, then shut down the browser and the driver process.
html = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

# Save the rendered HTML; UTF-8 avoids encoding errors on non-ASCII characters.
filename = "McMaster Text.txt"
with open(filename, "w", encoding="utf-8") as fo:
    fo.write(html)
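
Once the rendered HTML is saved, getting it into a CSV file is a separate parsing step. Here is a minimal sketch of that step, assuming the data sits in ordinary `<tr>`/`<td>`/`<th>` elements. I have not verified that against the actual page markup, so treat the structure as an assumption. It uses only the standard library (`html.parser` and `csv`) rather than BeautifulSoup so it has no extra dependencies. `TableRowParser` and `html_tables_to_csv` are illustrative names, not part of any library.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def _flush_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_tables_to_csv(html, out_file):
    """Write every table row found in `html` to `out_file` as CSV."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    csv.writer(out_file).writerows(parser.rows)
    return parser.rows


# Small self-contained check with made-up sample markup.
sample = "<table><tr><th>Size</th><th>Price</th></tr><tr><td> 1/2 </td><td>$4.50</td></tr></table>"
buffer = io.StringIO()
print(html_tables_to_csv(sample, buffer))
```

To use it on the saved page, read "McMaster Text.txt" with `encoding="utf-8"` and pass the contents together with a file opened as `open("output.csv", "w", newline="", encoding="utf-8")` (the output filename is just an example). If the site builds its tables out of `<div>` elements instead, the tag names in the parser would need to change accordingly.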
