Scraping multiple URLs

Posted 2024-09-28 05:18:21


I'm new to coding, but the code I wrote below scrapes a single page fine. I'd like to scrape many URLs, say 200. How do I do that?

from selenium import webdriver

chrome_path = r"C:\Users\lenovo\Downloads\chromedriver_win32 (5)\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)

driver.get("https://www.kijijiautos.ca/vip/22442312")
driver.find_element_by_xpath('//div[@class="b1yLWE b3zFtQ"]').text
btn = driver.find_element_by_xpath('//button[@class="g1zAe-"]')
btn.click()
driver.find_elements_by_xpath('//span[@class="A2jAym q2jAym"]').text
driver.find_element_by_xpath('//div[@class="b1yLWE b1zAe-"]').text
print(driver.current_url)

2 answers

Wrap the scraping steps in a function and call it for each URL, something like this:

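from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_path = r"C:\Users\lenovo\Downloads\chromedriver_win32 (5)\chromedriver.exe"

# Selenium 4 style; the find_element_by_* helpers were removed in Selenium 4.3
driver = webdriver.Chrome(service=Service(chrome_path))


def get_scraping(link):
    """Load one page and return the text of the scraped elements."""
    driver.get(link)
    text_before_click = driver.find_element(By.XPATH, '//div[@class="b1yLWE b3zFtQ"]').text
    # Click the button before reading the remaining elements, as in the question
    driver.find_element(By.XPATH, '//button[@class="g1zAe-"]').click()
    # find_elements returns a list, so read .text from each element
    span_texts = [e.text for e in driver.find_elements(By.XPATH, '//span[@class="A2jAym q2jAym"]')]
    text_after_click = driver.find_element(By.XPATH, '//div[@class="b1yLWE b1zAe-"]').text
    print(driver.current_url)
    return {
        "url": driver.current_url,
        "text_before_click": text_before_click,
        "span_texts": span_texts,
        "text_after_click": text_after_click,
    }


links = ["https://www.kijijiautos.ca/vip/22442312", "other_urls"]
scrapings = []
for link in links:
    scrapings.append(get_scraping(link))

driver.quit()  # close the browser once every link has been visited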
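
With around 200 links, one missing element or slow page would otherwise stop the whole loop. A minimal sketch of a wrapper that keeps going after a failure is shown below. The helper name `scrape_all` and the stand-in `fake_scrape` are hypothetical and not part of Selenium; in practice you would pass the per-URL function from the answer above instead of the stand-in.

```python
def scrape_all(links, scrape_one):
    """Call scrape_one(link) for every link and keep going if one fails.

    Returns two dicts: successful results by link, and error text by link.
    """
    results = {}
    failures = {}
    for link in links:
        try:
            results[link] = scrape_one(link)
        except Exception as exc:
            # Broad on purpose: one bad page should not end the whole run.
            failures[link] = repr(exc)
    return results, failures


# Stand-in scraper (hypothetical) so the sketch runs without a browser.
def fake_scrape(link):
    if "bad" in link:
        raise ValueError("element not found")
    return {"url": link}


done, failed = scrape_all(["page-1", "bad-page", "page-2"], fake_scrape)
print(len(done), "scraped;", len(failed), "failed")
```

The failed links stay in `failed`, so they can be retried later without repeating the pages that already worked.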

Just add a for loop:

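from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_path = r"C:\Users\lenovo\Downloads\chromedriver_win32 (5)\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chrome_path))  # Selenium 4 style
# Loop over the URLs themselves; range(200) would load the same page 200 times
urls = ["https://www.kijijiautos.ca/vip/22442312"]  # add the remaining URLs here
for url in urls:
    driver.get(url)
    print(driver.find_element(By.XPATH, '//div[@class="b1yLWE b3zFtQ"]').text)
    driver.find_element(By.XPATH, '//button[@class="g1zAe-"]').click()
    # find_elements returns a list, so read .text from each element
    print([e.text for e in driver.find_elements(By.XPATH, '//span[@class="A2jAym q2jAym"]')])
    print(driver.find_element(By.XPATH, '//div[@class="b1yLWE b1zAe-"]').text)
    print(driver.current_url)
driver.quit()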
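
Typing 200 URLs into the script is awkward, so one option is to keep them in a text file, one per line. The sketch below assumes such a file exists; the helper name `load_urls` and the file name `urls.txt` are hypothetical and not from the question.

```python
from pathlib import Path


def load_urls(path):
    """Return the unique, non-empty lines of a text file, in file order."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Skip blank lines, comment lines starting with '#', and repeats
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

The loop then becomes `for url in load_urls("urls.txt"):` followed by the same `driver.get(url)` steps as before.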
