Python Selenium: loop over a list of websites (from a file) to get an element attribute from each site

Published 2024-10-03 19:22:31


How can I use Python Selenium to iterate over a list of websites (from an Excel file) and get a value from each one?

For example, a column in the Excel file contains:

https://www.inc.com/profile/dom-&-tom
https://www.inc.com/profile/decksouth
https://www.inc.com/profile/shp-financial
and many more.....

I want to get a specific href attribute from each link.

My current code:

^{pr2}$

Any input would be greatly appreciated.


3 Answers

To read the Excel file, use the xlrd library. In sheet.cell_value(i, 0), i is the row index and 0 is the column index. Change the column index to match your Excel data.

Define a scraping function with a return value, and append the results to another list if necessary. In your case you are only printing, so I return None.

import xlrd
from selenium import webdriver


def scraping(browser, link):
    # Open the profile page, locate the website anchor and print its href
    browser.get(link)
    website_link_anchor = browser.find_element_by_xpath("//dd[@class='website']/a")
    actual_website_link = website_link_anchor.get_attribute("href")
    print(actual_website_link)
    return None


driver = webdriver.Chrome()

# Give the location of the file
loc = ("path of file")

# To open Workbook
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
# links = []


for i in range(1, sheet.nrows):
    scraping(driver, sheet.cell_value(i, 0))
    # links.append(sheet.cell_value(i, 0))

driver.close()
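The commented-out links list hints at collecting the values instead of printing them. A minimal sketch of that idea, assuming the scraping function is changed to return the href; collect_hrefs and the scrape callable are illustrative names, not part of the answer's code:

```python
def collect_hrefs(scrape, links):
    # Map each link to whatever the scrape callable extracts from it.
    # scrape would be e.g. functools.partial(scraping, driver) once the
    # function above returns the href instead of printing it.
    results = {}
    for link in links:
        results[link] = scrape(link)
    return results
```

Collecting into a dict keeps each scraped value paired with the link it came from, which makes writing the results back to the spreadsheet straightforward.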

To loop through a list of websites (from an Excel file) and get a value from each one, you need to:

  • Create a list of the websites you want to browse.
  • Then visit each website and locate the desired element.
  • Print the actual website link and loop on to the next one.
  • Always invoke driver.quit() within the tearDown(){} method to close and destroy the WebDriver and Web Client instances gracefully.
  • Your sample code would be:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    
    myLinks = ['https://www.inc.com/profile/dom-&-tom', 'https://www.inc.com/profile/decksouth', 'https://www.inc.com/profile/shp-financial']
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("disable-extensions")
    browser = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')  
    for link in myLinks:
        browser.get(link)
        website_link_anchor = browser.find_element_by_xpath("//dd[@class='website']/a")
        actual_website_link = website_link_anchor.get_attribute("href")
        print(actual_website_link)
    browser.quit()
    

Any suggestions to improve my code?

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.options import Options
import xlrd
import xlwt
from xlutils.copy import copy

def scraping(browser, link):
    returnValue = ""
    browser.get(link)
    try:
        website_link_anchor = browser.find_element_by_xpath("//dd[@class='website']/a")
        actual_website_link = website_link_anchor.get_attribute("href")
        returnValue = actual_website_link
    except NoSuchElementException: 
        returnValue = "Element not found for: " + link
    return returnValue

options = Options()
options.add_argument("--headless")
browser = webdriver.Firefox(firefox_options=options, executable_path=r'C:\WebDrivers\geckodriver.exe')

file_to_read = r"C:\INC5000\list.xlsx"

# read
file_to_read_wb = xlrd.open_workbook(file_to_read)
file_to_read_wb_sheet = file_to_read_wb.sheet_by_index(0)

# copy and write
file_to_write_to_wb = copy(file_to_read_wb)
file_to_write_to_wb_sheet = file_to_write_to_wb.get_sheet(0)

for i in range(1, file_to_read_wb_sheet.nrows):
    result = scraping(browser, file_to_read_wb_sheet.cell_value(i, 0))
    file_to_write_to_wb_sheet.write(i, 1, result)

file_to_write_to_wb.save(r"C:\INC5000\list2.xls")

browser.close()
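One further hardening idea: profile pages occasionally load slowly or fail transiently, so the per-link call can be wrapped in a small retry helper (a generic sketch; retry is an illustrative name, and Selenium's explicit waits such as WebDriverWait are the more idiomatic fix for late-rendering elements):

```python
import time

def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    # Call func() up to `attempts` times, sleeping `delay` seconds
    # between tries; re-raise the last exception if every attempt fails.
    last = None
    for _ in range(attempts):
        try:
            return func()
        except exceptions as exc:
            last = exc
            time.sleep(delay)
    raise last
```

In the loop above it would be used as result = retry(lambda: scraping(browser, link), attempts=3, delay=2.0).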
