How do I make Selenium move on to the next url when the current url takes more than 30 seconds to load?

Posted 2024-09-30 08:28:48


I want to know what I should put in the except clause. I'm currently using a pass statement, but I'm not sure it does exactly what I want. The reason I want this behavior is that some pages take more than 30 seconds to load completely, for example Taobao (淘宝网). My code is as follows:

from selenium import webdriver 
from time import sleep 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.common.exceptions import TimeoutException

profile = webdriver.FirefoxProfile() 
profile.add_extension(extension = '/Users/wayne/Desktop/fourthparty/extension/fourthparty.xpi') 

driver = webdriver.Firefox(profile)
def scan(cutoff): 
    with open('top-1m.csv', 'r') as f: 
        for num, url in enumerate(f): 
            if (num == 500): 
                return

            url = url.split(',')[1] 
            driver.get('http://www.' + url) 
            sleep(30) 

            try: 
                driver.set_page_load_timeout(30) 
            except TimeoutException: 
                pass

if __name__ == "__main__": 
    scan(500)

2 Answers

Here is a new program that moves on to the next url once the current url has been loading for more than 30 seconds. Since I'm used to Java, it may not look like a typical Python program.

from selenium import webdriver
from time import sleep
from selenium.common.exceptions import TimeoutException
import csv

profile = webdriver.FirefoxProfile()
profile.add_extension(extension = '/Users/wayne/Desktop/fourthparty-master/extension/fourthparty-jetpack.1.13.2.xpi')
driver = webdriver.Firefox(profile)

with open('top-1m.csv', 'r') as f:
    reader = csv.reader(f)
    fList = list(reader)

def crawl(cutoff):
    for i in range(0, cutoff):
        try:
            driver.set_page_load_timeout(30)  # give each page at most 30 seconds to load
            getURL(i)
        except TimeoutException:
            pass  # this page was too slow; move on to the next url

def getURL(num):
    # Visit row `num` of the ranking list and stay on the page for 30 seconds.
    url = 'http://www.' + fList[num][1]
    driver.get(url)
    sleep(30)


if __name__ == "__main__":
    crawl(10)
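
For reference, the mechanism this relies on is that once set_page_load_timeout(30) is in effect, the next driver.get raises TimeoutException after 30 seconds instead of blocking indefinitely. Below is a minimal, self-contained sketch of just that pattern; it deliberately omits the FourthParty profile and csv handling, and demo_urls is an illustrative list, not from the question:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Firefox()
driver.set_page_load_timeout(30)  # every driver.get from here on gets a 30-second deadline

demo_urls = ['http://www.example.com', 'http://www.taobao.com']  # illustrative list
for url in demo_urls:
    try:
        driver.get(url)  # raises TimeoutException once 30 seconds pass
    except TimeoutException:
        continue  # too slow; skip straight to the next url
    print(url, 'loaded in time')  # reached only when the page finished loading

driver.quit()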

You would usually call set_page_load_timeout once, shortly after instantiating the driver. Then you should wrap driver.get in a try, like this:

from __future__ import print_function

from selenium import webdriver
from time import sleep
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

profile = webdriver.FirefoxProfile()
profile.add_extension(extension = '/Users/wayne/Desktop/fourthparty/extension/fourthparty.xpi')
driver = webdriver.Firefox(profile)
driver.set_page_load_timeout(30)

def scan(cutoff):
    with open('top-1m.csv', 'r') as f:
        for num, url in enumerate(f):
            if num == cutoff:
                return

            url = url.split(',')[1].strip()  # drop the trailing newline from the csv line
            try:
                driver.get('http://www.' + url)
            except TimeoutException:
                print("Caught and handled slow page timeout exception")

            # Do something here I guess?


if __name__ == "__main__":
    scan(500)
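
About the "# Do something here I guess?" placeholder: after the timeout fires, the browser may still be busy with the slow page. A workaround sometimes used (my own suggestion, not part of this answer) is to tell the browser to abandon the load with window.stop() before continuing; this assumes WebDriverException is also imported from selenium.common.exceptions, and the inner try is there because the call can itself fail while the page is in that state:

try:
    driver.get('http://www.' + url)
except TimeoutException:
    print("Caught and handled slow page timeout exception")
    try:
        # Ask the browser to abandon the half-loaded page so the
        # next driver.get does not have to wait behind it.
        driver.execute_script("window.stop();")
    except WebDriverException:
        pass  # the browser refused the script; move on anyway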
