Python Selenium web scraper slows down when another application is using the internet

Posted 2024-10-01 02:33:32


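I built a Selenium web scraper (code below). It works fine, and each loop iteration usually takes 4-6 seconds. However, if I use a different web browser to do something else, such as checking my email, the scraper slows down (sometimes a single iteration takes up to several minutes), and loading my email (or anything else I try to do on the internet) also takes a very long time.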

Is something wrong with my scraper? Or is it simply not possible to run a web scraper while also using the internet for other things? Or...?

Thanks!

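import re
import time
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `naics` is presumably a pandas DataFrame of companies loaded earlier (not shown).
counter = 36386  # row index to start (or resume) from

options = Options()
options.headless = True  # property setter; set_headless() is deprecated
driver = webdriver.Firefox(options=options, executable_path=r'C:\Users\jajacobs\Downloads\geckodriver.exe')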

while counter <= 50000:
    start_time = time.time()
    try:
        driver.get("url goes here")
        timeout = 20
        inputElement = driver.find_element_by_name("naics_lookup[companyName]")
        inputElement.send_keys(naics.iloc[counter, 1])

        inputElement = driver.find_element_by_name("naics_lookup[city]")
        inputElement.send_keys(naics.iloc[counter, 3])

        inputElement = driver.find_element_by_name("naics_lookup[state]")
        inputElement.send_keys(naics.iloc[counter, 2])

        inputElement.submit() 
        print('Looking for NAICS code of company number ', counter)

        try:
            element_present = EC.presence_of_element_located((By.CLASS_NAME, 'results'))
            WebDriverWait(driver, timeout).until(element_present)
            print("element is ready")
            try:
                data = driver.find_element_by_class_name('results').text
                naics.at[counter, 'naics'] = re.findall(r"\D(\d{6})\D", data)[0]
                print(re.findall(r"\D(\d{6})\D", data)[0])
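            except Exception:  # no 6-digit match, or the element could not be read
                print("No NAICS code")
        except Exception:  # typically a timeout raised by the wait above
            print("element did not load")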

        # Save a checkpoint every 1000 rows.
        if counter % 1000 == 0:
            data_folder = Path('C:/Users/jajacobs/Documents/ipynb/')
            file_to_save = data_folder / ('naics' + str(counter) + '.csv')
            naics.to_csv(file_to_save)

        counter += 1


    except Exception as e:
        # counter is only advanced on success, so a failing row is retried.
        print(e)
    print("total time taken this loop: ", time.time() - start_time)
driver.quit()  # ends the whole session, not just the current window
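
The loop above only prints the total time per iteration, so it cannot tell me whether the slowdown comes from the page load, the form submission, or the wait for the results element. One way to narrow that down might be to time each phase separately. The following is only a minimal sketch: `timed`, `phase_totals`, and `report` are hypothetical names that are not part of the scraper, and the `time.sleep` calls merely stand in for the real Selenium calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical helper (not part of the original scraper): accumulates
# wall-clock seconds per named phase so the slow step stands out.
phase_totals = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the time spent inside the `with` block to phase_totals[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped call raises, so failures are timed too.
        phase_totals[phase] += time.perf_counter() - start


def report():
    """Return (phase, seconds) pairs sorted from slowest to fastest."""
    return sorted(phase_totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Stand-ins for driver.get(...) and WebDriverWait(...).until(...).
    with timed("page_load"):
        time.sleep(0.05)
    with timed("wait_for_results"):
        time.sleep(0.01)
    for phase, seconds in report():
        print(f"{phase}: {seconds:.3f}s")
```

In the scraper itself, the same idea would mean wrapping `driver.get(...)` in `with timed("page_load"):` and the `WebDriverWait(...).until(...)` call in `with timed("wait_for_results"):`, then comparing the totals from a run with and without other browser activity.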
