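I built a Selenium web scraper (code below). It works fine, and each loop usually takes 4-6 seconds. However, if I use a different web browser to do something else, such as checking my email, the scraper slows down (sometimes to several minutes per loop), and loading my email (or anything else I try to do on the internet) also takes a very long time.
Is there something wrong with my scraper? Or is it simply not possible to run a web scraper while also using the internet for other things? Or...?
Thanks!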
import re
import time
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `naics` is a pandas DataFrame loaded earlier (not shown): column 1 holds the
# company name, column 2 the state, and column 3 the city.
# Written against the Selenium 3 API (set_headless, find_element_by_*).

counter = 36386
timeout = 20  # seconds to wait for the results element
data_folder = Path('C:/Users/jajacobs/Documents/ipynb/')

options = Options()
options.set_headless(True)
driver = webdriver.Firefox(options=options, executable_path=r'C:\Users\jajacobs\Downloads\geckodriver.exe')

while counter <= 50000:
    start_time = time.time()
    try:
        driver.get("url goes here")

        # Fill in the lookup form and submit it.
        inputElement = driver.find_element_by_name("naics_lookup[companyName]")
        inputElement.send_keys(naics.iloc[counter, 1])
        inputElement = driver.find_element_by_name("naics_lookup[city]")
        inputElement.send_keys(naics.iloc[counter, 3])
        inputElement = driver.find_element_by_name("naics_lookup[state]")
        inputElement.send_keys(naics.iloc[counter, 2])
        inputElement.submit()
        print('Looking for NAICS code of company number ', counter)

        try:
            # Wait until the results block is present in the page.
            element_present = EC.presence_of_element_located((By.CLASS_NAME, 'results'))
            WebDriverWait(driver, timeout).until(element_present)
            print("element is ready")
            try:
                # The first six-digit number in the results text is the NAICS code.
                data = driver.find_element_by_class_name('results').text
                code = re.findall(r"\D(\d{6})\D", data)[0]
                naics.at[counter, 'naics'] = code
                print(code)
            except Exception:
                print("No NAICS code")
        except Exception:
            print("element did not load")

        # Save a checkpoint every 1000 rows.
        if counter % 1000 == 0:
            file_to_save = data_folder / ('naics' + str(counter) + '.csv')
            naics.to_csv(file_to_save)

        counter += 1
    except Exception as e:
        # On an unexpected error the counter is not advanced, so the same row is retried.
        print(e)
    print("total time taken this loop: ", time.time() - start_time)

driver.close()
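
One way to narrow down where the extra time goes is to time each phase of the loop separately (page load, form submission, waiting for results) instead of only the loop total. The following is a minimal sketch using only the standard library. The `timed` helper, the `timings` dict, and the phase labels are hypothetical names that are not part of the script above, and the `time.sleep` calls merely stand in for the real Selenium calls.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: accumulates wall-clock seconds per labelled phase.
timings = {}

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the elapsed time even if the wrapped code raises.
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start

# Stand-in for one loop iteration. In the scraper, the bodies would be
# driver.get(...), the send_keys/submit calls, and WebDriverWait(...).until(...).
with timed("page load"):
    time.sleep(0.01)
with timed("wait for results"):
    time.sleep(0.02)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

If the "page load" and "wait for results" phases are the ones that balloon while another browser is in use, that would point toward the network connection or the remote site rather than the scraper logic, though this sketch alone cannot confirm the cause.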