Selenium WebDriver "object is not callable" error when importing Selenium, but the code works without importing Selenium

Posted 2024-09-27 07:34:58


I am trying to scrape some LinkedIn profiles, but the code below gives me an error:

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-b6cfafdd5b52> in <module>
     25     #sending our driver as the driver to be used by srape_linkedin
     26     #you can also create driver options and pass it as an argument
---> 27     ps = ProfileScraper(cookie=myLI_AT_Key, scroll_increment=random.randint(10,50), scroll_pause=0.8 + random.uniform(0.8,1),driver=my_driver)  #changed name, default driver and scroll_pause time and scroll_increment made a little random
     28     print('Currently scraping: ', link, 'Time: ', datetime.now())
     29     profile = ps.scrape(url=link)       #changed name

~\Anaconda3\lib\site-packages\scrape_linkedin\Scraper.py in __init__(self, cookie, scraperInstance, driver, driver_options, scroll_pause, scroll_increment, timeout)
     37 
     38         self.was_passed_instance = False
---> 39         self.driver = driver(**driver_options)
     40         self.scroll_pause = scroll_pause
     41         self.scroll_increment = scroll_increment

TypeError: 'WebDriver' object is not callable
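The traceback itself points at the cause: on line 39 of scrape_linkedin's Scraper.py, the library executes `self.driver = driver(**driver_options)`, so it expects `driver` to be a class it can call (e.g. `webdriver.Firefox`), not an already-created instance. A minimal sketch of this behaviour, using a hypothetical stub class instead of Selenium:

```python
# The scraper's __init__ (per the traceback) does:
#     self.driver = driver(**driver_options)
# i.e. it *calls* whatever was passed as `driver`. A class is callable
# (calling it constructs an instance); an instance of it is not.

class FakeWebDriver:
    """Hypothetical stand-in for selenium.webdriver.Firefox."""
    def __init__(self, **options):
        self.options = options

def scraper_init(driver, driver_options=None):
    # mirrors scrape_linkedin/Scraper.py line 39 from the traceback
    return driver(**(driver_options or {}))

# Passing the CLASS works: the scraper instantiates the driver itself.
d = scraper_init(FakeWebDriver, {'executable_path': 'geckodriver'})
assert isinstance(d, FakeWebDriver)

# Passing an INSTANCE reproduces the question's error.
try:
    scraper_init(FakeWebDriver())
except TypeError as e:
    print(e)  # 'FakeWebDriver' object is not callable
```

If that reading is right, the likely fix is to pass the class and its constructor arguments separately, e.g. `ProfileScraper(..., driver=webdriver.Firefox, driver_options={'executable_path': r'C:\Users\MyUser\Dropbox\linkedInScrapper\geckodriver.exe'})` (the exact keys `driver_options` accepts depend on the installed scrape_linkedin and Selenium versions).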

Code:

from datetime import datetime
from scrape_linkedin import ProfileScraper
import random                   #new import made
from selenium import webdriver  #new import made
import pandas as pd
import json
import os
import re
import time

os.chdir("C:\\Users\\MyUser\\Dropbox\\linkedInScrapper\\")

my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/', 'https://www.linkedin.com/in/ursula-von-der-leyen/']

myLI_AT_Key = MyKey # you need to obtain one from LinkedIn using these steps:

# To get LI_AT key
# Navigate to www.linkedin.com and log in
# Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
# Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
# Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
# Find and copy the li_at value

for link in my_profile_list:

    #my_driver = webdriver.Chrome()  #use this if Chromedriver is on your environment PATH; otherwise use one of the executable_path lines below
    my_driver = webdriver.Firefox(executable_path=r'C:\Users\MyUser\Dropbox\linkedInScrapper\geckodriver.exe')
    #my_driver = webdriver.Chrome(executable_path=r'C:\Users\MyUser\Dropbox\linkedInScrapper\chromedriver.exe')

    #sending our driver as the driver to be used by scrape_linkedin
    #you can also create driver options and pass them as an argument
    ps = ProfileScraper(cookie=myLI_AT_Key, scroll_increment=random.randint(10, 50), scroll_pause=0.8 + random.uniform(0.8, 1), driver=my_driver)  #changed name and default driver; scroll_pause and scroll_increment made a little random
    print('Currently scraping: ', link, 'Time: ', datetime.now())
    profile = ps.scrape(url=link)       #changed name
    dataJSON = profile.to_dict()

    profileName = re.sub('https://www.linkedin.com/in/', '', link)
    profileName = profileName.replace("?originalSubdomain=es", "")
    profileName = profileName.replace("?originalSubdomain=pe", "")
    profileName = profileName.replace("?locale=en_US", "")
    profileName = profileName.replace("?locale=es_ES", "")
    profileName = profileName.replace("?originalSubdomain=uk", "")
    profileName = profileName.replace("/", "")

    with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
        json.dump(dataJSON, json_file)
        time.sleep(10 + random.randint(0, 5))    #added randomness to the sleep time
    #this will close your browser at the end of every iteration
    my_driver.quit()

print('The first observation scraped was:', my_profile_list[0])
print('The last observation scraped was:', my_profile_list[-1])
print('END')
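As a side note, the chain of `.replace()` calls that strips query strings from each profile URL only covers the five parameters listed; the standard library's `urllib.parse` drops any `?query` part in one step. A minimal sketch (`profile_name` is a hypothetical helper, not part of scrape_linkedin):

```python
from urllib.parse import urlparse

def profile_name(link):
    """Extract the profile slug from a LinkedIn URL, discarding any
    query string (?originalSubdomain=..., ?locale=..., etc.)."""
    path = urlparse(link).path            # e.g. '/in/williamhgates/'
    return path.replace('/in/', '').strip('/')

print(profile_name('https://www.linkedin.com/in/williamhgates/?locale=en_US'))
# williamhgates
```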

I have tried many different things to get webdriver.Chrome() to work, with no luck. I tried Chrome (chromedriver) and Firefox (geckodriver), and many different ways of loading the selenium package, but I keep getting the error TypeError: 'WebDriver' object is not callable.

Below is my original code, which still works. (That is, it opens a Google Chrome browser and goes to each profile in my_profile_list, but I want to use the code above.)

from datetime import datetime
from scrape_linkedin import ProfileScraper
import pandas as pd
import json
import os
import re
import time

my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/', 'https://www.linkedin.com/in/ursula-von-der-leyen/']
# To get LI_AT key
# Navigate to www.linkedin.com and log in
# Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
# Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
# Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
# Find and copy the li_at value
myLI_AT_Key = 'INSERT LI_AT Key'
with ProfileScraper(cookie=myLI_AT_Key, scroll_increment = 50, scroll_pause = 0.8) as scraper:
    for link in my_profile_list:
        print('Currently scraping: ', link, 'Time: ', datetime.now())
        profile = scraper.scrape(url=link)
        dataJSON = profile.to_dict()
        
        profileName = re.sub('https://www.linkedin.com/in/', '', link)
        profileName = profileName.replace("?originalSubdomain=es", "")
        profileName = profileName.replace("?originalSubdomain=pe", "")
        profileName = profileName.replace("?locale=en_US", "")
        profileName = profileName.replace("?locale=es_ES", "")
        profileName = profileName.replace("?originalSubdomain=uk", "")
        profileName = profileName.replace("/", "")
        
        with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
            json.dump(dataJSON, json_file)
            time.sleep(10)
            
print('The first observation scraped was:', my_profile_list[0])
print('The last observation scraped was:', my_profile_list[-1])
print('END')

Notes:

The code is slightly different because I asked a question about it on SO here, and @Ananth helped me find the solution.

I am also aware that there are "similar" questions online about Selenium and chromedriver, but after trying every suggested solution I still could not get it to work. (E.g., the common fix is a typo in webdriver.Chrome().)


