I am trying to scrape some LinkedIn profiles, but the code below gives me an error:
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-b6cfafdd5b52> in <module>
25 #sending our driver as the driver to be used by srape_linkedin
26 #you can also create driver options and pass it as an argument
---> 27 ps = ProfileScraper(cookie=myLI_AT_Key, scroll_increment=random.randint(10,50), scroll_pause=0.8 + random.uniform(0.8,1),driver=my_driver) #changed name, default driver and scroll_pause time and scroll_increment made a little random
28 print('Currently scraping: ', link, 'Time: ', datetime.now())
29 profile = ps.scrape(url=link) #changed name
~\Anaconda3\lib\site-packages\scrape_linkedin\Scraper.py in __init__(self, cookie, scraperInstance, driver, driver_options, scroll_pause, scroll_increment, timeout)
37
38 self.was_passed_instance = False
---> 39 self.driver = driver(**driver_options)
40 self.scroll_pause = scroll_pause
41 self.scroll_increment = scroll_increment
TypeError: 'WebDriver' object is not callable
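Judging from the traceback, the failing line inside the library is `self.driver = driver(**driver_options)`, i.e. it tries to *call* whatever was passed as `driver`. A minimal, library-free sketch (all names here are illustrative stand-ins, not the real selenium/scrape_linkedin API) of why calling an already-constructed instance raises exactly this error:

```python
class FakeDriver:
    """Illustrative stand-in for a WebDriver class."""
    def __init__(self, **options):
        self.options = options

def scraper_init(driver, driver_options):
    # Mirrors the traceback's `self.driver = driver(**driver_options)`:
    # this only works if `driver` is a class (or factory), not an instance.
    return driver(**driver_options)

scraper_init(FakeDriver, {})        # fine: a class is callable
try:
    scraper_init(FakeDriver(), {})  # an instance has no __call__
except TypeError as e:
    print(e)                        # 'FakeDriver' object is not callable
```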
Code:
from datetime import datetime
from scrape_linkedin import ProfileScraper
import random #new import made
from selenium import webdriver #new import made
import pandas as pd
import json
import os
import re
import time
os.chdir("C:\\Users\\MyUser\\Dropbox\\linkedInScrapper\\")
my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/', 'https://www.linkedin.com/in/ursula-von-der-leyen/']
myLI_AT_Key = MyKey # you need to obtain one from Linkedin using these steps:
# To get LI_AT key
# Navigate to www.linkedin.com and log in
# Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
# Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
# Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
# Find and copy the li_at value
for link in my_profile_list:
    #my_driver = webdriver.Chrome() #use this if ChromeDriver is on your environment PATH; otherwise use one of the executable_path lines below
    my_driver = webdriver.Firefox(executable_path=r'C:\Users\MyUser\Dropbox\linkedInScrapper\geckodriver.exe')
    #my_driver = webdriver.Chrome(executable_path=r'C:\Users\MyUser\Dropbox\linkedInScrapper\chromedriver.exe')
    #sending our driver as the driver to be used by scrape_linkedin
    #you can also create driver options and pass them as an argument
    ps = ProfileScraper(cookie=myLI_AT_Key, scroll_increment=random.randint(10,50), scroll_pause=0.8 + random.uniform(0.8,1), driver=my_driver) #changed name and default driver; scroll_pause and scroll_increment made a little random
    print('Currently scraping: ', link, 'Time: ', datetime.now())
    profile = ps.scrape(url=link) #changed name
    dataJSON = profile.to_dict()
    profileName = re.sub('https://www.linkedin.com/in/', '', link)
    profileName = profileName.replace("?originalSubdomain=es", "")
    profileName = profileName.replace("?originalSubdomain=pe", "")
    profileName = profileName.replace("?locale=en_US", "")
    profileName = profileName.replace("?locale=es_ES", "")
    profileName = profileName.replace("?originalSubdomain=uk", "")
    profileName = profileName.replace("/", "")
    with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
        json.dump(dataJSON, json_file)
    time.sleep(10 + random.randint(0,5)) #added randomness to the sleep time
    my_driver.quit() #this will close your browser at the end of every iteration
print('The first observation scraped was:', my_profile_list[0])
print('The last observation scraped was:', my_profile_list[-1])
print('END')
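As an aside, the chain of `.replace()` calls above only strips the specific query strings it lists. A sketch (assuming standard LinkedIn profile URLs; `profile_name` is a hypothetical helper, not part of scrape_linkedin) using the standard library's `urllib.parse` handles any query string and the trailing slash in one step:

```python
from urllib.parse import urlparse

def profile_name(link):
    # urlparse separates the query string ("?locale=en_US",
    # "?originalSubdomain=es", ...) from the path automatically;
    # the profile slug is the last non-empty path segment.
    path = urlparse(link).path          # e.g. '/in/williamhgates/'
    return path.rstrip('/').split('/')[-1]

print(profile_name('https://www.linkedin.com/in/williamhgates/?locale=en_US'))
# williamhgates
```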
I have tried many different things to get webdriver.Chrome() to work, but without any luck. I tried both Chrome (chromedriver) and Firefox (geckodriver), and many different ways of loading the selenium package, but I keep getting the error TypeError: 'WebDriver' object is not callable.
Below is my original code, which still works. (That is, it opens a Google Chrome browser and goes to each profile in my_profile_list, but I would like to use the code above.)
from datetime import datetime
from scrape_linkedin import ProfileScraper
import pandas as pd
import json
import os
import re
import time
my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/', 'https://www.linkedin.com/in/ursula-von-der-leyen/']
# To get LI_AT key
# Navigate to www.linkedin.com and log in
# Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
# Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
# Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
# Find and copy the li_at value
myLI_AT_Key = 'INSERT LI_AT Key'
with ProfileScraper(cookie=myLI_AT_Key, scroll_increment=50, scroll_pause=0.8) as scraper:
    for link in my_profile_list:
        print('Currently scraping: ', link, 'Time: ', datetime.now())
        profile = scraper.scrape(url=link)
        dataJSON = profile.to_dict()
        profileName = re.sub('https://www.linkedin.com/in/', '', link)
        profileName = profileName.replace("?originalSubdomain=es", "")
        profileName = profileName.replace("?originalSubdomain=pe", "")
        profileName = profileName.replace("?locale=en_US", "")
        profileName = profileName.replace("?locale=es_ES", "")
        profileName = profileName.replace("?originalSubdomain=uk", "")
        profileName = profileName.replace("/", "")
        with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
            json.dump(dataJSON, json_file)
        time.sleep(10)
print('The first observation scraped was:', my_profile_list[0])
print('The last observation scraped was:', my_profile_list[-1])
print('END')
Notes:
The code is slightly different because I asked a question about it on SO here, and @Ananth helped me find a solution.
I am also aware that there are "similar" questions online about selenium and chromedriver, but after trying every suggested solution I still cannot get it to work. (That is, the common fix is a typo in webdriver.Chrome(), which does not apply here.)