Google Scholar blocks me from using search_pubs

Posted 2024-09-27 23:26:37


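I am using PyCharm Community Edition 2020.3.2, scholarly 1.0.2, and tor 1.0.0. I am trying to retrieve the citation counts for 700 articles. Google Scholar is blocking me from using search_pubs (a function of scholarly). However, another scholarly function, search_author, still works fine. At first, search_pubs worked normally. I tried this code: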

from scholarly import scholarly
scholarly.search_pubs('Large Batch Optimization for Deep Learning: Training BERT in 76 minutes')

After a few attempts, it raised the following error:

Traceback (most recent call last):
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-3bbcfb742cb5>", line 1, in <module>
    scholarly.search_pubs('Large Batch Optimization for Deep Learning: Training BERT in 76 minutes')
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\scholarly\_scholarly.py", line 121, in search_pubs
    return self.__nav.search_publications(url)
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\scholarly\_navigator.py", line 256, in search_publications
    return _SearchScholarIterator(self, url)
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\scholarly\publication_parser.py", line 53, in __init__
    self._load_url(url)
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\scholarly\publication_parser.py", line 58, in _load_url
    self._soup = self._nav._get_soup(url)
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\scholarly\_navigator.py", line 200, in _get_soup
    html = self._get_page('https://scholar.google.com{0}'.format(url))
  File "C:\Users\binhd\anaconda3\envs\t2\lib\site-packages\scholarly\_navigator.py", line 152, in _get_page
    raise Exception("Cannot fetch the page from Google Scholar.")
Exception: Cannot fetch the page from Google Scholar.
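
Since the failure only appears after several requests, a batch of 700 lookups would likely need pauses and retries around each call. The following is a minimal sketch of that idea, not a confirmed fix for the block. `fetch_with_backoff` and `citation_count` are hypothetical helper names, the delay values are arbitrary, and the `num_citations` key is an assumption about the dict that scholarly 1.0.x returns for a search hit.

```python
import time


def fetch_with_backoff(fetch, query, max_attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch(query), pausing for exponentially longer intervals after failures.

    Returns the result of fetch, or None if every attempt raised an exception.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(query)
        except Exception as exc:  # the traceback above shows a bare Exception
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            delay = base_delay * (2 ** attempt)
            print(f"attempt {attempt + 1} failed ({exc}); sleeping {delay:.0f}s")
            sleep(delay)
    return None


def citation_count(title):
    """Hypothetical lookup: citation count of the first search hit for a title."""
    # Imported lazily so fetch_with_backoff itself needs only the standard library.
    from scholarly import scholarly

    first_hit = next(scholarly.search_pubs(title), None)
    # Assumption: search results are dicts carrying a 'num_citations' key.
    return None if first_hit is None else first_hit.get("num_citations")
```

With these helpers, one title could be looked up as `fetch_with_backoff(citation_count, 'Large Batch Optimization for Deep Learning: Training BERT in 76 minutes')`. Backing off will not get past a CAPTCHA that is already being served; it only lowers the request rate.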

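Then I found that the cause is that I need to pass a CAPTCHA from Google before I can keep fetching information from Google Scholar. Many people suggested that I need to use a proxy because my IP has been blocked by Google. I tried changing the proxy with FreeProxies():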

from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)
scholarly.search_pubs('Large Batch Optimization for Deep Learning: Training BERT in 76 minutes')

It did not work, and PyCharm froze for a long time. Then I installed Tor (pip install Tor) and tried again:

from scholarly import scholarly, ProxyGenerator
pg = ProxyGenerator()
pg.Tor_External(tor_sock_port=9050, tor_control_port=9051, tor_password="scholarly_password")
scholarly.use_proxy(pg)
scholarly.search_pubs('Large Batch Optimization for Deep Learning: Training BERT in 76 minutes')

That did not work. Then I tried SingleProxy():

from scholarly import scholarly, ProxyGenerator
pg = ProxyGenerator()
pg.SingleProxy(https='socks5://127.0.0.1:9050',http='socks5://127.0.0.1:9050')
scholarly.use_proxy(pg)
scholarly.search_pubs('Large Batch Optimization for Deep Learning: Training BERT in 76 minutes')

That did not work either. I have never tried Luminati because I am not familiar with it. If anyone knows a solution, please help.
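
Both the Tor_External and SingleProxy attempts assume that something is listening on 127.0.0.1:9050. A quick way to rule that out as the cause is to test the port directly before configuring the proxy. This is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper name, and an open port only shows that some service accepts connections there, not that it is a working Tor SOCKS proxy.

```python
import socket


def is_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `is_port_open(port=9050)` returns False, the proxy settings above cannot work regardless of what scholarly does, which would suggest that no Tor service is running locally.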

