Understanding the parameters in google.search()

Posted 2024-09-27 19:18:29


I am trying to get the top 5 URLs for each abbreviated book/journal title. I set the num parameter to 5, assuming that would return the top 5 results, and stop=1, which I took to mean that no further HTTP requests are sent once those 5 results have been returned. For some reason, with num=5 and stop=1 I only get 3 results back, and I get the same 3 results for every title I search (they obviously should differ). On top of that, while testing I keep hitting HTTP Error 503, even though I sleep inside the loop, which other posts on this site suggested would prevent that error. My code is below.

    import random
    import time

    import google  # pip package "google", provides google.search()

    count = 0

    my_file = open('sometextfile.txt', 'r')

    for aline in my_file:
        print("******************************")
        print(aline)
        count += 1
        # split the tab-separated record (e.g. 5<TAB>J. Wetl.<TAB>['1442-9985'])
        record_list = aline.split("\t")

        if "." in record_list[1]:
            search_results = google.search(record_list[2], num=5, stop=1, pause=3.)
            for result in search_results:
                print(result)
        # random pause between records to avoid hammering Google
        time.sleep(random.randrange(0, 3))

    my_file.close()

It produces the following output:

    4   Environmental and Behaviour ['0143-005X']

    ******************************
    4   Sustainable Cities and Society  ['0143-005X']

    ******************************
    4   Chicago to LA: Making sense of urban theory ['0272-4944']

    ******************************
    4   As adopted by the International Health Conference   ['0272-4944']

    ******************************
    5   J. Wetl.    ['1442-9985']

    https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
    http://www.wiley.com/bw/journal.asp?ref=1442-9985
    http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
    ******************************
    5   Curr. Opin. Environ. Sustain.   ['1442-9985']

    https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
    http://www.wiley.com/bw/journal.asp?ref=1442-9985
    http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
    ******************************
    5   For. Policy Econ.   ['1442-9985']

    https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
    http://www.wiley.com/bw/journal.asp?ref=1442-9985
    http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
    ******************************
    5   For. Policy Econ.   ['1442-9985']

    https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
    http://www.wiley.com/bw/journal.asp?ref=1442-9985
    http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
    ******************************
    5   Asia. World Dev.    ['1442-9985']

    Traceback (most recent call last):
      File "C:/Users/Peter/Desktop/Programming/Ibata Arens Project/google_search.py", line 27, in <module>
        for result in search_results:
      File "C:\Users\Peter\Anaconda3\lib\site-packages\google\__init__.py", line 304, in search
        html = get_page(url)
      File "C:\Users\Peter\Anaconda3\lib\site-packages\google\__init__.py", line 121, in get_page
        response = urlopen(request)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 163, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 472, in open
        response = meth(req, response)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 582, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 504, in error
        result = self._call_chain(*args)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
        result = func(*args)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 696, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 472, in open
        response = meth(req, response)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 582, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 510, in error
        return self._call_chain(*args)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
        result = func(*args)
      File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 590, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 503: Service Unavailable
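
For reference, here is a stripped-down version of the call with my current reading of the parameters spelled out and a crude retry on the 503. The top_urls helper, the stop=how_many reading, and the back-off times are my own guesses, not something I have confirmed against the library's documentation:

    import time
    import urllib.error

    import google  # same module as in the script above


    def top_urls(query, how_many=5, retries=3):
        """Return up to how_many result URLs for query, backing off on HTTP 503.

        Guess: stop is the index of the last result to fetch, so stop=how_many
        (rather than stop=1) is what should yield how_many URLs.
        """
        for attempt in range(retries):
            try:
                return list(google.search(query, num=how_many, stop=how_many, pause=3.0))
            except urllib.error.HTTPError as err:
                if err.code != 503:
                    raise
                # assume Google is rate-limiting; wait longer before retrying
                time.sleep(60 * (attempt + 1))
        return []


    print(top_urls("J. Wetl."))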

I am also wondering whether it would be better to just use urllib and go through the returned HTML myself, since my goal is only to retrieve the ISSN for each abbreviated title.
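
Roughly what I have in mind is the sketch below. The issns_from_search name, the query URL, the User-Agent header, and the ISSN regex are all placeholders I made up, and I have not checked whether Google returns usable HTML (or just another 503) to a plain urllib request:

    import re
    import urllib.parse
    import urllib.request


    def issns_from_search(title):
        """Fetch a Google results page for title with plain urllib and pull out
        anything in the raw HTML that looks like an ISSN (dddd-dddX)."""
        url = "https://www.google.com/search?q=" + urllib.parse.quote(title + " ISSN")
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(request) as response:
            html = response.read().decode("utf-8", errors="replace")
        return sorted(set(re.findall(r"\b\d{4}-\d{3}[\dXx]\b", html)))


    print(issns_from_search("Curr. Opin. Environ. Sustain."))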


