I'm trying to get the URLs of the top 5 search results for each abbreviated book title. I set the parameter num=5, which I assumed would return the top 5 results, and stop=1, which I understood to mean that no further HTTP requests are sent after those results come back. For some reason, with num=5 and stop=1 I only get 3 results, and I get the same 3 results for every title searched (they should obviously differ). Also, while testing fixes I ran into HTTP Error 503, even though I sleep inside the loop, which others on this site suggested would prevent that error. My code is below...
import random
import time

import google  # the "google" PyPI package, which provides google.search

count = 0
my_file = open('sometextfile.txt', 'r')
for aline in my_file:
    print("******************************")
    print(aline)
    count += 1
    record_list = aline.split("\t")
    if "." in record_list[1]:
        search_results = google.search(record_list[2], num=5, stop=1, pause=3.)
        for result in search_results:
            print(result)
            time.sleep(random.randrange(0, 3))
...and it produces the following output:
4 Environmental and Behaviour ['0143-005X']
******************************
4 Sustainable Cities and Society ['0143-005X']
******************************
4 Chicago to LA: Making sense of urban theory ['0272-4944']
******************************
4 As adopted by the International Health Conference ['0272-4944']
******************************
5 J. Wetl. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 Curr. Opin. Environ. Sustain. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 For. Policy Econ. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 For. Policy Econ. ['1442-9985']
https://www.ncbi.nlm.nih.gov/nlmcatalog?term=1442-9985%5BISSN%5D
http://www.wiley.com/bw/journal.asp?ref=1442-9985
http://www.wiley.com/WileyCDA/WileyTitle/productCd-AEC.html
******************************
5 Asia. World Dev. ['1442-9985']
Traceback (most recent call last):
File "C:/Users/Peter/Desktop/Programming/Ibata Arens Project/google_search.py", line 27, in <module>
for result in search_results:
File "C:\Users\Peter\Anaconda3\lib\site-packages\google\__init__.py", line 304, in search
html = get_page(url)
File "C:\Users\Peter\Anaconda3\lib\site-packages\google\__init__.py", line 121, in get_page
response = urlopen(request)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 504, in error
result = self._call_chain(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 696, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\Peter\Anaconda3\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
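For what it's worth, one way to make the loop tolerate transient 503s (a sketch I haven't fully tested, assuming the error surfaces as urllib.error.HTTPError as in the traceback above; fetch_with_backoff and its parameters are hypothetical names for illustration) is to retry with exponential backoff instead of a fixed sleep:

```python
import time
import urllib.error

def fetch_with_backoff(fetch, max_retries=4, base_delay=2.0):
    """Call fetch(), retrying on HTTP 503 with exponential backoff.

    fetch is any zero-argument callable that performs the request;
    the retry delays are base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except urllib.error.HTTPError as e:
            # Re-raise anything that is not a transient 503,
            # or if we have exhausted our retries.
            if e.code != 503 or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A call like `fetch_with_backoff(lambda: next(search_results))` would then wrap each request, though rate limiting by the remote site may still apply.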
I'm also wondering whether it would be better to simply use urllib and parse the returned HTML myself, since my goal is only to retrieve the ISSN for each abbreviated book title.
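If that route makes sense, the ISSN extraction itself could be as simple as a regular expression over the fetched page (a sketch, assuming the ISSN appears in the HTML in its standard printed form; extract_issns is a hypothetical helper name):

```python
import re

# ISSNs are four digits, a hyphen, three digits, and a final digit or X.
ISSN_RE = re.compile(r'\b\d{4}-\d{3}[\dX]\b')

def extract_issns(html):
    """Return the unique ISSN-shaped tokens in a page, in order of appearance."""
    seen = []
    for match in ISSN_RE.findall(html):
        if match not in seen:
            seen.append(match)
    return seen
```

The page body would come from something like `urllib.request.urlopen(url).read().decode('utf-8', errors='replace')` before being passed to this function.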