I am pulling data from the Microsoft Academic Knowledge API, then using the JSON response as a dictionary to extract the information I need. As I do this, I append the information to a numpy array, and at the end I convert it to a pandas DataFrame for export. The program runs correctly, but it takes a huge amount of time. It also seems to slow down as it runs: the first few passes through the loop take only a few seconds, but later passes take minutes.
I have simplified the if-else statements as much as possible, which helped a little, but not enough to make a real difference. I have also minimized the number of queries to the API. Each query can only return 1000 results, but I need about 35000.
import numpy as np
import requests as req

rel_info = np.array([("Title", "Author_Name", "Jornal_Published_In", "Date")])

for l in range(0, loops):  # loops is defined above to be 35
    offset = 1000 * l
    # keep track of progress
    print("Progress:" + str(round((offset / total_res) * 100, 2)) + "%")
    # get data with a request to MAK. 1000 is the max count per query
    url = ("https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"
           "?expr=And(Composite(AA.AfN=='brigham young university'),Y>=1908)"
           "&model=latest&count=1000&offset=" + str(offset)
           + "&attributes=Ti,D,AA.DAfN,AA.DAuN,J.JN")
    response = req.get(url + '&subscription-key={key}')  # {key}: redacted subscription key
    data = response.json()
    for i in range(0, len(data["entities"])):
        new_data = data["entities"][i]
        new_title = new_data["Ti"]  # get title
        # get journal; account for entities missing the 'J' key
        if 'J' not in new_data:
            new_journ = ""
        else:
            new_journ = new_data["J"]["JN"] or ""
        new_date = new_data["D"]  # get date
        # get only authors affiliated with BYU; account for missing 'DAfN' keys
        new_auth = ""
        for j in range(0, len(new_data["AA"])):
            if 'DAfN' not in new_data["AA"][j]:
                continue
            # possibly combine conditionals to make this less complex
            if new_data["AA"][j]["DAfN"] == "Brigham Young University" and new_auth == "":
                new_auth = new_data["AA"][j]["DAuN"]
            elif new_data["AA"][j]["DAfN"] == "Brigham Young University" and new_auth != "":
                new_auth = new_auth + ", " + new_data["AA"][j]["DAuN"]
        # keep adding new data to the whole array
        # (slow: vstack copies the entire array on every iteration)
        new_info = np.array([(new_title, new_auth, new_journ, new_date)])
        rel_info = np.vstack((rel_info, new_info))
In the end I solved this by changing how I add data to the large collected array. Instead of appending one row per iteration, I build a temporary array that holds 1000 rows and then append that temporary array to the full data array. This cut the runtime to about one minute, down from the previous 43 minutes.
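The batching fix above can be sketched as follows. The sample rows are fabricated stand-ins for the parsed fields of one 1000-result API page; the point is that `np.vstack` runs once per page instead of once per row, so the whole-array copy happens O(pages) times rather than O(rows) times.

```python
import numpy as np

# Hypothetical parsed rows standing in for one page of API results
batch_rows = [
    ("Title A", "Author 1", "Journal X", "2010-01-01"),
    ("Title B", "Author 2", "Journal Y", "2011-05-06"),
]

rel_info = np.array([("Title", "Author_Name", "Jornal_Published_In", "Date")])

# Stack the whole batch in one call instead of row by row
rel_info = np.vstack((rel_info, np.array(batch_rows)))

print(rel_info.shape)  # (3, 4)
```

A further step in the same direction is to collect plain Python tuples for all pages in one list and build the DataFrame once at the very end (`pd.DataFrame(rows, columns=...)`), which avoids array copies entirely.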
Try using concurrent.futures to fetch the results in a worker thread pool, as described here: https://docs.python.org/3/library/concurrent.futures.html