
Retrieving API data into a DataFrame using the multithreading module


I'm using a third-party API to retrieve 10-minute data for many days across different tags. Depending on the number of days and tags, the current extraction can take several minutes. So I'm experimenting with multithreading, which I understand is very useful for heavy I/O operations. The API calls look like this (I've replaced the actual API name):

```python
import json
import concurrent.futures
from datetime import datetime

import numpy as N
import pandas as pd
import requests as r


class pyGeneric:
    def __init__(self, serverName, apiKey, rootApiUrl='/Generic.Services/api'):
        """
        Initialize a connection to server, and return a pyGeneric server object
        """
        self.baseUrl = serverName + rootApiUrl
        self.apiKey = apiKey
        self.bearer = 'Bearer ' + apiKey
        self.header = {'mediaType': 'application/json', 'Authorization': self.bearer}

    def getRawMeasurementsJson(self, tag, start, end):
        apiQuery = '/measurements/' + tag + '/from/' + start + '/to/' + end + '?format=json'
        dataresponse = r.get(self.baseUrl + apiQuery, headers=self.header)
        data = json.loads(dataresponse.text)
        return data

    def getAggregatesPandas(self, tags, start, end):
        """
        Return tag(s) in a pandas dataFrame
        """
        df = pd.DataFrame()
        if isinstance(tags, str):
            tags = [tags]
        for tag in tags:
            tempJson = self.getRawMeasurementsJson(tag, start, end)
            tempDf = pd.DataFrame(tempJson['timeSeriesList'][0]['timeSeries'])
            name = tempJson['timeSeriesList'][0]['measurementName']
            # 't' holds epoch milliseconds; 'v' holds the measured values
            df['TimeUtc'] = [datetime.fromtimestamp(i / 1000) for i in tempDf['t']]
            df['TimeUtc'] = df['TimeUtc'].dt.round('min')
            df[name] = tempDf['v']
        return df


gener = pyGeneric('https://api.generic.com', 'auth_keymlkj9789878686')
```

An example call to the API looks like this: `gener_df = gener.getAggregatesPandas('tag1.10m.SQL', '*-10d', '*')`

This works fine for a single tag, but for a list of tags it takes much longer, which is why I've been trying the following:

```python
tags = ['tag1.10m.SQL', 'tag2.10m.SQL', 'tag3.10m.SQL', 'tag4.10m.SQL', 'tag5.10m.SQL',
        'tag6.10m.SQL', 'tag7.10m.SQL', 'tag8.10m.SQL', 'tag9.10m.SQL', 'tag10.10m.SQL']
startdate = "*-150d"
enddate = '*'

final_df = pd.DataFrame()

with concurrent.futures.ThreadPoolExecutor() as executor:
    args = ((i, startdate, enddate) for i in tags)
    # The iterator of results returned by map() is not stored or consumed yet
    executor.map(lambda p: gener.getAggregatesPandas(*p), args)
```
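`executor.map` returns an iterator of results, and the snippet above discards it. According to the `concurrent.futures` documentation, an exception raised inside a worker is only re-raised when its value is retrieved from that iterator, so failures in this snippet go unnoticed. A minimal sketch of one way to check every call individually uses `submit` together with `as_completed` and inspects each `future.result()`. `fake_fetch` and `fetch_all` are hypothetical names; `fake_fetch` stands in for `gener.getAggregatesPandas` so the example runs without the real API, and it fails on purpose for one tag.

```python
import concurrent.futures
import time


# Hypothetical stand-in for gener.getAggregatesPandas(tag, start, end):
# it sleeps to mimic network latency and fails on purpose for one tag.
def fake_fetch(tag, start, end):
    time.sleep(0.1)
    if tag == "bad.10m.SQL":
        raise ValueError(f"no data for {tag}")
    return f"{tag} [{start} .. {end}]"


def fetch_all(tags, start, end, max_workers=8):
    """Run one request per tag in a thread pool and record each outcome."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which tag each future belongs to.
        future_to_tag = {
            executor.submit(fake_fetch, tag, start, end): tag for tag in tags
        }
        # as_completed() yields futures as they finish, in completion order.
        for future in concurrent.futures.as_completed(future_to_tag):
            tag = future_to_tag[future]
            try:
                # result() returns the value or re-raises the worker's exception.
                results[tag] = future.result()
            except Exception as exc:
                errors[tag] = exc
    return results, errors


results, errors = fetch_all(["tag1.10m.SQL", "bad.10m.SQL", "tag2.10m.SQL"], "*-10d", "*")
print(sorted(results))  # ['tag1.10m.SQL', 'tag2.10m.SQL']
print(errors)           # {'bad.10m.SQL': ValueError('no data for bad.10m.SQL')}
```

With the real client, `fake_fetch` would be replaced by `gener.getAggregatesPandas`, and `results` would then hold one DataFrame per tag, keyed by tag name.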
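However, I can't check whether `gener.getAggregatesPandas` actually ran correctly. Ultimately I want the results in a DataFrame named `final_df`, but I'm not sure how to get there. I read in this {a1} that appending to a DataFrame inside the context manager causes quadratic copying, which ends up slowing everything down.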
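A common way to build `final_df` without repeated appends is to let the pool return one DataFrame per tag, collect them in a list, and combine them with a single `pd.concat` after all requests have finished. The sketch below assumes each per-tag frame has a `TimeUtc` column plus one value column, as the question's method produces, and that timestamps are unique within each frame (duplicates would need extra handling before a column-wise concat). `fake_get_aggregates` and `fetch_to_frame` are hypothetical names, with `fake_get_aggregates` standing in for `gener.getAggregatesPandas` and naming its value column after the tag. Since `Executor.map` accepts several iterables, it also passes the constant start/end with `itertools.repeat` instead of packing tuples for a lambda.

```python
import concurrent.futures
import itertools
import time

import pandas as pd


# Hypothetical stand-in for gener.getAggregatesPandas(tag, start, end).
# It returns the same layout as the question's method (a 'TimeUtc' column
# plus one value column) without calling the real API.
def fake_get_aggregates(tag, start, end):
    time.sleep(0.1)  # simulate network latency
    times = pd.date_range("2024-01-01", periods=3, freq="10min")
    return pd.DataFrame({"TimeUtc": times, tag: [1.0, 2.0, 3.0]})


def fetch_to_frame(tags, start, end, max_workers=8):
    """Fetch every tag concurrently and return one DataFrame with a column per tag."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() accepts several iterables; repeat() supplies the constant start/end.
        # Results come back in the same order as `tags`, and list() consumes the
        # iterator, so an exception raised in any worker is re-raised here.
        frames = list(
            executor.map(fake_get_aggregates, tags,
                         itertools.repeat(start), itertools.repeat(end))
        )
    # Combine once, after all requests have finished, aligned on the timestamp.
    return pd.concat([frame.set_index("TimeUtc") for frame in frames], axis=1)


tags = ["tag1.10m.SQL", "tag2.10m.SQL", "tag3.10m.SQL"]
final_df = fetch_to_frame(tags, "*-150d", "*")
print(final_df.shape)  # (3, 3)
```

Because the frames are concatenated only once, each row is copied a fixed number of times regardless of how many tags there are, instead of the whole accumulated DataFrame being copied on every append.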


1 Answer

Anonymous, 1 day ago

Expertise: python, mysql, java
