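<p>I am using a third-party API to retrieve 10-minute data for a number of tags over a large number of days. Depending on the number of days and tags, the extraction currently takes several minutes, so I am trying multithreading, which I understand is well suited to heavy I/O operations.</p>
<p>The API calls look like this (I have replaced the actual API name):</p>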
<pre><code>import numpy as N
import requests as r
import json
import pandas as pd
from datetime import datetime
import concurrent.futures

class pyGeneric:
    def __init__(self, serverName, apiKey, rootApiUrl='/Generic.Services/api'):
        """
        Initialize a connection to server, and return a pyGeneric server object
        """
        self.baseUrl = serverName + rootApiUrl
        self.apiKey = apiKey
        self.bearer = 'Bearer ' + apiKey
        self.header = {'mediaType':'application/json','Authorization':self.bearer}

    def getRawMeasurementsJson(self, tag, start, end):
        apiQuery = '/measurements/' + tag + '/from/' + start + '/to/' + end + '?format=json'
        dataresponse = r.get(self.baseUrl+apiQuery, headers=self.header)
        data = json.loads(dataresponse.text)
        return data

    def getAggregatesPandas(self, tags, start, end):
        """
        Return tag(s) in a pandas dataFrame
        """
        df = pd.DataFrame()
        if type(tags) == str:
            tags = [tags]
        for tag in tags:
            tempJson = self.getRawMeasurementsJson(tag, start, end)
            tempDf = pd.DataFrame(tempJson['timeSeriesList'][0]['timeSeries'])
            name = tempJson['timeSeriesList'][0]['measurementName']
            df['TimeUtc'] = [datetime.fromtimestamp(i/1000) for i in tempDf['t']]
            df['TimeUtc'] = df['TimeUtc'].dt.round('min')
            df[name] = tempDf['v']
        return df

gener = pyGeneric('https://api.generic.com', 'auth_keymlkj9789878686')
</code></pre>
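<p>An example call to the API looks like this:
<code>gener_df = gener.getAggregatesPandas('tag1.10m.SQL', '*-10d', '*')</code></p>
<p>This works fine for a single tag, but takes much longer for a list of tags, which is why I have been trying the following:</p>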
<pre><code>tags = ['tag1.10m.SQL',
'tag2.10m.SQL',
'tag3.10m.SQL',
'tag4.10m.SQL',
'tag5.10m.SQL',
'tag6.10m.SQL',
'tag7.10m.SQL',
'tag8.10m.SQL',
'tag9.10m.SQL',
'tag10.10m.SQL']
startdate = "*-150d"
enddate = '*'
final_df = pd.DataFrame()
with concurrent.futures.ThreadPoolExecutor() as executor:
    args = ((i,startdate, enddate) for i in tags)
    executor.map(lambda p: gener.getAggregatesPandas(*p), args)
</code></pre>
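<p>However, I have no way of checking whether <code>gener.getAggregatesPandas</code> is being executed correctly. Ultimately, I want the results in a DataFrame called <code>final_df</code>, but I am not sure how to proceed. I read in this {a1} that appending inside the context manager would cause quadratic copying of the DataFrame, which would ultimately slow things down.</p>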
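<p>To make the goal concrete, here is a minimal sketch of the pattern I think I am after: submit one task per tag, inspect each future so that failures are visible, and build <code>final_df</code> with a single <code>pd.concat</code> after the pool has finished rather than appending inside the loop. The names <code>fetch_tag</code> and <code>fetch_all</code> are hypothetical; <code>fetch_tag</code> is only a stand-in that returns fake data in place of <code>gener.getAggregatesPandas</code>, and it assumes every tag comes back on the same <code>TimeUtc</code> grid.</p>

```python
import concurrent.futures

import pandas as pd


def fetch_tag(tag, start, end):
    """Hypothetical stand-in for gener.getAggregatesPandas(tag, start, end).

    Returns a small fake frame so the sketch runs without the real API.
    """
    if tag == "bad.tag":
        # Simulate a failed request so the error handling below is exercised.
        raise ValueError(f"no data for {tag}")
    times = pd.date_range("2024-01-01", periods=3, freq="10min")
    return pd.DataFrame({"TimeUtc": times, tag: [1.0, 2.0, 3.0]})


def fetch_all(tags, start, end, max_workers=8):
    """Fetch each tag in a thread pool; return (combined frame, errors by tag)."""
    frames = {}
    errors = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its tag so results and failures can be attributed.
        future_to_tag = {
            executor.submit(fetch_tag, tag, start, end): tag for tag in tags
        }
        for future in concurrent.futures.as_completed(future_to_tag):
            tag = future_to_tag[future]
            try:
                # result() re-raises any exception raised inside the worker thread.
                frames[tag] = future.result().set_index("TimeUtc")
            except Exception as exc:
                errors[tag] = exc

    # Concatenate once, in the original tag order, instead of growing a frame
    # inside the loop (which would copy the data on every append).
    ordered = [frames[tag] for tag in tags if tag in frames]
    final_df = pd.concat(ordered, axis=1) if ordered else pd.DataFrame()
    return final_df, errors


final_df, errors = fetch_all(["tag1.10m.SQL", "tag2.10m.SQL", "bad.tag"], "*-150d", "*")
print(final_df)
print("failed tags:", list(errors))
```

<p>With the real API, <code>fetch_tag</code> would presumably be replaced by a call to <code>gener.getAggregatesPandas</code> for a single tag. Keeping the per-tag frames in a dict and concatenating at the end means each result is copied only once, and the <code>errors</code> dict shows which calls did not complete.</p>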