For each unique timestamp in a df, generate a vector of all its rows in the dataframe (Python)

Posted 2024-05-20 03:14:28


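How can I create, for each timestamp, a vector containing the values of multiple sensors? In this subset of the data, the desired output groups the first three rows because they have exactly the same timestamp, e.g. ([21,0,0,5236],[6,6,0,58],[18,1,0,1770]), then the rows of the next timestamp, and so on.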

Also, this has to be done without a for loop, because the data has almost a million rows.


3 answers

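IIUC, you can group the dataframe by Report_Time and then, for each group corresponding to a unique Timestamp, build a dict that maps the Timestamp to the desired array taken from the columns A, B, Type and Meter_Value: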

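cols = ['A', 'B', 'Type', 'Meter_Value']  # columns that form each row vector
# Map each unique Report_Time to a 2-D array holding that timestamp's rows
info = {k: g[cols].to_numpy() for k, g in df.groupby('Report_Time')}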

To access the array for a given unique Timestamp, use a dictionary lookup:

>>> info[pd.Timestamp('2021-02-04 11:03:34')]

array([[21, 0, 0, '5236'],
       [6, 6, 0, '58'],
       [18, 1, 0, '1770'],
       [21, 0, 0, '5237']], dtype=object)

>>> info[pd.Timestamp('2021-02-04 11:03:35')]

array([[6, 6, 0, '57'],
       [19, 2, 0, '1732'],
       [21, 0, 0, '5238'],
       [18, 1, 0, '1769']], dtype=object)
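If building one sub-DataFrame per group turns out to be slow on close to a million rows, one possible alternative (a different technique from the answer above, not part of it) is to sort once and cut a single NumPy array at the timestamp boundaries using numpy.unique(..., return_index=True) and numpy.split. A minimal sketch, assuming the question's column names; the four rows are taken from the sample data and deliberately shuffled, and Meter_Value is stored as int instead of str for simplicity:

```python
import numpy as np
import pandas as pd

# Four rows from the question's sample, shuffled; Meter_Value as int (str in the original)
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:35', '2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35', '2021-02-04 11:03:34']),
    'A': [6, 21, 19, 6],
    'B': [6, 0, 2, 6],
    'Type': [0, 0, 0, 0],
    'Meter_Value': [57, 5236, 1732, 58],
})
cols = ['A', 'B', 'Type', 'Meter_Value']

# A stable sort keeps the original row order within each timestamp
df_sorted = df.sort_values('Report_Time', kind='stable')
values = df_sorted[cols].to_numpy()

# starts[i] is the position of the first row of the i-th unique timestamp
times, starts = np.unique(df_sorted['Report_Time'].to_numpy(), return_index=True)

# Split the 2-D array at every boundary except position 0
blocks = np.split(values, starts[1:])

# Wrap the keys in a DatetimeIndex so lookups with pd.Timestamp work
info = dict(zip(pd.DatetimeIndex(times), blocks))

print(info[pd.Timestamp('2021-02-04 11:03:34')])
```

np.split still loops internally, but only once per unique timestamp rather than once per row. Lookups work the same way as with the groupby dictionary, e.g. info[pd.Timestamp('2021-02-04 11:03:35')].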

Check whether the timestamp changes; when it does, start collecting the rows in a variable, then vectorize them with as.vector(t(Dataframevariable)) (R syntax). This post may help you.
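In pandas, the change-detection idea from this answer can be expressed without a row-by-row loop. A minimal sketch, assuming rows that share a timestamp are contiguous: Series.shift compares each timestamp with the one in the previous row, cumsum turns the change flags into a run id, and ravel() flattens each run's array row by row, which is what as.vector(t(x)) does in R. The rows come from the question's sample, with Meter_Value stored as int for simplicity:

```python
import pandas as pd

# Sample rows sorted by time; each timestamp forms one contiguous run
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34', '2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35', '2021-02-04 11:03:35']),
    'A': [21, 6, 6, 19],
    'B': [0, 6, 6, 2],
    'Meter_Value': [5236, 58, 57, 1732],
})
cols = ['A', 'B', 'Meter_Value']

# True wherever the timestamp differs from the row above (always True for row 0)
changed = df['Report_Time'].ne(df['Report_Time'].shift())
# Running count of changes gives each run its own id: 1, 1, 2, 2
run_id = changed.cumsum()

# One flat, row-major vector per run, keyed by that run's timestamp
vectors = {g['Report_Time'].iloc[0]: g[cols].to_numpy().ravel()
           for _, g in df.groupby(run_id)}

print(vectors[pd.Timestamp('2021-02-04 11:03:34')])
```

If a timestamp can reappear after a different one, sort by Report_Time first; otherwise the later run overwrites the earlier one in the dictionary.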

import pandas as pd

# Sample data (original index labels kept): four rows per timestamp
dct = {'Report_Time': {5813: pd.Timestamp('2021-02-04 11:03:34'),
                       5823: pd.Timestamp('2021-02-04 11:03:34'),
                       5824: pd.Timestamp('2021-02-04 11:03:34'),
                       5825: pd.Timestamp('2021-02-04 11:03:34'),
                       5829: pd.Timestamp('2021-02-04 11:03:35'),
                       5830: pd.Timestamp('2021-02-04 11:03:35'),
                       5831: pd.Timestamp('2021-02-04 11:03:35'),
                       5839: pd.Timestamp('2021-02-04 11:03:35')},
       'Subsystem': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
       'A': {5813: 21, 5823: 6, 5824: 18, 5825: 21, 5829: 6, 5830: 19, 5831: 21, 5839: 18},
       'B': {5813: 0, 5823: 6, 5824: 1, 5825: 0, 5829: 6, 5830: 2, 5831: 0, 5839: 1},
       'Type': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
       'Meter_Value': {5813: '5236', 5823: '58', 5824: '1770', 5825: '5237',
                       5829: '57', 5830: '1732', 5831: '5238', 5839: '1769'}}

df = pd.DataFrame(dct)
print(df.columns)


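# Collect each column's values into one list per timestamp
grouped = df.groupby('Report_Time').agg(lambda x: x.tolist())

# DataFrame.iteritems() was removed in pandas 2.0; items() yields (column, Series) pairs
results = [(x.index, key, list(x)) for key, x in grouped.items()]
print(results)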

Output:

[(DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Subsystem', [[0, 0, 0, 0], [0, 0, 0, 0]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'A', [[21, 6, 18, 21], [6, 19, 21, 18]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'B', [[0, 6, 1, 0], [6, 2, 0, 1]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Type', [[0, 0, 0, 0], [0, 0, 0, 0]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Meter_Value', [['5236', '58', '1770', '5237'], ['57', '1732', '5238', '1769']])]
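The agg(lambda x: x.tolist()) call above collects one list per column. To get the per-row vectors described in the question instead, one option is to turn each row into a list first and then aggregate those lists per timestamp. A minimal sketch on three rows of the sample data (rows is a name chosen for illustration; Meter_Value kept as strings, as in the original data):

```python
import pandas as pd

df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34', '2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35']),
    'A': [21, 6, 6],
    'B': [0, 6, 6],
    'Type': [0, 0, 0],
    'Meter_Value': ['5236', '58', '57'],
})
cols = ['A', 'B', 'Type', 'Meter_Value']

# One Python list per row, built with a single to_numpy().tolist() call
rows = pd.Series(df[cols].to_numpy().tolist(), index=df.index, name='vec')

# Collect the row lists of each timestamp into a list of lists
vectors = rows.groupby(df['Report_Time']).agg(list)

print(vectors)
```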

Then flatten the results into (timestamp, column, values) tuples:

# Flatten into one (timestamp, column, values) tuple per timestamp/column pair
tuples = []
for dates, key, data in results:
    for date, values in zip(dates, data):
        tuples.append((date, key, values))

for a_tuple in tuples:
    print(a_tuple)

Output:

    Index(['Report_Time', 'Subsystem', 'A', 'B', 'Type', 'Meter_Value'], dtype='object')
    (Timestamp('2021-02-04 11:03:34'), 'Subsystem', [0, 0, 0, 0])
    (Timestamp('2021-02-04 11:03:35'), 'Subsystem', [0, 0, 0, 0])
    (Timestamp('2021-02-04 11:03:34'), 'A', [21, 6, 18, 21])
    (Timestamp('2021-02-04 11:03:35'), 'A', [6, 19, 21, 18])
    (Timestamp('2021-02-04 11:03:34'), 'B', [0, 6, 1, 0])
    (Timestamp('2021-02-04 11:03:35'), 'B', [6, 2, 0, 1])
    (Timestamp('2021-02-04 11:03:34'), 'Type', [0, 0, 0, 0])
    (Timestamp('2021-02-04 11:03:35'), 'Type', [0, 0, 0, 0])
    (Timestamp('2021-02-04 11:03:34'), 'Meter_Value', ['5236', '58', '1770', '5237'])
    (Timestamp('2021-02-04 11:03:35'), 'Meter_Value', ['57', '1732', '5238', '1769'])
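The nested loops in the last step can also be replaced by a reshape. A sketch, assuming grouped has the layout produced by groupby('Report_Time').agg(lambda x: x.tolist()) above (rebuilt here from two columns so the block runs on its own): DataFrame.melt stacks the value columns one after another, which yields the same column-major (timestamp, column, values) order as the loop.

```python
import pandas as pd

df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34', '2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35', '2021-02-04 11:03:35']),
    'A': [21, 6, 6, 19],
    'B': [0, 6, 6, 2],
})

# One list per column per timestamp, as in the answer above
grouped = df.groupby('Report_Time').agg(lambda x: x.tolist())

# Long format: one row per (timestamp, column) pair, all timestamps of a column first
long = grouped.reset_index().melt(id_vars='Report_Time', var_name='column', value_name='data')
tuples = list(long.itertuples(index=False, name=None))

for a_tuple in tuples:
    print(a_tuple)
```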
