Posted 2024-05-20 03:14:28
User
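How can I create, for each timestamp, a vector containing multiple sensor values? For this subset of the data, the desired output groups the first three rows, because they share exactly the same timestamp, as: ([21,0,0,5236],[6,6,0,58],[18,1,0,1770]), followed by the next timestamp, and so on.
Also, this has to be done without a for loop, because the data has almost a million rows.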
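IIUC, you can group the dataframe on Report_Time and then, for each group corresponding to a unique Timestamp, build a dict that maps the Timestamp to the required array taken from the columns A, B, Type and Meter_Value: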
cols = ['A', 'B', 'Type', 'Meter_Value']
info = {k: g[cols].to_numpy() for k, g in df.groupby('Report_Time')}
To access the array that corresponds to a unique Timestamp, use a dictionary lookup:
>>> info[pd.Timestamp('2021-02-04 11:03:34')]
array([[21, 0, 0, '5236'],
       [6, 6, 0, '58'],
       [18, 1, 0, '1770'],
       [21, 0, 0, '5237']], dtype=object)
>>> info[pd.Timestamp('2021-02-04 11:03:35')]
array([[6, 6, 0, '57'],
       [19, 2, 0, '1732'],
       [21, 0, 0, '5238'],
       [18, 1, 0, '1769']], dtype=object)
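The arrays above have dtype=object because Meter_Value is stored as strings. A minimal self-contained sketch of the same grouping, assuming the readings are always integer-like strings, converts that column first so each group becomes a numeric array. The sample frame is a hypothetical reconstruction of the rows quoted in the question, and the names numeric and first are only illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the rows quoted in the question.
df = pd.DataFrame({
    "Report_Time": pd.to_datetime(
        ["2021-02-04 11:03:34"] * 4 + ["2021-02-04 11:03:35"] * 4
    ),
    "A": [21, 6, 18, 21, 6, 19, 21, 18],
    "B": [0, 6, 1, 0, 6, 2, 0, 1],
    "Type": [0, 0, 0, 0, 0, 0, 0, 0],
    "Meter_Value": ["5236", "58", "1770", "5237", "57", "1732", "5238", "1769"],
})

cols = ["A", "B", "Type", "Meter_Value"]

# Assumption: every Meter_Value parses as an integer.
numeric = df.assign(Meter_Value=pd.to_numeric(df["Meter_Value"]))

# One 2-D array per unique timestamp, keyed by that timestamp.
info = {k: g[cols].to_numpy() for k, g in numeric.groupby("Report_Time")}

first = info[pd.Timestamp("2021-02-04 11:03:34")]
print(first)
```

The dict comprehension still iterates in Python, but only once per unique timestamp rather than once per row.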
Check whether the timestamp has changed; when it does, start collecting the rows in a variable, and then vectorize them with as.vector(t(Dataframevariable)). This post may help you.
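as.vector and t are R functions, while the rest of this thread uses pandas. A rough Python analogue of the change-detection idea, offered as a sketch rather than a translation of the answer's code, could look like the following. It assumes the frame is non-empty, already ordered so that equal timestamps are contiguous, and that Meter_Value has been converted to integers. split_on_timestamp_change is a hypothetical helper name, and the sample frame is reconstructed from the rows in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; Meter_Value is assumed to be already converted to int.
df = pd.DataFrame({
    "Report_Time": pd.to_datetime(
        ["2021-02-04 11:03:34"] * 4 + ["2021-02-04 11:03:35"] * 4
    ),
    "A": [21, 6, 18, 21, 6, 19, 21, 18],
    "B": [0, 6, 1, 0, 6, 2, 0, 1],
    "Type": [0, 0, 0, 0, 0, 0, 0, 0],
    "Meter_Value": [5236, 58, 1770, 5237, 57, 1732, 5238, 1769],
})


def split_on_timestamp_change(frame, time_col, value_cols):
    """Hypothetical helper: split rows into blocks wherever time_col changes.

    Assumes a non-empty frame whose equal timestamps are contiguous.
    """
    times = frame[time_col].to_numpy()
    values = frame[value_cols].to_numpy()
    # Row positions where the timestamp differs from the previous row.
    boundaries = np.flatnonzero(times[1:] != times[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    return times[starts], np.split(values, boundaries)


keys, blocks = split_on_timestamp_change(
    df, "Report_Time", ["A", "B", "Type", "Meter_Value"]
)

# Row-by-row flattening of one block, the analogue of R's as.vector(t(x)).
flat = blocks[0].ravel()
print(keys)
print(flat)
```

The comparison and the split both run inside NumPy, so no Python-level loop over the rows is needed.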
import pandas as pd

dct = {
    'Report_Time': {5813: pd.Timestamp('2021-02-04 11:03:34'), 5823: pd.Timestamp('2021-02-04 11:03:34'), 5824: pd.Timestamp('2021-02-04 11:03:34'), 5825: pd.Timestamp('2021-02-04 11:03:34'), 5829: pd.Timestamp('2021-02-04 11:03:35'), 5830: pd.Timestamp('2021-02-04 11:03:35'), 5831: pd.Timestamp('2021-02-04 11:03:35'), 5839: pd.Timestamp('2021-02-04 11:03:35')},
    'Subsystem': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
    'A': {5813: 21, 5823: 6, 5824: 18, 5825: 21, 5829: 6, 5830: 19, 5831: 21, 5839: 18},
    'B': {5813: 0, 5823: 6, 5824: 1, 5825: 0, 5829: 6, 5830: 2, 5831: 0, 5839: 1},
    'Type': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
    'Meter_Value': {5813: '5236', 5823: '58', 5824: '1770', 5825: '5237', 5829: '57', 5830: '1732', 5831: '5238', 5839: '1769'},
}
df = pd.DataFrame(dct)
print(df.columns)

# Collect each column's values into one list per timestamp.
grouped = df.groupby('Report_Time').agg(lambda x: x.tolist())
# DataFrame.iteritems() was removed in pandas 2.0; items() yields the same (column, Series) pairs.
results = [(x.index, key, list(x)) for key, x in grouped.items()]
print(results)
Output:
[(DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None),
  'Subsystem', [[0, 0, 0, 0], [0, 0, 0, 0]]),
 (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None),
  'A', [[21, 6, 18, 21], [6, 19, 21, 18]]),
 (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None),
  'B', [[0, 6, 1, 0], [6, 2, 0, 1]]),
 (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None),
  'Type', [[0, 0, 0, 0], [0, 0, 0, 0]]),
 (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None),
  'Meter_Value', [['5236', '58', '1770', '5237'], ['57', '1732', '5238', '1769']])]
In [ ]:
tuples = []
for my_tuples in results:
    (dates, key, data) = my_tuples
    for i in range(len(dates)):
        print(dates[i], key, data[i])
        tuples.append((dates[i], key, data[i]))

for a_tuple in tuples:
    print(a_tuple)
Output:
Index(['Report_Time', 'Subsystem', 'A', 'B', 'Type', 'Meter_Value'], dtype='object')
(Timestamp('2021-02-04 11:03:34'), 'Subsystem', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:35'), 'Subsystem', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:34'), 'A', [21, 6, 18, 21])
(Timestamp('2021-02-04 11:03:35'), 'A', [6, 19, 21, 18])
(Timestamp('2021-02-04 11:03:34'), 'B', [0, 6, 1, 0])
(Timestamp('2021-02-04 11:03:35'), 'B', [6, 2, 0, 1])
(Timestamp('2021-02-04 11:03:34'), 'Type', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:35'), 'Type', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:34'), 'Meter_Value', ['5236', '58', '1770', '5237'])
(Timestamp('2021-02-04 11:03:35'), 'Meter_Value', ['57', '1732', '5238', '1769'])
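The same (timestamp, column, list) triples can also be built without indexing into dates and data by position. This is only a sketch of an alternative and not part of the original answer: it iterates the grouped frame's columns with DataFrame.items() and each column's entries with Series.items(). The sample frame is a hypothetical reconstruction that omits the Subsystem column for brevity.

```python
import pandas as pd

# Hypothetical sample mirroring the question's rows (Subsystem omitted).
df = pd.DataFrame({
    "Report_Time": pd.to_datetime(
        ["2021-02-04 11:03:34"] * 4 + ["2021-02-04 11:03:35"] * 4
    ),
    "A": [21, 6, 18, 21, 6, 19, 21, 18],
    "B": [0, 6, 1, 0, 6, 2, 0, 1],
    "Type": [0, 0, 0, 0, 0, 0, 0, 0],
    "Meter_Value": ["5236", "58", "1770", "5237", "57", "1732", "5238", "1769"],
})

# One list per (timestamp, column) cell.
grouped = df.groupby("Report_Time").agg(lambda x: x.tolist())

# Outer loop walks the columns, inner loop walks the timestamps of each column.
tuples = [
    (ts, col, values)
    for col, series in grouped.items()
    for ts, values in series.items()
]

for a_tuple in tuples:
    print(a_tuple)
```

The comprehension runs once per column and timestamp pair, so its cost scales with the number of groups rather than the number of rows.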