DataFrame conversion and merging rows

Published 2024-09-25 12:26:30


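I have a time series DataFrame from a machine. Each row holds the value of a tag, and some of the tags store their values in a different (encoded) format.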

| datetime            | tagid  | value |
|---------------------|--------|-------|
| 08-04-2021 11:30:58 | BNO_01 | 12849 |
| 08-04-2021 11:30:58 | BNO_02 | 12597 |
| 08-04-2021 11:30:58 | BNO_03 | 14390 |
| 08-04-2021 11:30:58 | MDL_01 | 21328 |
| 08-04-2021 11:30:58 | MDL_02 | 22304 |
| 08-04-2021 11:30:58 | SEQ_01 | 12340 |
| 08-04-2021 11:30:58 | SEQ_02 | 13622 |
| 08-04-2021 11:30:58 | STA    | 724   |
| 08-04-2021 11:30:58 | STO    | 735   |
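For anyone reproducing the problem, the table above can be built as a DataFrame roughly like this. This is a minimal sketch that assumes `datetime` is kept as a plain string and `value` is an integer column; the question does not state the actual dtypes.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: datetime is a string column and value is an integer column.
df = pd.DataFrame({
    "datetime": ["08-04-2021 11:30:58"] * 9,
    "tagid": ["BNO_01", "BNO_02", "BNO_03", "MDL_01", "MDL_02",
              "SEQ_01", "SEQ_02", "STA", "STO"],
    "value": [12849, 12597, 14390, 21328, 22304, 12340, 13622, 724, 735],
})
print(df)
```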

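  1. Convert the values for tag IDs BNO_01, BNO_02, BNO_03, MDL_01, MDL_02, SEQ_01 and SEQ_02 using df['value'] = df['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256)), but only for those tag rows

  2. Delete the rows MDL_01, MDL_02, BNO_01, BNO_02 and BNO_03 and merge their text into a single BNO row

  3. Delete the rows SEQ_01 and SEQ_02 and merge their text into a single SEQ row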

Example:
MDL_01=21328 --> 'SP'
MDL_02=22304 --> 'W ' (W followed by a space)
BNO_01=12849 --> '21'
BNO_02=12597 --> '15'
BNO_03=14390 --> '86'

BNO='SPW 211586'
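The conversion in the example appears to treat each value as two packed bytes: the high byte is the first character and the low byte is the second. A minimal sketch of that decoding, assuming the values really are 16-bit words holding two ASCII characters (`decode_word` is a hypothetical helper name, not part of the question):

```python
def decode_word(x: int) -> str:
    """Decode a 16-bit integer into two characters: high byte, then low byte."""
    high, low = divmod(x, 256)  # integer division, so the high byte is never rounded up
    return chr(high) + chr(low)

# Values taken from the example above.
samples = {"MDL_01": 21328, "MDL_02": 22304,
           "BNO_01": 12849, "BNO_02": 12597, "BNO_03": 14390}
decoded = {tag: decode_word(value) for tag, value in samples.items()}
bno_text = "".join(decoded.values())
print(decoded)
print(repr(bno_text))
```

The sketch uses `divmod` instead of `round(x / 256)`. For the sample values both give the same result, but `round` would pick the wrong high byte whenever the low byte is 128 or more.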

Desired DataFrame

| datetime            | tagid | value      |
|---------------------|-------|------------|
| 08-04-2021 11:30:58 | BNO   | SPW 211586 |
| 08-04-2021 11:30:58 | SEQ   | 0456       |
| 08-04-2021 11:30:58 | STA   | 724        |
| 08-04-2021 11:30:58 | STO   | 735        |

2 answers

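First, convert the value column to characters for the rows whose tagid contains `_`.

Then strip the `_` and everything after it from the tagid column.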

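# Decode only the rows whose tagid contains '_'; cast to object first so strings can be stored
mask = df['tagid'].str.contains('_')
df['value'] = df['value'].astype(object)
df.loc[mask, 'value'] = df.loc[mask, 'value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
df['tagid'] = df['tagid'].apply(lambda x: x.split('_')[0])
print(df)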

              datetime tagid value
0  08-04-2021 11:30:58   BNO    21
1  08-04-2021 11:30:58   BNO    15
2  08-04-2021 11:30:58   BNO    86
3  08-04-2021 11:30:58   MDL    SP
4  08-04-2021 11:30:58   MDL    W 
5  08-04-2021 11:30:58   SEQ    04
6  08-04-2021 11:30:58   SEQ    56
7  08-04-2021 11:30:58   STA   724
8  08-04-2021 11:30:58   STO   735

Next, group by the datetime and tagid columns with groupby() and join the value entries of each group with ''.

df_ = df.groupby(['datetime','tagid']).apply(lambda x: ''.join(map(str, x['value'].tolist()))).reset_index().rename({0: 'value'}, axis=1)
print(df_)

              datetime tagid   value
0  08-04-2021 11:30:58   BNO  211586
1  08-04-2021 11:30:58   MDL    SPW 
2  08-04-2021 11:30:58   SEQ    0456
3  08-04-2021 11:30:58   STA     724
4  08-04-2021 11:30:58   STO     735

Finally, merge the MDL row into the BNO row and delete the MDL row.

df_.loc[df_['tagid'] == 'BNO', 'value'] = df_.loc[df_['tagid'] == 'MDL', 'value'].iloc[0] + ' ' + df_.loc[df_['tagid'] == 'BNO', 'value'].iloc[0]
df_ = df_[~(df_['tagid'] == 'MDL')]
# print(df_)

              datetime tagid        value
0  08-04-2021 11:30:58   BNO  SPW  211586
2  08-04-2021 11:30:58   SEQ         0456
3  08-04-2021 11:30:58   STA          724
4  08-04-2021 11:30:58   STO          735
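Note that the merged value above contains two spaces ('SPW  211586'), because the decoded MDL text already ends with a space and another ' ' is added as separator, while the desired output has only one. A possible variant that concatenates without the extra separator is sketched below. The small `df_` here is a hypothetical stand-in for the grouped frame from the previous step, and it assumes exactly one BNO row and one MDL row.

```python
import pandas as pd

# Hypothetical stand-in for the grouped frame `df_` produced by the groupby step.
df_ = pd.DataFrame({
    "datetime": ["08-04-2021 11:30:58"] * 5,
    "tagid": ["BNO", "MDL", "SEQ", "STA", "STO"],
    "value": ["211586", "SPW ", "0456", "724", "735"],
})

is_bno = df_["tagid"] == "BNO"
is_mdl = df_["tagid"] == "MDL"

# The MDL text already ends with a space, so concatenate without a separator.
merged = df_.loc[is_mdl, "value"].iloc[0] + df_.loc[is_bno, "value"].iloc[0]
df_.loc[is_bno, "value"] = merged

# Drop the MDL row now that its text lives in the BNO row.
df_ = df_[~is_mdl].reset_index(drop=True)
print(df_)
```

If there can be several timestamps, this single-row merge would need to be done per datetime group instead.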

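The idea is to first filter the rows with Series.str.startswith, process those rows with a lambda function, then sort them, replace MDL with BNO and aggregate the values with join. Finally, use concat to append the original rows that do not match the condition, selected by inverting the mask with ~:

The advantage of this solution is that non-matching rows are left unchanged: duplicates (for example, STA appearing twice) are not aggregated, and their values are not converted to strings.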

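import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])

# Select only the rows whose tagid starts with one of the prefixes to decode
vals = ['BNO','MDL','SEQ']
mask = df['tagid'].str.startswith(tuple(vals))

df1 = df[mask].copy()
df1['value'] = df1['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
df1['tagid'] = df1['tagid'].str.split('_').str[0]

# Sort descending so MDL precedes BNO (a stable sort keeps _01, _02, ... in order),
# then relabel MDL as BNO and join the text per group
df1 = (df1.sort_values('tagid', ascending=False, kind='stable')
          .replace({'MDL':'BNO'})
          .groupby(['datetime','tagid'])['value']
          .agg(''.join)
          .reset_index())

# Append the untouched rows (STA, STO, ...) using the inverted mask
df = pd.concat([df1, df[~mask]], ignore_index=True)
print(df)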
             datetime tagid       value
0 2021-08-04 11:30:58   BNO  SPW 211586
1 2021-08-04 11:30:58   SEQ        0456
2 2021-08-04 11:30:58   STA         724
3 2021-08-04 11:30:58   STO         735
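To check the claim that non-matching rows are left alone, the same approach can be wrapped in a function and run on data with a duplicated STA tag. This is only a sketch: `merge_tags` is a hypothetical name, the second STA row (725) is made up for illustration, datetime is left as a string, and integer division is used in place of `round`.

```python
import pandas as pd

def merge_tags(df: pd.DataFrame) -> pd.DataFrame:
    """Decode and merge BNO/MDL/SEQ rows; leave every other row untouched."""
    mask = df["tagid"].str.startswith(("BNO", "MDL", "SEQ"))

    part = df[mask].copy()
    # Two packed bytes per value: high byte first, then low byte.
    part["value"] = part["value"].apply(lambda x: chr(x // 256) + chr(x % 256))
    part["tagid"] = part["tagid"].str.split("_").str[0]

    # Stable descending sort puts MDL before BNO and keeps _01, _02, ... in order.
    part = part.sort_values("tagid", ascending=False, kind="stable")
    part["tagid"] = part["tagid"].replace({"MDL": "BNO"})

    merged = (part.groupby(["datetime", "tagid"])["value"]
                  .agg("".join)
                  .reset_index())
    return pd.concat([merged, df[~mask]], ignore_index=True)

# Sample data from the question plus one made-up duplicate STA row.
df = pd.DataFrame({
    "datetime": ["08-04-2021 11:30:58"] * 10,
    "tagid": ["BNO_01", "BNO_02", "BNO_03", "MDL_01", "MDL_02",
              "SEQ_01", "SEQ_02", "STA", "STA", "STO"],
    "value": [12849, 12597, 14390, 21328, 22304, 12340, 13622, 724, 725, 735],
})
result = merge_tags(df)
print(result)
```

Both STA rows should survive with their integer values, while the BNO and SEQ rows carry the merged text.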
