Add rows to a DataFrame depending on whether a row exists (per group ID)?

Posted 2024-09-21 03:19:17


I have a dataset like this:

g_id    event   time_left  home away
1       "TIP"   00:12:00   8    6
1       "SHOT"  00:11:48   8    6
1       "MISS"  00:11:20   8    6
1       "TOV"   00:11:15   8    6
1       "SHOT"  00:10:40   8    6
2       "REB"   00:11:48   7    3
2       "FOUL"  00:11:35   7    3
2       "FT"    00:11:33   7    3
2       "FT"    00:11:31   7    3
3       "TIP"   00:12:00   5    1
3       "MISS"  00:11:43   5    1
3       "REB"   00:11:42   5    1
3       "SHOT"  00:11:27   5    1
3       "TOV"   00:11:04   5    1 
4       "SHOT"  00:11:39   9    4
4       "MISS"  00:11:17   9    4
4       "REB"   00:11:16   9    4
4       "SHOT"  00:10:58   9    4
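To try the answers below, the sample data can be rebuilt as a DataFrame. This is a sketch that assumes the double quotes shown around each event are part of the string values (the first answer's `'"TIP"'` fill value suggests so) and that `time_left` is stored as zero-padded `HH:MM:SS` text rather than as a timedelta.

```python
import pandas as pd

# Sample data from the question.
# Assumption: the double quotes around each event are part of the string value.
rows = [
    (1, '"TIP"',  '00:12:00', 8, 6),
    (1, '"SHOT"', '00:11:48', 8, 6),
    (1, '"MISS"', '00:11:20', 8, 6),
    (1, '"TOV"',  '00:11:15', 8, 6),
    (1, '"SHOT"', '00:10:40', 8, 6),
    (2, '"REB"',  '00:11:48', 7, 3),
    (2, '"FOUL"', '00:11:35', 7, 3),
    (2, '"FT"',   '00:11:33', 7, 3),
    (2, '"FT"',   '00:11:31', 7, 3),
    (3, '"TIP"',  '00:12:00', 5, 1),
    (3, '"MISS"', '00:11:43', 5, 1),
    (3, '"REB"',  '00:11:42', 5, 1),
    (3, '"SHOT"', '00:11:27', 5, 1),
    (3, '"TOV"',  '00:11:04', 5, 1),
    (4, '"SHOT"', '00:11:39', 9, 4),
    (4, '"MISS"', '00:11:17', 9, 4),
    (4, '"REB"',  '00:11:16', 9, 4),
    (4, '"SHOT"', '00:10:58', 9, 4),
]
df = pd.DataFrame(rows, columns=['g_id', 'event', 'time_left', 'home', 'away'])
print(df)
```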

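I noticed that my question is somewhat similar to this one in MySQL, but I'd like to know whether it can also be done in pandas.

As you may have noticed, the data is grouped by 'g_id', and some sequences start with 'TIP' while others do not. What I want to do is go through the data by 'g_id' and, if a 'g_id' group does not start with event='TIP', insert a row with 'TIP' in the 'event' column and '00:12:00' in the 'time_left' column, carrying over the 'home' and 'away' values from the group's first row. How can I do this?

The real dataset has many more columns, but essentially I just need to insert a new row in which some column values are copied from the existing rows and others are assigned new values.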
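For comparison, one loop-free way to express this in pandas is to take each group's first row with `groupby(...).head(1)`, keep the groups whose first event is not a TIP, overwrite only `event` and `time_left` with `assign` (every other column, however many there are, is copied unchanged), and then stable-sort the new rows in front of their group. This is a minimal sketch on a trimmed copy of the sample data, not taken from either answer; it assumes the rows of each group are already in play order.

```python
import pandas as pd

# Trimmed sample: game 1 already starts with TIP, game 2 does not
df = pd.DataFrame({
    'g_id': [1, 1, 2, 2],
    'event': ['"TIP"', '"SHOT"', '"REB"', '"FOUL"'],
    'time_left': ['00:12:00', '00:11:48', '00:11:48', '00:11:35'],
    'home': [8, 8, 7, 7],
    'away': [6, 6, 3, 3],
})

# First row of every game, in original row order
first = df.groupby('g_id').head(1)
# Keep only the games whose first event is not a TIP
needs_tip = first[~first['event'].str.contains('TIP')]
# New rows: every column is copied from the game's first row,
# then only the columns that should change are overwritten
new_rows = needs_tip.assign(event='"TIP"', time_left='00:12:00')

# Place the new rows ahead of the originals; a *stable* sort on g_id
# then keeps each TIP row in front of its game's existing rows
out = (pd.concat([new_rows, df])
         .sort_values('g_id', kind='stable')
         .reset_index(drop=True))
print(out)
```

Because only `event` and `time_left` are named in `assign`, this scales to the real dataset's extra columns without listing them.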


2 answers

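You can iterate over the groups and check whether the first event is TIP. If it is not, use `DataFrame.shift` to move the group's rows down by one, fill in the vacated first row, and use `pd.concat` to append back the last row that the shift pushed out: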

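import pandas as pd

# Groups not starting with TIP: shift rows down, fill row 0 with TIP/00:12:00,
# back-fill the rest, restore dtypes, and re-append the row shift() dropped.
l = [pd.concat((g.shift().fillna({'event': '"TIP"', 'time_left': '00:12:00'})
                 .bfill().astype(g.dtypes),
                g.iloc[[-1]]))
     if 'TIP' not in g['event'].iloc[0] else g
     for _, g in df.groupby('g_id')]

out = pd.concat(l, ignore_index=True)
print(out)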

   g_id   event time_left home away
0     1   "TIP"  00:12:00    8    6
1     1  "SHOT"  00:11:48    8    6
2     1  "MISS"  00:11:20    8    6
3     1   "TOV"  00:11:15    8    6
4     1  "SHOT"  00:10:40    8    6
5     2   "TIP"  00:12:00    7    3
6     2   "REB"  00:11:48    7    3
7     2  "FOUL"  00:11:35    7    3
8     2    "FT"  00:11:33    7    3
9     2    "FT"  00:11:31    7    3
10    3   "TIP"  00:12:00    5    1
11    3  "MISS"  00:11:43    5    1
12    3   "REB"  00:11:42    5    1
13    3  "SHOT"  00:11:27    5    1
14    3   "TOV"  00:11:04    5    1
15    4   "TIP"  00:12:00    9    4
16    4  "SHOT"  00:11:39    9    4
17    4  "MISS"  00:11:17    9    4
18    4   "REB"  00:11:16    9    4
19    4  "SHOT"  00:10:58    9    4
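The list comprehension above packs several steps into one expression. Unrolled on a single group (game 2 from the question), the shift / fill / back-fill / re-append sequence looks roughly like this. The `astype(g.dtypes)` step is an addition in this sketch: `shift()` introduces NaN into the first row, which turns the integer columns into floats.

```python
import pandas as pd

# One game that does not start with TIP (the first three rows of game 2)
g = pd.DataFrame({
    'g_id': [2, 2, 2],
    'event': ['"REB"', '"FOUL"', '"FT"'],
    'time_left': ['00:11:48', '00:11:35', '00:11:33'],
    'home': [7, 7, 7],
    'away': [3, 3, 3],
})

# Every row moves down one position: row 0 becomes all-NaN, the last row falls off
shifted = g.shift()
# Fill the vacated row's event and time_left with the new values
filled = shifted.fillna({'event': '"TIP"', 'time_left': '00:12:00'})
# Copy g_id / home / away up from the row below
filled = filled.bfill()
# shift() made the int columns float (because of NaN); cast them back
filled = filled.astype(g.dtypes)
# Re-append the last original row that the shift dropped
result = pd.concat([filled, g.iloc[[-1]]], ignore_index=True)
print(result)
```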

A slightly longer solution. You can get the unique group IDs with:

    g_ids = df['g_id'].unique()

For this example, that returns the array [1, 2, 3, 4]. Then:

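    for g_id in g_ids:
        events = df.loc[df['g_id'] == g_id, 'event']
        # Test the group's FIRST event; `'TIP' in events` would check the index labels
        if 'TIP' not in events.iloc[0]:
            insert_index = len(df.index)  # assumes a default RangeIndex (0..n-1)
            # Copy the group's first row (home, away, ...), then overwrite the new values
            df.loc[insert_index] = df[df['g_id'] == g_id].iloc[0]
            df.loc[insert_index, 'event'] = '"TIP"'
            df.loc[insert_index, 'time_left'] = '00:12:00'
    # Zero-padded 'HH:MM:SS' strings sort correctly as text, so descending time_left
    # puts 00:12:00 first; assumes each game's rows are already in descending time order
    df.sort_values(by=['g_id', 'time_left'], ascending=[True, False],
                   ignore_index=True, inplace=True)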
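A pitfall worth knowing for a check like `'TIP' not in events`: on a pandas Series, the `in` operator tests the index labels, not the values. With a default integer index, `'TIP' not in events` is therefore always true, and every group would get a new row. A small demonstration (the index labels 10 and 11 are arbitrary, chosen only to make the difference visible):

```python
import pandas as pd

events = pd.Series(['"TIP"', '"SHOT"'], index=[10, 11])

# `in` on a Series looks at the index labels, not the values
print('"TIP"' in events)         # False: no index label equals '"TIP"'
print(10 in events)              # True: 10 is an index label
# To test the values, go through .values (or use Series.isin / .str methods)
print('"TIP"' in events.values)  # True
# The question only cares about the FIRST event of each group
print('TIP' in events.iloc[0])   # True: substring test on the first string
```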
