How to group by two columns so that groups whose keys are flipped are merged into one unique combination

Posted 2024-06-28 19:55:23


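I'm trying to solve this without writing some horrible loop that walks through the groups after a groupby and merges them together. I feel there must be a way that I don't know about.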

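What I want to do is group a dataframe by two of its columns, but sometimes the same pair of keys shows up flipped (i.e. there is one group for [key1, key2] and another for [key2, key1]). I want each such pair of groups merged into a single group.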

I could do this in a loop afterwards. I also tried the approach from:

unique combinations of values in selected columns in pandas data frame and count

but couldn't get it to work.

Here is a sample of my df:

            Ves-1 type          Ves-2 type    Duration
0                cargo                 tug  898.559993
1     fishing_trawling                 tug  898.559992
2   fishing_transiting                 tug  898.559993
3   fishing_transiting                 tug  898.559993
4                  tug                 tug  898.559992
5                cargo                 tug  898.560002
6                cargo                 tug  898.560002
7            passenger                 tug  907.200008
8             pleasure                 tug  898.560003
9                cargo                 tug  898.559993
10               cargo                 tug  898.559993
11               cargo  fishing_transiting  898.560002
12               cargo  fishing_transiting  898.559993
13               cargo  fishing_transiting  898.560002
14                 tug  fishing_transiting  898.560003
15               cargo  fishing_transiting  907.200008
16               cargo  fishing_transiting  907.200008
17                 tug  fishing_transiting  898.560002
18               cargo  fishing_transiting  898.560002
19  fishing_transiting  fishing_transiting  898.559993

If I just do a simple groupby on the two Ves columns:

>>> test.groupby(['Ves-1 type','Ves-2 type'])['Duration'].agg(list)
Ves-1 type          Ves-2 type
cargo               fishing_transiting    [898.560002, 898.5599930000001, 898.560002, 90...
                    tug                   [898.5599930000001, 898.560002, 898.560002, 89...
fishing_transiting  fishing_transiting                                  [898.5599930000001]
                    tug                              [898.5599930000001, 898.5599930000001]
fishing_trawling    tug                                                 [898.5599920000001]
passenger           tug                                                        [907.200008]
pleasure            tug                                                        [898.560003]
tug                 fishing_transiting                             [898.560003, 898.560002]
                    tug                                                 [898.5599920000001]

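The problem is that I now have a fishing_transiting/tug group and a separate tug/fishing_transiting group. Is there a way to merge these combinations?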

Edit: another workaround I tried does work, but I'd like to know whether there is a way to handle this within the groupby itself:

>>> test['key'] = list(zip(test['Ves-1 type'].values, test['Ves-2 type'].values))
>>> test['key'] = test['key'].apply(sorted).astype(str)
>>> test.groupby('key')['Duration'].agg(list)
key
['cargo', 'fishing_transiting']                 [898.560002, 898.5599930000001, 898.560002, 90...
['cargo', 'tug']                                [898.5599930000001, 898.560002, 898.560002, 89...
['fishing_transiting', 'fishing_transiting']                                  [898.5599930000001]
['fishing_transiting', 'tug']                   [898.5599930000001, 898.5599930000001, 898.560...
['fishing_trawling', 'tug']                                                   [898.5599920000001]
['passenger', 'tug']                                                                 [907.200008]
['pleasure', 'tug']                                                                  [898.560003]
['tug', 'tug']                                                                [898.5599920000001]
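
A variant of the workaround above, as a minimal sketch: instead of adding a string column to the frame, the order-independent key can be built as a separate Series of sorted tuples and passed straight to groupby. The small frame and its Duration values here are made up for illustration, and the name `merged` is hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question (values are illustrative).
test = pd.DataFrame({
    "Ves-1 type": ["cargo", "tug", "fishing_transiting", "tug", "tug"],
    "Ves-2 type": ["tug", "cargo", "tug", "fishing_transiting", "tug"],
    "Duration": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Order-independent key: the two type values of each row, sorted, as a tuple.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(test["Ves-1 type"], test["Ves-2 type"])],
    index=test.index,
    name="key",
)

# Group on the key Series directly; the original frame is left unmodified.
merged = test.groupby(key)["Duration"].agg(list)
print(merged)
```

Tuples stay hashable and comparable, so there is no need for the `astype(str)` step, and the two type names can be read back from each key without parsing a string.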

1 answer
User
#1 · Posted 2024-06-28 19:55:23

Let's sort the values in the columns Ves-1 type and Ves-2 type along axis=1, then group the dataframe on these sorted columns and agg Duration using list:

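import numpy as np  # needed for np.sort
c = ['Ves-1 type', 'Ves-2 type']
# sort each row's pair so flipped keys match; df is the question's `test`
df.groupby(np.sort(df[c], axis=1).T.tolist())['Duration'].agg(list)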

cargo               fishing_transiting    [898.5600019999999, 898.559993, 898.5600019999...
                    tug                   [898.559993, 898.5600019999999, 898.5600019999...
fishing_transiting  fishing_transiting                                         [898.559993]
                    tug                   [898.559993, 898.559993, 898.5600029999999, 89...
fishing_trawling    tug                                                        [898.559992]
passenger           tug                                                        [907.200008]
pleasure            tug                                                 [898.5600029999999]
tug                 tug                                                        [898.559992]
Name: Duration, dtype: object
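
For anyone who wants to try this end to end, here is a self-contained sketch of the same row-wise sort idea. The miniature frame and its Duration values are made up for illustration, and the names `sorted_pairs`, `keys` and `result` are hypothetical. It passes the two sorted columns to groupby as a list of arrays rather than via `.T.tolist()`, which should be equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame (values are illustrative).
df = pd.DataFrame({
    "Ves-1 type": ["cargo", "tug", "fishing_transiting", "tug", "tug"],
    "Ves-2 type": ["tug", "cargo", "tug", "fishing_transiting", "tug"],
    "Duration": [1.0, 2.0, 3.0, 4.0, 5.0],
})

c = ["Ves-1 type", "Ves-2 type"]

# Sort the two values within each row, so (tug, cargo) becomes (cargo, tug).
sorted_pairs = np.sort(df[c].to_numpy(), axis=1)

# One array per key level; groupby accepts a list of arrays of length len(df).
keys = [sorted_pairs[:, 0], sorted_pairs[:, 1]]
result = df.groupby(keys)["Duration"].agg(list)
print(result)
```

Rows 0 and 1 (cargo/tug and tug/cargo) end up in one group, as do rows 2 and 3, while the original columns in `df` are left untouched.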
