Pandas grouped merge

Posted 2024-09-30 06:29:31


I have two DataFrames, left (df1) and right (df2), and I want to merge them based on the grouping in df1.

df1:

ID              cumul_growth_perc
Nioz-TC-09-A1R  0
Nioz-TC-09-A1R  2.99881756777804
Nioz-TC-09-A1R  90.1974001442841
Nioz-TC-09-A1R  92.7010664317585
Nioz-TC-09-A1R  95.4937993952028
Nioz-TC-09-A1R  97.7300790074048
Nioz-TC-09-A1R  100
Nioz-TC-09-A2R  0
Nioz-TC-09-A2R  2.1989297984251
Nioz-TC-09-A2R  4.25561486642024
Nioz-TC-09-A2R  82.2910739802899
Nioz-TC-09-A2R  93.276493352502
Nioz-TC-09-A2R  95.5072381936874
Nioz-TC-09-A2R  97.5983443147713
Nioz-TC-09-A2R  100
df2:

day cumul_growth_perc
32  0.233297611918821
33  0.466595223837642
34  0.699892835756464
35  0.933190447675285
36  1.16648805959411
37  1.39978567151293
46  3.54027808151455
47  3.78173847397982
48  4.02319886644508
335 92.4313101347799
336 92.6888317371006
337 92.9463533394213
338 93.203874941742
339 93.4613965440627
340 93.7189181463834
361 99.0468989121531
362 99.2851741841149
363 99.5234494560766
364 99.7617247280384
365 100

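cumul_growth_perc ranges from 0 to 100, but it is shortened here for demonstration. I want to merge the two DataFrames on this column, and the values in df1 and df2 do not match exactly. In addition, df1 should be grouped by the ID column before the matching is performed. As far as I know, pandas' merge_asof has the by= keyword for exactly this purpose. But because df2 has no ID column, the operation fails: df2 is the same for all groups of df1.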

Here is what I used: pd.merge_asof(df1, df2, on='cumul_growth_perc', left_by='ID', direction='nearest')

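As expected, it tells me that right_by is missing. How can I still perform a "grouped merge"? I could extend df2 by repeating it once for each unique value in df1.ID, but that feels a bit wrong.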

Edit:

Expected output:

                ID  cumul_growth_perc  day
0   Nioz-TC-09-A1R           0.000000   32
1   Nioz-TC-09-A1R           2.998818   46
2   Nioz-TC-09-A1R          90.197400  335
3   Nioz-TC-09-A1R          92.701066  336
4   Nioz-TC-09-A1R          95.493799  340
5   Nioz-TC-09-A1R          97.730079  361
6   Nioz-TC-09-A1R         100.000000  365
7   Nioz-TC-09-A2R           0.000000   32
8   Nioz-TC-09-A2R           2.198930   37
9   Nioz-TC-09-A2R           4.255615   48
10  Nioz-TC-09-A2R          82.291074  335
11  Nioz-TC-09-A2R          93.276493  338
12  Nioz-TC-09-A2R          95.507238  340
13  Nioz-TC-09-A2R          97.598344  361
14  Nioz-TC-09-A2R         100.000000  365

This means I want to group by df1.ID before performing the merge. I got it working by "repeating" df2 once for each ID in df1 and adding an ID column to it:

# Replicate df2 once per unique ID in df1, tagging each copy with that ID.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
df2_long = pd.concat(
    [df2.assign(ID=name) for name in df1['ID'].unique()],
    ignore_index=True,
)

Both DataFrames are then sorted by cumul_growth_perc, and I merge them with pd.merge_asof(df1, df2_long, on='cumul_growth_perc', by='ID', direction='nearest').
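The replicate-then-merge workflow described above can be sketched end to end as follows. This is a minimal sketch, not the asker's exact code: it uses a small subset of the posted values, and it assumes pandas 1.2 or later so that a cross join (how="cross") can stand in for the explicit loop.

```python
import pandas as pd

# Small stand-ins for the question's frames (values taken from the post).
df1 = pd.DataFrame({
    "ID": ["Nioz-TC-09-A1R"] * 3 + ["Nioz-TC-09-A2R"] * 3,
    "cumul_growth_perc": [0.0, 2.99881756777804, 100.0,
                          0.0, 2.1989297984251, 100.0],
})
df2 = pd.DataFrame({
    "day": [32, 37, 46, 365],
    "cumul_growth_perc": [0.233297611918821, 1.39978567151293,
                          3.54027808151455, 100.0],
})

# Replicate df2 once per unique ID with a cross join (assumes pandas >= 1.2).
ids = df1[["ID"]].drop_duplicates()
df2_long = df2.merge(ids, how="cross")

# merge_asof requires both frames to be sorted by the `on` key.
left = df1.sort_values("cumul_growth_perc")
right = df2_long.sort_values("cumul_growth_perc")

merged = pd.merge_asof(left, right, on="cumul_growth_perc",
                       by="ID", direction="nearest")

# Restore a per-ID ordering for readability.
merged = merged.sort_values(["ID", "cumul_growth_perc"], ignore_index=True)
print(merged)
```

Because every group is matched against an identical copy of df2, the by="ID" constraint does not change which day is picked here; it only keeps the lookup table aligned with the groups.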

But it feels like there should be a simpler solution.


2 Answers

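With the tolerance and direction parameters you can define how close the values have to be in order to match. As you can see below, the value 2.998818 gets ID NaN, because the second DataFrame has no value within the tolerance of 3.0.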

import pandas as pd

df = pd.DataFrame({
    "cumul_growth_perc": [2.99881756777804,90.1974001442841,92.7010664317585],
    'day':['one','two','three']
})
print(df)
   cumul_growth_perc    day
0           2.998818    one
1          90.197400    two
2          92.701066  three


df2= pd.DataFrame({
    "cumul_growth_perc": [92.9463533394213, 93.203874941742, 84.00],
    'ID':['first','second','3rd']
}).sort_values(by='cumul_growth_perc')

print(df2)
   cumul_growth_perc      ID
2          84.000000     3rd
0          92.946353   first
1          93.203875  second

res = pd.merge_asof(df,df2,on='cumul_growth_perc',tolerance=3.0,direction='nearest')

print(res)
   cumul_growth_perc    day     ID
0           2.998818    one    NaN
1          90.197400    two  first
2          92.701066  three  first
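To make the role of direction more concrete, here is a small sketch that runs the same merge with each of the three values pandas accepts ("backward", "forward", "nearest") under one tolerance. The frames and the label column are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical data; both frames are already sorted by the merge key.
left = pd.DataFrame({"cumul_growth_perc": [2.0, 90.0, 95.0]})
right = pd.DataFrame({
    "cumul_growth_perc": [84.0, 92.0, 93.0],
    "label": ["a", "b", "c"],
})

results = {}
for direction in ("backward", "forward", "nearest"):
    out = pd.merge_asof(left, right, on="cumul_growth_perc",
                        tolerance=3.0, direction=direction)
    # Rows with no candidate inside the tolerance get NaN in `label`.
    results[direction] = out["label"].tolist()
    print(direction, results[direction])
```

With these values, 2.0 never finds a partner, 90.0 only matches when looking forward or to the nearest value, and 95.0 only matches when looking backward or to the nearest value.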

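Sort the DataFrames df1 and df2 by cumul_growth_perc with sort_values, then run merge_asof on the sorted DataFrames. No by= is needed, because df2 is the same for every ID.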

d1 = df1.sort_values('cumul_growth_perc')
d2 = df2.sort_values('cumul_growth_perc')

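# Sort by both columns and reset the index so rows come out grouped by ID
# and ordered within each group, as in the result shown below.
df = pd.merge_asof(d1, d2, on='cumul_growth_perc', direction='nearest').sort_values(['ID', 'cumul_growth_perc'], ignore_index=True)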

Result:

                ID  cumul_growth_perc  day
0   Nioz-TC-09-A1R           0.000000   32
1   Nioz-TC-09-A1R           2.998818   46
2   Nioz-TC-09-A1R          90.197400  335
3   Nioz-TC-09-A1R          92.701066  336
4   Nioz-TC-09-A1R          95.493799  340
5   Nioz-TC-09-A1R          97.730079  361
6   Nioz-TC-09-A1R         100.000000  365
7   Nioz-TC-09-A2R           0.000000   32
8   Nioz-TC-09-A2R           2.198930   37
9   Nioz-TC-09-A2R           4.255615   48
10  Nioz-TC-09-A2R          82.291074  335
11  Nioz-TC-09-A2R          93.276493  338
12  Nioz-TC-09-A2R          95.507238  340
13  Nioz-TC-09-A2R          97.598344  361
14  Nioz-TC-09-A2R         100.000000  365
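For anyone who wants to reproduce this, the following self-contained sketch rebuilds df1 and df2 from the values posted in the question and applies the sort-then-merge_asof approach. The final two-column sort with ignore_index=True is an assumption added here to get a deterministic, per-ID row order (it needs pandas 1.0 or later).

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["Nioz-TC-09-A1R"] * 7 + ["Nioz-TC-09-A2R"] * 8,
    "cumul_growth_perc": [
        0, 2.99881756777804, 90.1974001442841, 92.7010664317585,
        95.4937993952028, 97.7300790074048, 100,
        0, 2.1989297984251, 4.25561486642024, 82.2910739802899,
        93.276493352502, 95.5072381936874, 97.5983443147713, 100,
    ],
})
df2 = pd.DataFrame({
    "day": [32, 33, 34, 35, 36, 37, 46, 47, 48, 335,
            336, 337, 338, 339, 340, 361, 362, 363, 364, 365],
    "cumul_growth_perc": [
        0.233297611918821, 0.466595223837642, 0.699892835756464,
        0.933190447675285, 1.16648805959411, 1.39978567151293,
        3.54027808151455, 3.78173847397982, 4.02319886644508,
        92.4313101347799, 92.6888317371006, 92.9463533394213,
        93.203874941742, 93.4613965440627, 93.7189181463834,
        99.0468989121531, 99.2851741841149, 99.5234494560766,
        99.7617247280384, 100,
    ],
})

# merge_asof needs both inputs sorted by the `on` column.
d1 = df1.sort_values("cumul_growth_perc")
d2 = df2.sort_values("cumul_growth_perc")

# No `by` is needed: every ID is matched against the same lookup table.
result = pd.merge_asof(d1, d2, on="cumul_growth_perc", direction="nearest")

# Group the rows by ID again and renumber the index.
result = result.sort_values(["ID", "cumul_growth_perc"], ignore_index=True)
print(result)
```

Since the nearest df2 row for a given cumul_growth_perc does not depend on the ID, dropping by= gives the same day values as the replicated-df2 version from the question.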
