Pandas中的groupBy相关性

2024-10-08 18:28:51 发布

您现在位置:Python中文网/ 问答频道 /正文

我有以下熊猫数据帧

 code    tank     var     nozzle_1    nozzle_2     nozzle_3    nozzle_tank
 123     1        23.34   12.23       54.56        12.22       11 
 123     1        22.32   11.32       7.89         3.45        12 
 123     1        21.22   19.93       5.54         5.66        12
 123     1        21.34   12.23       54.56        22.22       14
 123     1        32.32   13.32       4.89         32.45       34
 123     1        32.22   29.93       23.54        23.66       33
 123     2        23.34   12.23       54.56        12.22       21
 123     2        22.32   11.32       7.89         3.45        22
 123     2        21.22   19.93       5.54         5.66        21
 123     2        21.34   12.23       54.56        22.22       21
 123     2        32.32   13.32       4.89         32.45       22
 123     2        32.22   29.93       23.54        23.66       21  

我想计算喷嘴1、喷嘴2、喷嘴3和喷嘴4与储罐上方var柱分组的相关性

我想要的数据帧是

 code   tank    nozzle_1    nozzle_2    nozzle_3    nozzle_4    
 123    1       0.08        0.01        0.02        0.01
 123    2       0.07        0.01        0.02        0.02

我在熊猫馆做跟踪器

cols= df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 
cols= np.array(cols)
var_col = 'var'
tank = 'tank'
def corrVar(df, cols, var_col, tank):
        final_df = pd.DataFrame()
        for col in nozzles_to_scale:
            corrs = (df[[col, tank]].groupby(tank).corrwith(df.var_col ).reset_index())
            final_df = final_df.join(corrs)
        return final_df

但它似乎不起作用,我们怎么能在熊猫身上做到呢?你知道吗

    test =  corrVar(df, cols, var, tank)

Tags: columns数据dfvarcodecolfinalcols
1条回答
网友
1楼 · 发布于 2024-10-08 18:28:51

您可以使用:

cols = df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 
var_col = 'var'
tank = 'tank'
def corrVar(df, cols, var_col, tank):
        final_df = [df[[col, tank]].groupby(tank).corrwith(df[var_col]) for col in cols]
        return pd.concat(final_df, axis=1)

print (corrVar(df, cols, var_col, tank))
      nozzle_1  nozzle_2  nozzle_3
tank                              
1     0.501164 -0.309435  0.761017
2     0.501164 -0.309435  0.761017

编辑:每个组的每个N值的相关解决方案:

N = 3
g = df.groupby('tank').cumcount() // N

cols = df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 
var_col = 'var'
tank = 'tank'
code = 'code'
def corrVar(df, cols, var_col, tank, g):
        #https://stackoverflow.com/a/48570300
        final_df = [df.groupby([g, tank]).apply(lambda x: x[col].corr(x[var_col])) 
                            for col in cols]
        return pd.concat(final_df, axis=1, keys=cols)

print (corrVar(df, cols, var_col, tank, g))
        nozzle_1  nozzle_2  nozzle_3
  tank                              
0 1    -0.826376  0.876202  0.703793
  2    -0.826376  0.876202  0.703793
1 1     0.540176 -0.931286  0.614626
  2     0.540176 -0.931286  0.614626

测试组:

print (df.assign(groups=g))
    code  tank    var  nozzle_1  nozzle_2  nozzle_3  nozzle_tank  groups
0    123     1  23.34     12.23     54.56     12.22           11       0
1    123     1  22.32     11.32      7.89      3.45           12       0
2    123     1  21.22     19.93      5.54      5.66           12       0
3    123     1  21.34     12.23     54.56     22.22           14       1
4    123     1  32.32     13.32      4.89     32.45           34       1
5    123     1  32.22     29.93     23.54     23.66           33       1
6    123     2  23.34     12.23     54.56     12.22           21       0
7    123     2  22.32     11.32      7.89      3.45           22       0
8    123     2  21.22     19.93      5.54      5.66           21       0
9    123     2  21.34     12.23     54.56     22.22           21       1
10   123     2  32.32     13.32      4.89     32.45           22       1
11   123     2  32.22     29.93     23.54     23.66           21       1

编辑:

函数应为一行:

def corrVar(df, cols, var_col, tank, g):
        return pd.concat([df.groupby([g, tank]).apply(lambda x: x[col].corr(x[var_col])) 
                          for col in cols], axis=1, keys=cols)

相关问题 更多 >

    热门问题