查找每行中特定列的最大值，并将相应列的值与另一列的值相匹配

ID Hour Cus1 Cus2 Cus3 Cus4 Cus1_Score Cus2_Score Cus3_Score Cus4_Score Match1 Match2 1 11 8 10 11 14 0.62 0.59 0.59 0.54 NO YES 2 13 15 16 18 13 0.57 0.57 0.57 0.57 YES NO 3 16 09 14 16 12 0.67 0.54 0.48 0.34 NO NO 4 08 11 08 12 17 0.58 0.55 0.43 0.25 NO YES

2条回答

网友

1楼 · 编辑于 2024-09-28 01:27:35

希望我得到了正确的条件；下面的计算依赖于赋值前的索引；因此，我创建了一个长数据帧，创建了长数据帧的条件，将大小减少到唯一索引，并将结果分配给原始数据帧：

创建一个长数据框来计算条件（为了方便起见，我在这里使用pyjanitor pivot_longer对其进行重塑；您可以在不使用PyGatitor的情况下通过几个步骤来完成这项工作）：

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor as jn
import pandas as pd
res = (df.pivot_longer(['ID', 'Hour'], 
                       names_pattern= [r'.+\d$', r'.+Score$'], 
                       names_to = ['cus', 'score'], 
                       ignore_index = False)
       )

print(res)

   ID  Hour  cus  score
0   1    11    8   0.62
1   2    13   15   0.57
2   3    16    9   0.67
3   4     8   11   0.58
0   1    11   10   0.59
1   2    13   16   0.57
2   3    16   14   0.54
3   4     8    8   0.55
0   1    11   11   0.59
1   2    13   18   0.57
2   3    16   16   0.48
3   4     8   12   0.43
0   1    11   14   0.54
1   2    13   13   0.57
2   3    16   12   0.34
3   4     8   17   0.25

为match1创建条件：

# get booleans for max and no duplicates
res = res.sort_index()
max1 = res.groupby(level=0).score.transform('max')
# max without duplicates
cond1 = res.score.eq(max1).groupby(level=0).sum().eq(1)
cond1 = cond1 & df.Hour.eq(df.Cus1)
# max with duplicates
cond2 = res.score.eq(max1).groupby(level=0).transform('sum').gt(1) 
cond2 &= res.Hour.eq(res.cus)
cond2 = cond2.groupby(level=0).any()
df['match1'] = np.where(cond1 | cond2, 'YES', 'NO')

匹配2：


second_largest = (res.sort_values('score', ascending=False)
                     .groupby(level=0, sort = False)
                     .score
                     .transform('nth', 1)
                  )
second_largest = second_largest.sort_index()
# second largest without duplicates
cond_1 = res.score.eq(second_largest).groupby(level=0).sum().eq(1)
cond_1 &= df.Hour.eq(df.Cus2)
# second largest with duplicates
cond_2 = res.score.eq(second_largest).groupby(level=0).transform('sum').gt(1)
cond_2 &= res.Hour.eq(res.cus)
cond_2 = cond_2.groupby(level=0).any()
df['match2'] = np.where(cond_1 | cond_2, 'YES', 'NO')

df

      ID  Hour  Cus1  Cus2  Cus3  Cus4  Cus1_Score  Cus2_Score  Cus3_Score  Cus4_Score match1 match2
0   1    11     8    10    11    14        0.62        0.59        0.59        0.54     NO    YES
1   2    13    15    16    18    13        0.57        0.57        0.57        0.57    YES    YES
2   3    16     9    14    16    12        0.67        0.54        0.48        0.34     NO     NO
3   4     8    11     8    12    17        0.58        0.55        0.43        0.25     NO    YES

网友

2楼 · 编辑于 2024-09-28 01:27:35

您需要为行操作编写一个函数，其中包含您解释的所有条件。然后，您可以使用.iterrows、for或.apply方法将此函数应用于行，您可以找到它们here

相关问题更多 >

编程相关推荐

热门问题

热门文章