How to use a class hierarchy list to decide which of two columns holds the higher-ranked class

Published 2024-09-21 04:47:57


I have a list of classes ordered from highest to lowest:

classes = ['A','B','C','D']

and a DataFrame with two columns:

Segmentation 2019  Segmentation 2020
        B                  A
        B                  A
        A                  B
        C                  C
        B                  D

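How can I add a third column that holds the higher-ranked of the two classes in each row (or marks the row as equal when both classes are the same)?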


2 answers

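You can build a dictionary from the `classes` list in which each key is a class and each value is its index. The index works as a rank because the list is ordered from highest to lowest.

You can then create two rank columns holding those ranks (0 to n, where 0 is the highest). Finally, compare the ranks and take the class with the higher rank, that is, the one with the smaller rank value.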

# Map each class to its position in the list (0 = highest class)
classes = ['A','B','C','D']
classes_dict = {val: index for index, val in enumerate(classes)}
df['Seg 2019 Rank'] = df['Seg 2019'].map(classes_dict)
df['Seg 2020 Rank'] = df['Seg 2020'].map(classes_dict)
# Keep the class with the smaller rank value; mark ties as "equal"
df['greater'] = df.apply(lambda x: x['Seg 2019'] if x['Seg 2019 Rank'] < x['Seg 2020 Rank']
                         else x['Seg 2020'] if x['Seg 2020 Rank'] < x['Seg 2019 Rank']
                         else "equal", axis=1)

Output:

Seg 2019    Seg 2020    Seg 2019 Rank   Seg 2020 Rank   greater
    B   A   1   0   A
    B   A   1   0   A
    A   B   0   1   A
    C   C   2   2   equal
    B   D   1   3   B

If you later add a new class (VIP), you only need to insert it into the list before 'A', and it will be treated as a higher class.
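To illustrate the point about extending the list, here is a minimal, self-contained sketch of the same rank-dictionary idea with 'VIP' placed first. The helper name `add_greater_column` and the sample frame are assumptions made for this example, and the row-wise `apply` is swapped for `Series.where` and `Series.mask`, which is a different but equivalent way to pick the value per row.

```python
import pandas as pd


def add_greater_column(df, col_a, col_b, classes, out="greater"):
    """Hypothetical helper: return a copy of df with the higher-ranked class per row."""
    # Position in the list is the rank: 0 is the highest class.
    rank = {name: i for i, name in enumerate(classes)}
    rank_a = df[col_a].map(rank)
    rank_b = df[col_b].map(rank)
    # Keep col_a where it ranks higher (smaller number), otherwise take col_b.
    result = df[col_a].where(rank_a < rank_b, df[col_b])
    # Overwrite ties with the label "equal".
    result = result.mask(rank_a == rank_b, "equal")
    out_df = df.copy()
    out_df[out] = result
    return out_df


# 'VIP' is inserted before 'A', so it outranks every other class.
classes = ["VIP", "A", "B", "C", "D"]
df = pd.DataFrame({
    "Seg 2019": ["B", "B", "A", "C", "B"],
    "Seg 2020": ["VIP", "A", "B", "C", "D"],
})
result = add_greater_column(df, "Seg 2019", "Seg 2020", classes)
print(result)
```

This sketch assumes every value in both columns appears in `classes`; a value missing from the list maps to NaN and would need separate handling.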

A solution that uses ordered categoricals to return either 'Equal' or the minimum of the two columns:

print (df)

  Segmentation 2019 Segmentation 2020
0                 B               VIP
1                 B                 A
2                 A                 B
3                 C                 C
4                 B                 D

import numpy as np
import pandas as pd

classes = ['VIP','A','B','C','D']

df['Segmentation 2020'] = pd.Categorical(df['Segmentation 2020'], 
                                         ordered=True,
                                         categories=classes)
df['Segmentation 2019'] = pd.Categorical(df['Segmentation 2019'], 
                                         ordered=True, 
                                         categories=classes)

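# Rows where both columns hold the same class
mask = df['Segmentation 2019'].eq(df['Segmentation 2020'])
# Row-wise minimum of the ordered categoricals; min(level=0) was removed in pandas 2.0
s = df[['Segmentation 2019','Segmentation 2020']].stack().groupby(level=0).min()
df['new'] = np.where(mask, 'Equal', s)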
print (df)
  Segmentation 2019 Segmentation 2020    new
0                 B               VIP    VIP
1                 B                 A      A
2                 A                 B      A
3                 C                 C  Equal
4                 B                 D      B

Alternatively, build two boolean masks from the ordered comparison and select the value for each row:

classes = ['VIP','A','B','C','D']

df['Segmentation 2020'] = pd.Categorical(df['Segmentation 2020'], 
                                         ordered=True,
                                         categories=classes)
df['Segmentation 2019'] = pd.Categorical(df['Segmentation 2019'], 
                                         ordered=True, 
                                         categories=classes)

mask1 = df['Segmentation 2019'].lt(df['Segmentation 2020'])
mask2 = df['Segmentation 2019'].gt(df['Segmentation 2020'])

df['classes'] = np.select([mask1, mask2], 
                          [df['Segmentation 2019'], df['Segmentation 2020']], 
                          default='Equal')
print (df)
  Segmentation 2019 Segmentation 2020 classes
0                 B               VIP     VIP
1                 B                 A       A
2                 A                 B       A
3                 C                 C   Equal
4                 B                 D       B
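The same ordered-categorical idea can also be sketched by comparing the integer category codes directly, which avoids `stack` altogether. This is not the answerer's method, only an illustrative variant; the variable names `a`, `b`, `codes` and `higher` are assumptions, and the sketch assumes every value appears in `classes` (a value outside the list gets code -1 and would need separate handling).

```python
import numpy as np
import pandas as pd

classes = ["VIP", "A", "B", "C", "D"]
df = pd.DataFrame({
    "Segmentation 2019": ["B", "B", "A", "C", "B"],
    "Segmentation 2020": ["VIP", "A", "B", "C", "D"],
})

# One shared ordered dtype, so both columns use identical category codes.
dtype = pd.CategoricalDtype(categories=classes, ordered=True)
a = df["Segmentation 2019"].astype(dtype)
b = df["Segmentation 2020"].astype(dtype)

# A smaller code means an earlier position in `classes`, i.e. a higher class.
codes = np.minimum(a.cat.codes.to_numpy(), b.cat.codes.to_numpy())
higher = pd.Categorical.from_codes(codes, dtype=dtype)

# Label ties as 'Equal', otherwise keep the higher-ranked class.
df["new"] = np.where((a == b).to_numpy(), "Equal", higher.astype(object))
print(df)
```

Because the codes come from the position in `classes`, reordering or extending that list is enough to change the hierarchy.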
