Rename less frequent categories to "OTHER" in Python

print(df)
    Employee_number                 Jobrol
0                 1        Sales Executive
1                 2     Research Scientist
2                 3  Laboratory Technician
3                 4        Sales Executive
4                 5     Research Scientist
5                 6  Laboratory Technician
6                 7        Sales Executive
7                 8     Research Scientist
8                 9  Laboratory Technician
9                10        Sales Executive
10               11     Research Scientist
11               12  Laboratory Technician
12               13        Sales Executive
13               14     Research Scientist
14               15  Laboratory Technician
15               16        Sales Executive
16               17     Research Scientist
17               18     Research Scientist
18               19                Manager
19               20        Human Resources
20               21        Sales Executive

valCount = df['Jobrol'].value_counts()
valCount
Sales Executive          7
Research Scientist       7
Laboratory Technician    5
Manager                  1
Human Resources          1

2 Answers

User

#1 · edited 2024-10-01 02:18:25

Use value_counts together with numpy.where:

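import numpy as np

# the three most frequent categories
need = df['Jobrol'].value_counts().index[:3]
# keep those values, relabel everything else as 'OTHER'
df['Jobrol'] = np.where(df['Jobrol'].isin(need), df['Jobrol'], 'OTHER')

valCount = df['Jobrol'].value_counts()
print(valCount)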
Research Scientist       7
Sales Executive          7
Laboratory Technician    5
OTHER                    2
Name: Jobrol, dtype: int64

Another solution:

[code snippet not preserved in the source]
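A minimal sketch of one possible alternative, not necessarily what this answer originally showed: the same top-N relabeling done with pandas only, via Series.where. The helper name lump_rare and its parameters n and label are hypothetical, and the sample frame is rebuilt from the counts in the question.

```python
import pandas as pd

# Rebuild the question's column from its value counts (row order does not matter here).
df = pd.DataFrame({
    "Jobrol": ["Sales Executive"] * 7
    + ["Research Scientist"] * 7
    + ["Laboratory Technician"] * 5
    + ["Manager", "Human Resources"]
})


def lump_rare(series, n=3, label="OTHER"):
    """Hypothetical helper: keep the n most frequent values, relabel the rest."""
    top = series.value_counts().index[:n]
    # Series.where keeps values where the condition is True and
    # substitutes `label` everywhere else.
    return series.where(series.isin(top), label)


df["Jobrol"] = lump_rare(df["Jobrol"], n=3)
val_count = df["Jobrol"].value_counts()
print(val_count)
```

Unlike np.where, Series.where returns a Series that keeps the original index, which can be convenient when the result is assigned back or joined later.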

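User

#2 · edited 2024-10-01 02:18:25

Convert the Series to categorical, extract the categories whose counts are not in the top 3, add a new category such as 'Other', and then replace the previously extracted categories with it: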

df['Jobrol'] = df['Jobrol'].astype('category')

others = df['Jobrol'].value_counts().index[3:]
label = 'Other'

df['Jobrol'] = df['Jobrol'].cat.add_categories([label])
df['Jobrol'] = df['Jobrol'].replace(others, label)

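Note: it is tempting to merge the categories by renaming them with df['Jobrol'].cat.rename_categories(dict.fromkeys(others, label)), but this does not work, because it would result in several categories sharing the same label, which is not allowed.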
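The restriction can be checked on a toy Series. This is only an illustrative sketch: the values 'a', 'b', 'c' are made up, and the call is expected to raise a ValueError because the renamed categories would no longer be unique.

```python
import pandas as pd

# Toy categorical Series; the values are made up for illustration.
s = pd.Series(["a", "a", "b", "c"], dtype="category")
others = ["b", "c"]
label = "Other"

try:
    # Maps both 'b' and 'c' to 'Other', i.e. two categories with one label.
    s.cat.rename_categories(dict.fromkeys(others, label))
    raised = False
except ValueError as exc:
    raised = True
    print("rename_categories failed:", exc)
```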

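The solution above can be adapted to filter by count instead. For example, to include only the categories with a count of 1, you can define others as: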

[code snippet not preserved in the source]
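A minimal sketch of one way to define others by count, not necessarily the code this answer originally contained. It assumes the column is still a plain (non-categorical) column and uses Series.mask for the replacement step, which is a substitution for the categorical replace shown earlier; the sample frame is rebuilt from the counts in the question.

```python
import pandas as pd

# Rebuild the question's column from its value counts.
df = pd.DataFrame({
    "Jobrol": ["Sales Executive"] * 7
    + ["Research Scientist"] * 7
    + ["Laboratory Technician"] * 5
    + ["Manager", "Human Resources"]
})

counts = df["Jobrol"].value_counts()
# Categories that occur exactly once.
others = counts[counts == 1].index
label = "Other"

# Series.mask replaces the values where the condition is True.
df["Jobrol"] = df["Jobrol"].mask(df["Jobrol"].isin(others), label)
result = df["Jobrol"].value_counts()
print(result)
```

Changing the condition, for example to counts < 5, would lump every category below that threshold instead.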
