How to put categorical data into bins

Posted 2024-06-01 08:32:53


I have the following categorical data:

['Self employed', 'Government Dependent',
 'Formally employed Private', 'Informally employed',
 'Formally employed Government', 'Farming and Fishing',
 'Remittance Dependent', 'Other Income',
 'Dont Know/Refuse to answer', 'No Income']

How can I put them into bins so that:

 ['Government Dependent','Formally employed Government','Formally employed Private'] = 0

 ['Remittance Dependent', 'Informally employed','Self employed','Other Income'] = 1
 ['Dont Know/Refuse to answer', 'No Income','Farming and Fishing'] = 2

I already know how to put numeric data into categorical bins. Is the reverse possible?

import pandas as pd

TRAIN = pd.read_csv("Train_v2.csv")
TRAIN['job_type'].unique()
Output:
array(['Self employed', 'Government Dependent',
       'Formally employed Private', 'Informally employed',
       'Formally employed Government', 'Farming and Fishing',
       'Remittance Dependent', 'Other Income',
       'Dont Know/Refuse to answer', 'No Income'], dtype=object)

2 answers

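First create a dictionary of groups, swap its keys and values so that each category points to its bin number, and finally use Series.map: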

import pandas as pd

a = ['Self employed', 'Government Dependent',
     'Formally employed Private', 'Informally employed',
     'Formally employed Government', 'Farming and Fishing',
     'Remittance Dependent', 'Other Income',
     'Dont Know/Refuse to answer', 'No Income']

TRAIN = pd.DataFrame({'job_type': a})

# bin number -> categories; add the remaining groups to the dict as needed
# (strings must match the data exactly, e.g. 'Dont Know/Refuse to answer')
d = {0: ['Government Dependent', 'Formally employed Government', 'Formally employed Private'],
     1: ['Remittance Dependent', 'Informally employed'],
     2: ['Dont Know/Refuse to answer', 'No Income']}

# swap keys and values in the dict: category -> bin number
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# categories missing from d1 become NaN
TRAIN['new'] = TRAIN['job_type'].map(d1)
print(TRAIN)
                       job_type  new
0                 Self employed  NaN
1          Government Dependent  0.0
2     Formally employed Private  0.0
3           Informally employed  1.0
4  Formally employed Government  0.0
5           Farming and Fishing  NaN
6          Remittance Dependent  1.0
7                  Other Income  NaN
8    Dont Know/Refuse to answer  2.0
9                     No Income  2.0
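
For reference, here is a minimal sketch of the same dictionary approach with all ten categories assigned, following the grouping asked for in the question. The names `train`, `groups`, `category_to_bin`, `job_bin` and `unmapped` are made up for illustration, and the spelling 'Dont Know/Refuse to answer' is assumed from the `unique()` output in the question.

```python
import pandas as pd

# Category names copied from the unique() output shown in the question.
job_types = ['Self employed', 'Government Dependent',
             'Formally employed Private', 'Informally employed',
             'Formally employed Government', 'Farming and Fishing',
             'Remittance Dependent', 'Other Income',
             'Dont Know/Refuse to answer', 'No Income']
train = pd.DataFrame({'job_type': job_types})

# Bin number -> member categories, as requested in the question.
groups = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed',
        'Self employed', 'Other Income'],
    2: ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'],
}

# Invert to category -> bin number so it can be passed to Series.map.
category_to_bin = {cat: bin_id for bin_id, cats in groups.items() for cat in cats}
train['job_bin'] = train['job_type'].map(category_to_bin)

# Any category missing from the dict would show up as NaN; list those to inspect.
unmapped = train.loc[train['job_bin'].isna(), 'job_type'].unique().tolist()
print(unmapped)
print(train['job_bin'].tolist())
```

Checking `unmapped` after the mapping is a cheap way to catch spelling mismatches between the dictionary and the data, such as "Don't" versus "Dont".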

numpy.select also works and gives the same output of bin numbers and NaN, but with many groups it becomes complicated and slow:

import numpy as np

# one boolean mask per bin; strings must match the data exactly
m1 = TRAIN['job_type'].isin(['Government Dependent','Formally employed Government','Formally employed Private'])
m2 = TRAIN['job_type'].isin(['Remittance Dependent', 'Informally employed'])
m3 = TRAIN['job_type'].isin(['Dont Know/Refuse to answer', 'No Income'])
# rows matching none of the masks get the default, np.nan
TRAIN['new'] = np.select([m1, m2, m3], [0, 1, 2], np.nan)
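
If you go the np.select route with more than a few groups, one way to keep it manageable is to build the masks from the same kind of dictionary. This is only a sketch on a four-row toy frame; `train`, `groups`, `conditions`, `choices` and `job_bin` are illustrative names, not from the original post.

```python
import numpy as np
import pandas as pd

# Small toy frame; 'Self employed' is deliberately left out of every group.
train = pd.DataFrame({'job_type': ['Self employed', 'Government Dependent',
                                   'No Income', 'Remittance Dependent']})

# Bin number -> member categories (partial on purpose, as in the answer).
groups = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed'],
    2: ['Dont Know/Refuse to answer', 'No Income'],
}

# One boolean mask per bin, in the same order as the bin numbers.
conditions = [train['job_type'].isin(cats) for cats in groups.values()]
choices = list(groups.keys())

# Rows matching no mask fall back to the default, np.nan.
train['job_bin'] = np.select(conditions, choices, default=np.nan)
print(train)
```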

You can also use np.where, with np.nan as the value for entries that do not belong to category 0, 1 or 2:

import numpy as np

list_0 = ['Government Dependent','Formally employed Government','Formally employed Private']
list_1 = ['Remittance Dependent', 'Informally employed']
list_2 = ['Dont Know/Refuse to answer', 'No Income']

# nest the calls: each np.where builds the whole column, so separate
# assignments would overwrite one another and keep only the last result
TRAIN['job_type_bin'] = np.where(TRAIN['job_type'].isin(list_0), 0,
                        np.where(TRAIN['job_type'].isin(list_1), 1,
                        np.where(TRAIN['job_type'].isin(list_2), 2, np.nan)))
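
As a sanity check, a nested np.where and the dictionary plus map approach should give the same column when they use the same groups. The sketch below compares the two on the ten categories from the question; `train`, `via_where`, `lookup`, `via_map` and `same` are illustrative names, and the groups are the partial ones used in the answers.

```python
import numpy as np
import pandas as pd

job_types = ['Self employed', 'Government Dependent',
             'Formally employed Private', 'Informally employed',
             'Formally employed Government', 'Farming and Fishing',
             'Remittance Dependent', 'Other Income',
             'Dont Know/Refuse to answer', 'No Income']
train = pd.DataFrame({'job_type': job_types})

list_0 = ['Government Dependent', 'Formally employed Government',
          'Formally employed Private']
list_1 = ['Remittance Dependent', 'Informally employed']
list_2 = ['Dont Know/Refuse to answer', 'No Income']

# Nested np.where: the first matching list wins, everything else is NaN.
via_where = np.where(train['job_type'].isin(list_0), 0,
            np.where(train['job_type'].isin(list_1), 1,
            np.where(train['job_type'].isin(list_2), 2, np.nan)))

# Same grouping expressed as a category -> bin lookup for Series.map.
lookup = {cat: bin_id
          for bin_id, cats in enumerate([list_0, list_1, list_2])
          for cat in cats}
via_map = train['job_type'].map(lookup)

# equal_nan=True treats NaN in the same position as equal.
same = np.array_equal(via_where, via_map.to_numpy(), equal_nan=True)
print(same)
```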
