import pandas as pd
import numpy as np

df = pd.DataFrame([['CASUAL DINING', 'Malwani, Goan, North Indian'],
                   ['CASUAL DINING,BAR', 'Malwani, Goan, North Indian'],
                   ['CASUAL DINING', 'Asian, Modern Indian, Japanese'],
                   ['QUICK BITES', np.nan],
                   ['CAFE', 'Bar Food'],
                   ['CASUAL DINING', 'South Indian, North Indian']],
                  columns=['TITLE', 'CUISINES'])
Output:
print(df)
               TITLE                        CUISINES
0      CASUAL DINING     Malwani, Goan, North Indian
1  CASUAL DINING,BAR     Malwani, Goan, North Indian
2      CASUAL DINING  Asian, Modern Indian, Japanese
3        QUICK BITES                             NaN
4               CAFE                        Bar Food
5      CASUAL DINING      South Indian, North Indian
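Create the unique-value dictionaries:
# Map each distinct TITLE value to an integer code
title_unq = list(df['TITLE'].unique())
title_dict = {}
for idx, value in enumerate(title_unq):
    title_dict[value] = idx

# Map each distinct CUISINES value (including nan) to an integer code
cuisines_unq = list(df['CUISINES'].unique())
cuisines_dict = {}
for idx, value in enumerate(cuisines_unq):
    cuisines_dict[value] = idx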
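Well, one could argue that one-hot encoding (converting categorical columns to numbers) does not "unnecessarily" increase the number of columns. In fact, it is what you need if you truly want to separate all the distinct categories into numeric values.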
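To make the column growth concrete, here is a small sketch of one-hot encoding the same sample data. It assumes `pd.get_dummies` as the encoder, which is my choice for illustration; the name `one_hot` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['CASUAL DINING', 'Malwani, Goan, North Indian'],
        ['CASUAL DINING,BAR', 'Malwani, Goan, North Indian'],
        ['CASUAL DINING', 'Asian, Modern Indian, Japanese'],
        ['QUICK BITES', np.nan],
        ['CAFE', 'Bar Food'],
        ['CASUAL DINING', 'South Indian, North Indian'],
    ],
    columns=['TITLE', 'CUISINES'],
)

# One indicator column per distinct value in each encoded column.
# By default get_dummies creates no column for NaN (dummy_na=False).
one_hot = pd.get_dummies(df, columns=['TITLE', 'CUISINES'])

print(df.shape)       # two columns before encoding
print(one_hot.shape)  # one column per distinct non-null value afterwards
```

With only six rows the two text columns already expand to eight indicator columns, which is the trade-off being weighed here.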
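However, if you want to keep the number of columns unchanged, you can collect all the unique values in a column and build a dictionary from them, then use that dictionary to map the values back into the column. This also handles your nan, though you will have to decide how you ultimately want to treat those values. The DataFrame and dictionaries shown above illustrate this.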
Then use these dictionaries to replace the values in the columns.
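One way the replacement step could look is sketched below. It assumes `Series.map` is used for the lookup and writes the result to a copy called `encoded`; both are choices made for this illustration. The dictionaries are built the same way as in the loops above, just written as comprehensions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['CASUAL DINING', 'Malwani, Goan, North Indian'],
        ['CASUAL DINING,BAR', 'Malwani, Goan, North Indian'],
        ['CASUAL DINING', 'Asian, Modern Indian, Japanese'],
        ['QUICK BITES', np.nan],
        ['CAFE', 'Bar Food'],
        ['CASUAL DINING', 'South Indian, North Indian'],
    ],
    columns=['TITLE', 'CUISINES'],
)

# Each distinct value gets the index of its first appearance as its code.
title_dict = {value: idx for idx, value in enumerate(df['TITLE'].unique())}
cuisines_dict = {value: idx for idx, value in enumerate(df['CUISINES'].unique())}

# Work on a copy (hypothetical name) so the original text stays available.
encoded = df.copy()
encoded['TITLE'] = encoded['TITLE'].map(title_dict)
encoded['CUISINES'] = encoded['CUISINES'].map(cuisines_dict)

print(encoded)
```

The frame keeps its two columns, and the dictionaries can be inverted later to recover the original labels. How the nan row should end up (its own code, a sentinel, or left missing) is still a decision to make explicitly.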