Split column >> get unique values >> add unique values back as columns

type(fl1.cuisines)
pandas.core.series.Series

cuisines_type = fl1['cuisines'].tolist()
type(cuisines_type)
list

cuisines_type  # this returns a list of cuisines

cuisines_set = set([a for b in cuisines_type for a in b])
TypeError: 'float' object is not iterable
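The error most likely has two causes, assuming the column holds comma-separated strings with some missing values. First, pandas represents a missing entry as NaN, which is a float, so "for a in b" fails on it. Second, when b is a string, "for a in b" walks over single characters rather than cuisine names. A minimal sketch on a hypothetical toy fl1 that skips non-strings and splits each string before flattening:

```python
import pandas as pd

# Hypothetical sample data; float('nan') stands in for a missing value,
# which is what read_csv produces for an empty cell
fl1 = pd.DataFrame({'cuisines': ['North Indian, Chinese', 'Cafe, Italian', float('nan')]})

cuisines_list = fl1['cuisines'].tolist()
print(type(cuisines_list[2]))  # <class 'float'>: the missing value

# Keep only real strings, split each on commas, strip spaces, then deduplicate
cuisines_set = {c.strip()
                for b in cuisines_list if isinstance(b, str)
                for c in b.split(',')}
print(sorted(cuisines_set))  # ['Cafe', 'Chinese', 'Italian', 'North Indian']
```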

3 answers

User

Answer 1 · edited 2024-09-27 04:26:12

I want to split this column on commas and get its unique values, then add those unique values back to the original DataFrame as new columns.

# dropna() skips missing values (NaN), which would otherwise make ','.join fail
a = list(set([i.strip() for i in ','.join(df['cuisines'].dropna()).split(',')]))

Output:

['Thai',
 'Mughlai',
 'Mexican',
 'Rajasthani',
 'Andhra',
 'Chinese',
 'North Indian',
 'Cafe',
 'Italian',
 'South Indian']

Use DataFrame.assign to add these columns back to the original df:

df = df.assign(**{i: 0 for i in a})  # one new column per cuisine, all initialised to 0
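On its own, assign only creates the new columns, every one filled with 0. One way to then mark which cuisines each row actually contains is sketched below; the sample data is hypothetical, and the column name cuisines and the ", " separator are assumptions.

```python
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({'cuisines': ['North Indian, Chinese', 'Cafe, Italian', float('nan')]})

# Unique cuisines across the column (NaN rows dropped before joining)
a = sorted({i.strip() for i in ','.join(df['cuisines'].dropna()).split(',')})

# Create every cuisine column, initialised to 0 as in the answer above
df = df.assign(**{i: 0 for i in a})

# Overwrite each column with 1 where that row's split list contains the cuisine
split = df['cuisines'].str.split(', ')
for c in a:
    df[c] = split.apply(lambda lst: int(isinstance(lst, list) and c in lst))

print(df)
```

The isinstance check leaves rows with a missing cuisines value at 0 in every column.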

User

Answer 2 · edited 2024-09-27 04:26:12

I believe you need Series.str.get_dummies, which returns one 0/1 indicator column per distinct cuisine:

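df = fl1.cuisines.str.get_dummies(', ')
# str.get_dummies already yields a single 0/1 column per cuisine. The original
# .max(level=0, axis=1) / .sum(level=0, axis=1) chain was a no-op here and fails
# on pandas >= 2.0, where the level argument was removed. To count repeated
# cuisines, use the stack-based variant with .groupby(level=0).sum() instead.
print(df)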
   Andhra  Cafe  Chinese  Italian  Mexican  Mughlai  North Indian  Rajasthani  \
0       0     0        1        0        0        1             1           0   
1       0     0        1        0        0        0             1           0   
2       0     1        0        1        1        0             0           0   
3       0     0        0        0        0        0             1           0   
4       0     0        0        0        0        0             1           1   
5       0     0        0        0        0        0             1           0   
6       1     0        1        0        0        0             1           0   

   South Indian  Thai  
0             0     0  
1             0     1  
2             0     0  
3             1     0  
4             0     0  
5             0     0  
6             1     0
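The result above is a separate DataFrame. To add the indicator columns back to the original frame, as the question asks, one option is to join on the shared index. A minimal sketch on hypothetical data; str.get_dummies treats a missing value as empty, so that row gets 0 in every cuisine column:

```python
import pandas as pd

# Hypothetical sample data with one missing cuisines value
fl1 = pd.DataFrame({'name': ['A', 'B', 'C'],
                    'cuisines': ['North Indian, Chinese', 'Cafe, Italian', float('nan')]})

# One 0/1 column per distinct cuisine, aligned on fl1's index
dummies = fl1['cuisines'].str.get_dummies(', ')

# Attach the new columns to the original frame
out = fl1.join(dummies)
print(out)
```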

A similar result is possible with pd.get_dummies on the split and stacked values:

df = (pd.get_dummies(fl1['cuisines'].str.split(', ', expand=True).stack(), dtype=int)
        .groupby(level=0).max())  # .sum() instead of .max() counts repeated cuisines
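To see the difference between max and sum in the stacked version, here is a sketch on hypothetical data where one row lists a cuisine twice. groupby(level=0) collects the stacked values back under their original row label, which is what the removed level=0 argument used to do.

```python
import pandas as pd

# Hypothetical sample data: row 0 mentions Chinese twice
fl1 = pd.DataFrame({'cuisines': ['Chinese, Thai, Chinese', 'Cafe']})

# One entry per (original row, position) after splitting
stacked = fl1['cuisines'].str.split(', ', expand=True).stack()

# One integer indicator column per cuisine, still indexed by (row, position)
dummies = pd.get_dummies(stacked, dtype=int)

flags = dummies.groupby(level=0).max()   # 0/1 per original row
counts = dummies.groupby(level=0).sum()  # number of occurrences per original row
print(flags)
print(counts)
```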

User

Answer 3 · edited 2024-09-27 04:26:12

Save your file as a CSV and load it with pandas.read_csv(). Then parse each column: put each column into its own list and take the unique values of each list.

Initialize a new DataFrame with the values from these new lists, which now contain only unique entries.

import pandas as pd

df = pd.read_csv('cuisine.csv')

# Unique values of every column, one list per column
# (replaces the column_1_lst ... column_n_lst pattern)
unique_lists = {f'Column_{i + 1}_unique': list(set(df.iloc[:, i].tolist()))
                for i in range(df.shape[1])}

# Build the new DataFrame; this requires all lists to have the same length
new_dataframe = pd.DataFrame(unique_lists)

Note: this only works if all your lists have the same length.
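If the lists do have different lengths, pd.DataFrame on a dict of plain lists raises a ValueError. One workaround, sketched here on hypothetical data, is to wrap each list in a pd.Series, which lets pandas align them by index and pad the shorter columns with NaN. Note that this approach collects the unique whole cell values of each column; it does not split comma-separated cuisines.

```python
import pandas as pd

# Hypothetical sample data: 'city' has fewer unique values than 'cuisines'
df = pd.DataFrame({'city': ['Pune', 'Pune', 'Delhi'],
                   'cuisines': ['Cafe', 'Thai', 'Mughlai']})

# Unique values of each column, in order of first appearance
unique_lists = {f'{col}_unique': df[col].drop_duplicates().tolist() for col in df.columns}

# Series of unequal length are aligned on their index; missing slots become NaN
new_dataframe = pd.DataFrame({k: pd.Series(v) for k, v in unique_lists.items()})
print(new_dataframe)
```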

Hope this helps :)
