Pandas: how to group columns by their values

Published 2024-06-28 11:22:57


I have a DataFrame:

df = pd.DataFrame([[2,2,6,9,7,6,2,9,7,11]], columns=['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8','cat9','cat10'])

This df has only one row.

How can I group these columns by their values and show the resulting clusters of columns in a plot?


This is my code so far, but it shows the wrong information:

grouped_cats = df.groupby(by= lambda value: value, axis = 1)
list(grouped_cats)[0]

2 answers

I don't fully understand your use case, but I think the following code should help:

import pandas as pd

columns=['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8','cat9','cat10']
df = pd.DataFrame([[2,2,6,9,7,6,2,9,7,11]],columns=columns )

grouped_cats = {}
for i,val in enumerate(df.iloc[0]):
    if val in grouped_cats:
        grouped_cats[val].append(columns[i])
    else:
        grouped_cats[val]= [columns[i]]

Output:

{2: ['cat1', 'cat2', 'cat7'], 6: ['cat3', 'cat6'], 9: ['cat4', 'cat8'], 7: ['cat5', 'cat9'], 11: ['cat10']}
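
The same loop can probably be written a little more compactly with collections.defaultdict, which removes the if/else branch. This is only a sketch under the same assumption as above (a df with a single row); the helper name group_columns_by_value is made up for illustration and is not part of pandas.

```python
from collections import defaultdict

import pandas as pd

columns = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5',
           'cat6', 'cat7', 'cat8', 'cat9', 'cat10']
df = pd.DataFrame([[2, 2, 6, 9, 7, 6, 2, 9, 7, 11]], columns=columns)


def group_columns_by_value(frame):
    """Map each value in the first row to the list of columns holding it.

    Hypothetical helper for illustration, not a pandas function.
    """
    groups = defaultdict(list)
    # Series.items() yields (column label, value) pairs for the first row.
    for col, val in frame.iloc[0].items():
        # int() turns the NumPy integer into a plain Python int key.
        groups[int(val)].append(col)
    return dict(groups)


grouped_cats = group_columns_by_value(df)
print(grouped_cats)
```

The result has the same keys and lists as the loop above; only the bookkeeping for "first time we see this value" is handled by defaultdict.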

The simplest visualization I can think of is:

import matplotlib.pyplot as plt

colours = ['green', 'orange', 'red','blue','black']
cluster = {2: ['cat1', 'cat2', 'cat7'],
 6: ['cat3', 'cat6'],
 9: ['cat4', 'cat8'],
 7: ['cat5', 'cat9'],
 11: ['cat10']}

fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xticks([i for i in range(2,12)] )
for colour, (x, ys) in zip(colours, cluster.items()):
    ax.scatter([x] * len(ys), ys, c=colour, linewidth=0, s=50)


plt.show()
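
In the snippet above the cluster dict and the colour list are typed in by hand. As a rough sketch, the mapping could instead be derived from df and each cluster left to matplotlib's default colour cycle. The Agg backend and the output file name column_clusters.png are assumptions made here so the script also runs without a display; swap them for plt.show() if you work interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so use a non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

columns = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5',
           'cat6', 'cat7', 'cat8', 'cat9', 'cat10']
df = pd.DataFrame([[2, 2, 6, 9, 7, 6, 2, 9, 7, 11]], columns=columns)

# Build value -> list of column names from the single row of df.
cluster = {}
for col, val in df.iloc[0].items():
    cluster.setdefault(int(val), []).append(col)

fig, ax = plt.subplots()
ax.set_xticks(sorted(cluster))
# One scatter call per value, so every cluster gets its own colour and legend entry.
for value, cols in cluster.items():
    ax.scatter([value] * len(cols), cols, s=50, label=f"value {value}")
ax.set_xlabel("value")
ax.set_ylabel("column")
ax.legend()
fig.savefig("column_clusters.png")  # hypothetical output file name
```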
        
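Another way to visualize it: for each unique value in the data, count the labels associated with it and draw a scatter plot annotated with the column names.

import matplotlib.pyplot as plt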

colours = ['green', 'orange', 'red','blue','black']

cluster = {2: ['c1', 'c2', 'c7'],
 6: ['c3', 'c6'],
 9: ['c4', 'c8'],
 7: ['c5', 'c9'],
 11: ['c10']}

z = [len(cluster[ke]) for ke in cluster ]
y = [ke for ke in cluster ]
fig, ax = plt.subplots()
ax.set_xticks([i for i in range(2,12)] )
ax.scatter(y, z, c=colours)
for i,val in enumerate(cluster):
    ax.annotate(','.join(cluster[val]), (y[i], z[i]))

plt.show()


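What do you mean by a cluster plot? I think the best approach is to visualize the spread with a scatter plot. If needed, you can transpose and rename: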

df.T.reset_index().plot(kind='scatter', x='index', y=0)


Or even a bar plot:

df.T.reset_index().plot(kind='bar', x='index', y=0)


Based on your comments and clarification, use groupby and to_dict:

df.T.reset_index().groupby(0).agg(list).to_dict()

{'index': {2: ['cat1', 'cat2', 'cat7'],
  6: ['cat3', 'cat6'],
  7: ['cat5', 'cat9'],
  9: ['cat4', 'cat8'],
  11: ['cat10']}}
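
The result above is nested under the 'index' key because to_dict() is called on a DataFrame. As a small sketch, selecting that column before aggregating should give the flat value-to-columns mapping directly; the variable names long_df and groups are only illustrative.

```python
import pandas as pd

columns = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5',
           'cat6', 'cat7', 'cat8', 'cat9', 'cat10']
df = pd.DataFrame([[2, 2, 6, 9, 7, 6, 2, 9, 7, 11]], columns=columns)

# After transposing, the column names live in 'index' and the values in column 0.
long_df = df.T.reset_index()

# Pick the 'index' column first, so agg(list) returns a Series
# and to_dict() yields a flat dict instead of a nested one.
groups = long_df.groupby(0)['index'].agg(list).to_dict()
print(groups)
```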
