Creating group identifiers for regression

import numpy as np
import pandas as pd

# get a DataFrame with just the unique "keys"
# (NaN is swapped for -1 so groupby does not drop those rows)
df2 = df.replace(np.nan, -1)
g = df2.groupby(['id1', 'id2', 'id3'])
gdf = pd.DataFrame(list(g.groups.keys()), columns=df.columns)
gdf = gdf.replace(-1, np.nan)

# an idea is to re-use the index as the 'group_id'
# the next three commands support that
# (DataFrame.sort was removed from pandas; sort_values replaces it)
gdf.sort_values(['id1', 'id2', 'id3'], inplace=True)
gdf.reset_index(drop=True, inplace=True)
gdf['group_id'] = gdf.index

# merge on the three id columns
mdf = df.merge(gdf, how='inner', on=df.columns.tolist())
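
A minimal sketch of the same idea without the -1 sentinel round-trip. It relies on pandas' documented behavior that `drop_duplicates` treats missing values as equal and that `merge` matches null keys against each other. The sample data and the names `cols` and `keys` are illustrative assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): id2 contains missing values.
df = pd.DataFrame({
    'id1': ['A', 'A', 'B', 'B', 'A'],
    'id2': [1.0, 1.0, np.nan, np.nan, 1.0],
    'id3': [100, 101, 100, 100, 100],
})
cols = ['id1', 'id2', 'id3']

# Unique key combinations; rows with NaN in the same positions count as duplicates.
keys = df[cols].drop_duplicates().sort_values(cols).reset_index(drop=True)
keys['group_id'] = keys.index

# A left merge keeps the original row order; null keys match each other.
mdf = df.merge(keys, how='left', on=cols)
```

Using `how='left'` rather than `how='inner'` is a choice made here so that every original row is kept in its original position.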

2 Answers

User

#1 · Edited 2024-06-14 21:19:29

There are certainly countless ways to solve this. Here is mine:

>>> df
  id1  id2  id3
0   A    1  100
1   A    1  101
2   B    1  100
3   B    1  100

# get a DataFrame with just the unique "keys"
# (assumes df holds only the three id columns)
g = df.groupby(['id1', 'id2', 'id3'])
gdf = pd.DataFrame(list(g.groups.keys()), columns=df.columns)

# an idea is to re-use the index as the 'group_id'
# the next three commands support that
# (DataFrame.sort was removed from pandas; sort_values replaces it)
gdf.sort_values(['id1', 'id2', 'id3'], inplace=True)
gdf.reset_index(drop=True, inplace=True)
gdf['group_id'] = gdf.index

# merge on the three id columns
mdf = df.merge(gdf, how='inner', on=df.columns.tolist())

This produces:

  id1  id2  id3  group_id
0   A    1  100         0
1   A    1  101         1
2   B    1  100         2
3   B    1  100         2
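
The same numbering can probably be had in one step with `groupby(...).ngroup()`, which labels each row with the number of its group. This is a sketch of an alternative, not the method used in this answer; the DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'id1': ['A', 'A', 'B', 'B'],
    'id2': [1, 1, 1, 1],
    'id3': [100, 101, 100, 100],
})

# With the default sort=True, groups are numbered in sorted key order, starting at 0.
df['group_id'] = df.groupby(['id1', 'id2', 'id3']).ngroup()
```

Because no intermediate key table or merge is involved, the original row order and index are left untouched.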

User

#2 · Edited 2024-06-14 21:19:29

Is this what you are looking for?

import pandas as pd

df = pd.DataFrame({'id1': ['A','A','B','B'],'id2':[1,1,1,1],'id3':[100,101,100,100]})

# build a string key by concatenating the three id values
def makegroup(x,y,z):
    return str(x) + str(y) + str(z)

df['groupid'] = df.apply(lambda row: makegroup(row['id1'], row['id2'], row['id3']), axis=1)

# map each distinct key to an integer, numbered in order of first appearance
groupiddict = {}
groupincrementer = 1

for x in df['groupid'].unique():
    groupiddict[x] = groupincrementer
    groupincrementer += 1

df['groupidINT'] = df['groupid'].map(groupiddict)

Here is the output:

  id1  id2  id3 groupid  groupidINT
0   A    1  100   A1100           1
1   A    1  101   A1101           2
2   B    1  100   B1100           3
3   B    1  100   B1100           3
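
One caveat with plain string concatenation is that different id combinations can collapse into the same key, for example ('A', 11, 0) and ('A', 1, 10) both become 'A110'. A minimal sketch that keys on tuples instead is shown below; the sample rows and the names `concat_keys`, `tuple_keys` and `ids` are hypothetical and chosen only to show the collision.

```python
import pandas as pd

# Hypothetical rows picked so that string concatenation collides.
df = pd.DataFrame({'id1': ['A', 'A'], 'id2': [11, 1], 'id3': [0, 10]})

# Concatenated strings: both rows produce 'A110'.
concat_keys = [str(x) + str(y) + str(z)
               for x, y, z in zip(df['id1'], df['id2'], df['id3'])]

# Tuple keys keep the three values separate, so the rows stay distinct.
tuple_keys = list(zip(df['id1'], df['id2'], df['id3']))

# Number each distinct tuple in order of first appearance, starting at 1.
ids = {}
for key in tuple_keys:
    ids.setdefault(key, len(ids) + 1)

df['groupidINT'] = [ids[key] for key in tuple_keys]
```

Adding a separator that cannot occur in the data to the concatenated string would be another way around the collision, at the cost of having to know the data.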
