Return the groupby weighted average of multiple dataframe columns as a dataframe

# Sample data
import pandas as pd

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7]
})
df

  building         day    id  counts  elevation  width
0       A1  2019-07-02  alak       1        5.7    1.2
1       A1  2019-07-02  ldau       2        7.8    2.4
2       A1  2019-07-03  laud       3        8.7    3.4
3       A1  2019-07-03  lkdu       7        6.9    2.7

# What I want to get:

  building         day  elevation  width
0       A1  2019-07-02        7.1    2.0
1       A1  2019-07-03        7.4    2.9

3 answers

User

#1 · edited 2024-06-25 23:22:28

You can do it in the following steps:

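# Work on a copy so the original frame is left untouched.
df_sum = df.copy()
# Weight each value by its row count.
df_sum['elevation'] *= df_sum['counts']
df_sum['width'] *= df_sum['counts']

# Sum the weighted values and the counts per (building, day).
df_sum = df_sum.groupby(['building', 'day']).agg(
    {'elevation': 'sum', 'width': 'sum', 'counts': 'sum'})
# Divide by the total count to get the weighted averages.
df_sum['elevation'] /= df_sum['counts']
df_sum['width'] /= df_sum['counts']
df_sum = df_sum.reset_index().drop(columns='counts')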

The result is:

  building         day  elevation  width
0       A1  2019-07-02       7.10   2.00
1       A1  2019-07-03       7.44   2.91
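The same steps can be wrapped in a small helper so that any number of value columns is handled at once. This is only a sketch: the function name `weighted_group_mean` and its parameters are hypothetical and do not come from the answer above.

```python
import pandas as pd

def weighted_group_mean(df, keys, value_cols, weight_col):
    """Weighted mean of value_cols within each group defined by keys.

    Hypothetical helper: multiply by the weights, sum per group,
    then divide by the per-group weight total.
    """
    # Multiply every value column by the weight column, row by row.
    weighted = df[value_cols].mul(df[weight_col], axis=0)
    # Group by the key columns themselves (passed as Series).
    group_keys = [df[k] for k in keys]
    numer = weighted.groupby(group_keys).sum()
    denom = df[weight_col].groupby(group_keys).sum()
    # Divide each summed column by the total weight of its group.
    return numer.div(denom, axis=0).reset_index()

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7],
})

result = weighted_group_mean(
    df, ['building', 'day'], ['elevation', 'width'], 'counts')
print(result)
```

Because the original frame is never modified, the helper can be called repeatedly with different weight columns.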

User

#2 · edited 2024-06-25 23:22:28

I suspect there is a better way, but this does the job:

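import pandas as pd

df = pd.DataFrame({
  'building': ['A1', 'A1', 'A1', 'A1'],
  'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
  'id': ['alak', 'ldau', 'laud', 'lkdu'],
  'counts': [1, 2, 3, 7],
  'elevation': [5.7, 7.8, 8.7, 6.9],
  'width':[1.2, 2.4, 3.4, 2.7]
})

df = df.set_index(['building','day'])
# transform('sum') returns the group total on every row, so the
# division below lines up row by row without index alignment.
sum_count = df.groupby(['building','day']).counts.transform('sum')
# Each row's share of the weighted average.
df['w_elevation'] = df.elevation*df.counts / sum_count
df['w_width'] = df.width*df.counts / sum_count
# Drop the string column 'id' so that only numeric columns are summed.
df.drop(columns='id').groupby(['building','day']).sum()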

Output:

                     counts  elevation  width  w_elevation  w_width
building day                                                       
A1       2019-07-02       3       13.5    3.6         7.10     2.00
         2019-07-03      10       15.6    6.1         7.44     2.91
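As one candidate for the "better way" mentioned in this answer, `numpy.average` accepts a `weights` argument and can be applied per group. The sketch below is an alternative technique, not the answer's method; the helper name `wavg` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7],
})

def wavg(group):
    # Hypothetical helper: count-weighted mean of both columns in one group.
    return pd.Series({
        'elevation': np.average(group['elevation'], weights=group['counts']),
        'width': np.average(group['width'], weights=group['counts']),
    })

# Select only the columns the helper needs before applying it per group.
result = (
    df.groupby(['building', 'day'])[['elevation', 'width', 'counts']]
      .apply(wavg)
      .reset_index()
)
print(result)
```

This returns only the weighted columns, so the unweighted sums of `counts`, `elevation` and `width` shown in the output above do not appear.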

User

#3 · edited 2024-06-25 23:22:28

You can use the reindex and repeat trick:

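# Repeat each row `counts` times, then take a plain mean per group.
# 'id' is dropped too, because a mean over strings is undefined.
df.reindex(df.index.repeat(df.counts)).drop(columns=['counts', 'id']).\
     groupby(['building','day'],as_index=False).mean()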
Out[110]: 
  building         day  elevation  width
0       A1  2019-07-02       7.10   2.00
1       A1  2019-07-03       7.44   2.91
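The trick works because every row is duplicated `counts` times, so an ordinary mean over the expanded frame equals the count-weighted mean. The sketch below makes the expansion visible. The intermediate name `expanded` is introduced only for illustration, and the approach assumes the weights are non-negative integers.

```python
import pandas as pd

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7],
})

# Each index label appears `counts` times in the repeated index.
expanded = df.reindex(df.index.repeat(df['counts']))

# The expanded frame has one row per counted unit.
print(len(expanded), df['counts'].sum())  # 13 13

# A plain mean over the expanded rows is the weighted mean.
plain_mean = expanded.groupby(['building', 'day'])['elevation'].mean()
print(plain_mean)
```

Since the expanded frame grows with the total of `counts`, this can use far more memory than the multiply-and-divide approaches when the counts are large.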
