Looping over a DataFrame with two or more indexes

Location  Test#  Type  Parm1  Weight
M36       Test1  A      1.39  233
          Test2  B      1.44  281
          Test3  B      1.40  239
          Test4  A      1.49  438
          Test5  C      0.99  112
          Test6  C      1.74  200
          Test7  A      1.17  100
          Test8  A      2.40  7.8
M37       Test1  B      2.91  232
          Test2  A     20.2   0
          Test3  C      4.88  958
          Test4  A      9.46  0

Location  Test#  Type  Parm1  Weight  Weighted Ave.
M36       Test1  A      1.39  233     1.434
          Test2  B      1.44  281
          Test3  B      1.40  239
          Test4  A      1.49  438
          Test5  C      0.99  112
          Test6  C      1.74  200
          Test7  A      1.17  100
          Test8  A      2.40  7.8
M37       Test1  B      2.91  232     4.495
          Test2  A     20.2   0
          Test3  C      4.88  958
          Test4  A      9.46  0

2 Answers

User

#1 · edited 2024-09-29 01:27:38

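There are many ways to do this with groupby. The approach below should be one of the more efficient ones: compute Parm1 * Weight per row, sum both that product and Weight per Location, then divide.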

df.set_index('Location', inplace=True)                # set the index

df['Weighted_Sum'] = df.Parm1 * df.Weight             # per-row Parm1 * Weight
# per-Location totals (sum(level=0) was removed in pandas 2.0)
v = df.groupby(level=0)[['Weighted_Sum', 'Weight']].sum()

df['Weighted Ave'] = v['Weighted_Sum'] / v['Weight']  # aligned on the Location index
del df['Weighted_Sum']                                # drop the helper column

df

          Test# Type  Parm1  Weight  Weighted Ave
Location                                         
M36       Test1    A   1.39   233.0      1.434275
M36       Test2    B   1.44   281.0      1.434275
M36       Test3    B   1.40   239.0      1.434275
M36       Test4    A   1.49   438.0      1.434275
M36       Test5    C   0.99   112.0      1.434275
M36       Test6    C   1.74   200.0      1.434275
M36       Test7    A   1.17   100.0      1.434275
M36       Test8    A   2.40     7.8      1.434275
M37       Test1    B   2.91   232.0      4.495933
M37       Test2    A  20.20     0.0      4.495933
M37       Test3    C   4.88   958.0      4.495933
M37       Test4    A   9.46     0.0      4.495933
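For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It is not the answer's exact code: it rebuilds the sample data from the question's table and uses `groupby(...).transform("sum")` instead of setting an index, so the per-group totals are broadcast straight back onto the rows. The variable names `weighted_sum` and `weight_total` are chosen here for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "Location": ["M36"] * 8 + ["M37"] * 4,
    "Test#": [f"Test{i}" for i in range(1, 9)] + [f"Test{i}" for i in range(1, 5)],
    "Type": list("ABBACCAA") + list("BACA"),
    "Parm1": [1.39, 1.44, 1.40, 1.49, 0.99, 1.74, 1.17, 2.40, 2.91, 20.2, 4.88, 9.46],
    "Weight": [233, 281, 239, 438, 112, 200, 100, 7.8, 232, 0, 958, 0],
})

# Per-Location sum of Parm1 * Weight, repeated on every row of the group.
weighted_sum = (df["Parm1"] * df["Weight"]).groupby(df["Location"]).transform("sum")
# Per-Location sum of Weight, repeated on every row of the group.
weight_total = df.groupby("Location")["Weight"].transform("sum")

df["Weighted Ave"] = weighted_sum / weight_total

print(df)
```

Because `transform` returns a result with the same index as the input, no join or index alignment step is needed afterwards.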

To get the Weighted Ave column in your format (value shown only on the first row of each group), use mask:

df['Weighted Ave'] = df['Weighted Ave'].mask(df['Weighted Ave'].duplicated(), '')
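One caveat with masking on duplicated values: if two locations happened to have the same weighted average, the second location's first row would be blanked as well. A possible variation, sketched below on a small hypothetical frame named `demo`, masks on repeated index labels instead. It assumes Location is the index, as in the answer above, and casts the column to object first because it ends up mixing floats and empty strings.

```python
import pandas as pd

# Hypothetical frame in which both groups share the same average.
demo = pd.DataFrame(
    {"Weighted Ave": [2.0, 2.0, 2.0, 2.0]},
    index=pd.Index(["M36", "M36", "M37", "M37"], name="Location"),
)

# True for every row whose Location label has already appeared.
is_repeat = demo.index.duplicated()

# Blank the repeats; the first row of each Location keeps its value.
demo["Weighted Ave"] = demo["Weighted Ave"].astype(object).mask(is_repeat, "")

print(demo)
```

Note that the column is no longer numeric after this step, so it is best applied only for display.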

User

#2 · edited 2024-09-29 01:27:38

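Here is another approach, using the agg function.

Basically, the weighted average can be computed with numpy.average, passing the Parm1 column of df as the values and the Weight column as the weights.

After that, just use agg to apply this lambda function to each group. You could also use apply.

Finally, use join to attach the weighted averages back to the original DataFrame.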

import numpy as np

df["W_Ave"] = np.nan  # placeholder column so the aggregated result is named W_Ave
# x holds one group's W_Ave values; its index selects that group's rows in df
wave = lambda x: np.average(df.loc[x.index, "Parm1"], weights=df.loc[x.index, "Weight"])
df_wave = df.groupby(['Location'])["W_Ave"].agg(wave)
del df["W_Ave"]
dffinal = df.join(df_wave, on="Location")

Final output:

dffinal
Out[38]: 
   Location  Parm1  Test# Type  Weight     W_Ave
0       M36   1.39  Test1    A   233.0  1.434275
1       M36   1.44  Test2    B   281.0  1.434275
2       M36   1.40  Test3    B   239.0  1.434275
3       M36   1.49  Test4    A   438.0  1.434275
4       M36   0.99  Test5    C   112.0  1.434275
5       M36   1.74  Test6    C   200.0  1.434275
6       M36   1.17  Test7    A   100.0  1.434275
7       M36   2.40  Test8    A     7.8  1.434275
8       M37   2.91  Test1    B   232.0  4.495933
9       M37  20.20  Test2    A     0.0  4.495933
10      M37   4.88  Test3    C   958.0  4.495933
11      M37   9.46  Test4    A     0.0  4.495933

If you are only interested in the weighted averages:

df_wave
Out[39]: 
Location
M36    1.434275
M37    4.495933
Name: W_Ave, dtype: float64
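The answer mentions that apply works too. A minimal sketch of that variant follows, assuming a reduced copy of the sample data (the M37 rows plus two M36 rows) and a helper named `weighted_mean` introduced here for illustration. It avoids the placeholder column by handing each group's Parm1 and Weight columns directly to numpy.average.

```python
import numpy as np
import pandas as pd

# Reduced sample: two M36 rows and all four M37 rows from the question.
df = pd.DataFrame({
    "Location": ["M36", "M36", "M37", "M37", "M37", "M37"],
    "Parm1": [1.39, 1.44, 2.91, 20.2, 4.88, 9.46],
    "Weight": [233, 281, 232, 0, 958, 0],
})

def weighted_mean(group):
    """Weighted average of Parm1 for one Location, using Weight as weights."""
    return np.average(group["Parm1"], weights=group["Weight"])

# One value per Location; selecting the two columns keeps the grouping key out of `group`.
w_ave = df.groupby("Location")[["Parm1", "Weight"]].apply(weighted_mean).rename("W_Ave")

# Attach the per-Location value to every row.
result = df.join(w_ave, on="Location")

print(result)
```

Individual zero weights are fine, as in M37, but numpy.average raises ZeroDivisionError when all weights in a group sum to zero, so such groups would need to be filtered or handled separately.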
