Python groupby: how to condense a for loop into one line

Posted 2024-09-29 01:19:15


For Python pandas:

I want to simplify my code so that it ends up as a one-liner (reason: performance optimization).

How can I write it so that I have only a single line containing the groupby statement?

Something like:

dfResult = df2.groupby("a").something().I()Do()Not()Understand()Yet()

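This is my code (I want to filter out those groups of column a for which the standard deviation of the differences between consecutive rows of b is too large):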

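import pandas as pd

# collect the groups that pass the filter (DataFrame.append was removed in pandas 2.0)
keptGroups = []

df2 = pd.DataFrame({'a': ("w", "w", "w", "w", "x", "x", "x"), 'b': (30, 42, 54, 68, 7, 8, 65)})
print('input data:')
print(df2)
dfGroupBy = df2.groupby("a")
for key, item in dfGroupBy:
    # work on a copy so that adding a column does not raise SettingWithCopyWarning
    innerDf = item.copy()
    # calculate delta between two rows for column 'b'
    innerDf['delta'] = innerDf['b'] - innerDf['b'].shift(1)
    # calculate standard deviation (without the first row, whose delta is NaN);
    # ddof=0 is the population standard deviation, as np.std computes it
    # (the pd.np alias no longer exists in current pandas)
    standardDeviation = innerDf['delta'][1:].std(ddof=0)
    if standardDeviation < 15:
        print("so my standard deviation is small enough!")
        print(innerDf['delta'][1:])
        print("standard deviation:", standardDeviation)
        # remove column 'delta', as I needed it only in between
        innerDf = innerDf.drop('delta', axis=1)
        keptGroups.append(innerDf)

# build the result once at the end; fall back to an empty frame if nothing passed
dfResult = pd.concat(keptGroups) if keptGroups else pd.DataFrame()
print("result:")
print(dfResult)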

This is the console output:

input data:
   a   b
0  w  30
1  w  42
2  w  54
3  w  68
4  x   7
5  x   8
6  x  65
so my standard deviation is small enough!
1    12.0
2    12.0
3    14.0
Name: delta, dtype: float64
standard deviation: 0.942809041582
result:
   a   b
0  w  30
1  w  42
2  w  54
3  w  68
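
One possible way to get the single groupby line asked for above is `DataFrameGroupBy.filter`, which keeps or drops whole groups based on a boolean returned per group. The following is a minimal sketch, not a definitive answer: it assumes the threshold of 15 and the population standard deviation (`ddof=0`) from the loop version, and the name `dfResultAlt` is introduced here only for comparison. The second variant builds a boolean mask with `transform` instead; which of the two is faster likely depends on the number and size of the groups, since both still call a Python lambda once per group.

```python
import pandas as pd

df2 = pd.DataFrame({'a': ("w", "w", "w", "w", "x", "x", "x"),
                    'b': (30, 42, 54, 68, 7, 8, 65)})

# Variant 1: keep only the groups of 'a' whose row-to-row deltas in 'b'
# have a population standard deviation (ddof=0) below 15.
# diff() yields NaN for the first row of each group; std() skips NaN.
dfResult = df2.groupby("a").filter(lambda g: g['b'].diff().std(ddof=0) < 15)

# Variant 2 (hypothetical name dfResultAlt): broadcast the per-group
# standard deviation back to every row and use it as a boolean mask.
mask = df2.groupby("a")['b'].transform(lambda s: s.diff().std(ddof=0)) < 15
dfResultAlt = df2[mask]

print(dfResult)
```

As in the loop version, a group with a single row has no deltas, so its standard deviation is NaN, the comparison is False, and the group is dropped.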
