Python pandas: groupby apply function that looks at the previous row

import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'],
                   'cost': [1, 4, 2, 8, 9]})
df['class'] = np.nan

def customFunction(test_df):
    print(np.shape(test_df))
    iteration = 1
    for currRow in test_df.iterrows():
        print('executed')
        if iteration == 1:
            # Note: this assigns to the whole 'class' column of the group,
            # not only to the current row.
            test_df['class'] = 'first'
        else:
            if currRow[1]['cost'] > priorCost:
                test_df['class'] = 'greater'
            elif currRow[1]['cost'] < priorCost:
                test_df['class'] = 'less'
            else:
                test_df['class'] = 'equal'
        iteration += 1
        priorCost = currRow[1]['cost']
    return test_df

grouped_df = df.groupby(['type']).apply(customFunction)

1 Answer

User

#1 · Posted 2024-10-01 07:49:08

I'll give you as much as I've got so far. I need a break right now, but:

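import pandas as pd

# Same data as in the question (originally read from the clipboard with pd.read_clipboard())
df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'], 'cost': [1, 4, 2, 8, 9]})
df.set_index('type', inplace=True)
test = df.groupby(level=0).apply(lambda x: x.cost.diff())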

gives me the following (diff() computes each row's difference from the previous row, so the first row of each group is NaN):

[output missing from the source]

So this contains all the information you need. At the moment I'm struggling to merge it back into the original DataFrame: df['differences'] = test creates a huge mess.

Update

I'm almost there:

>>> df['differences'] = test[1].append(test[0])
>>> df.loc[df['differences'] > 0, 'inWords'] = 'greater'   
>>> df.loc[df['differences'] < 0, 'inWords'] = 'lesser' 
>>> df.loc[df['differences'].isnull(), 'inWords'] = 'first' 
>>> df
Out[184]: 
      cost  differences  inWords
type                            
foo      1          NaN    first
foo      4            3  greater
foo      2           -2   lesser
bar      8          NaN    first
bar      9            1  greater

So the only thing still missing is a generic expression to replace test[1].append(test[0]). Maybe someone can chime in?
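A minimal sketch of one such generic expression, assuming `type` is kept as an ordinary column with the default integer index (rather than set as the index as above). `groupby(...)['cost'].diff()` returns the per-group differences in the original row order, so they can be assigned straight back. The use of `np.select` and the extra 'equal' label are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'],
                   'cost': [1, 4, 2, 8, 9]})

# Difference from the previous row within each 'type' group;
# the first row of every group has no predecessor and becomes NaN.
df['differences'] = df.groupby('type')['cost'].diff()

# Map the sign of the difference to a label. Comparisons with NaN are
# all False, so the first row of each group falls through to the default.
conditions = [df['differences'] > 0,
              df['differences'] < 0,
              df['differences'] == 0]
labels = ['greater', 'lesser', 'equal']
df['inWords'] = np.select(conditions, labels, default='first')
```

Because the result of `diff()` carries the same index as `df`, no manual reordering of the groups is needed.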

Update 2

In reply to your comment: whenever you define a function for apply(), such as

def compareSomethingWithinAGroup(group):
    someMagicHappens()
    return someValues

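you have access to all the standard pandas functions and to the entire group inside the function. So you can build any complicated row-dependent magic, whatever it is. The only thing to watch out for: someValues must be a Series or a DataFrame with only one column, with as many entries as group has rows. As long as you return such someValues, you can always do df['resultOfSomethingComplicated'] = df.groupby(level=0).apply(compareSomethingWithinAGroup) and use all the rows in the result.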
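As a concrete illustration of that pattern, here is a sketch under some assumptions: the function name `compare_with_previous_row` is hypothetical, `type` stays an ordinary column so the row index is unique, the `cost` column is selected before `apply` so the function receives one Series per group, and `group_keys=False` is passed so the result keeps the original row labels instead of gaining an extra index level.

```python
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'],
                   'cost': [1, 4, 2, 8, 9]})

def compare_with_previous_row(costs):
    """Label each value of one group's 'cost' Series relative to the previous row."""
    labels = []
    prior_cost = None  # no previous row at the start of a group
    for cost in costs:
        if prior_cost is None:
            labels.append('first')
        elif cost > prior_cost:
            labels.append('greater')
        elif cost < prior_cost:
            labels.append('less')
        else:
            labels.append('equal')
        prior_cost = cost
    # One entry per row of the group, indexed like the group,
    # so the result aligns with the original rows on assignment.
    return pd.Series(labels, index=costs.index)

df['class'] = (df.groupby('type', group_keys=False)['cost']
                 .apply(compare_with_previous_row))
```

The explicit loop mirrors the row-by-row logic from the question; for a plain comparison with the previous row, the built-in `diff()` is shorter.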
