Groupby in a pandas time series DataFrame: selecting the most recent event

newData = []
for group in df.groupby(df['CUSIP']):
    newData.append(group[group.index == max(group.index)])

'builtin_function_or_method' object is not iterable

In [374]: df.head()
Out[374]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-05-08   AAA   238  4256  3.523346
1992-07-13   AAA   234  4677  3.485577
1992-12-12   BBB   221  5150  3.24
1995-12-12   BBB   254  5150  3.25
1997-12-12   BBB   245  6150  3.25
1998-12-12   CCC   234  5140  3.24145
1999-12-12   CCC   223  5120  3.65145

2 answers

User

#1 · Edited 2024-10-01 22:26:10

In [17]: df
Out[17]: 
           cusip    a     b         c
date                                 
1992-05-08   AAA  238  4256  3.523346
1992-07-13   AAA  234  4677  3.485577
1992-12-12   BBB  221  5150  3.240000
1995-12-12   BBB  254  5150  3.250000
1997-12-12   BBB  245  6150  3.250000
1998-12-12   CCC  234  5140  3.241450
1999-12-12   CCC  223  5120  3.651450

[7 rows x 4 columns]

Sort it by the date index first, so that the last row in each group is the most recent one
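
The sorting step could look like the following minimal sketch. It assumes the frame is indexed by date as in the output above and that `DataFrame.sort_index` is the call used; the data is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frame (values copied from the output shown above).
df = pd.DataFrame(
    {
        "cusip": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "a": [238, 234, 221, 254, 245, 234, 223],
        "b": [4256, 4677, 5150, 5150, 6150, 5140, 5120],
        "c": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.to_datetime(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"]
    ),
)
df.index.name = "date"

# Assumed step: order rows chronologically so the last row per group is the newest.
df = df.sort_index()
```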

Take the last element from each group

In [20]: df.groupby('cusip').last()
Out[20]: 
         a     b         c
cusip                     
AAA    234  4677  3.485577
BBB    245  6150  3.250000
CCC    223  5120  3.651450

[3 rows x 3 columns]

If you want to keep the date index, reset the index first, group, and then set the index back

In [9]: df.reset_index().groupby('cusip').last().reset_index().set_index('date')
Out[9]: 
           cusip    a     b         c
date                                 
1992-07-13   AAA  234  4677  3.485577
1997-12-12   BBB  245  6150  3.250000
1999-12-12   CCC  223  5120  3.651450

[3 rows x 4 columns]
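
The loop from the question can also be repaired so that it keeps the date index. Iterating over a groupby yields `(key, group)` tuples, so in the question `group` was a tuple, `group.index` was the built-in `tuple.index` method, and `max()` raised the "not iterable" error. The sketch below unpacks the tuple and, for comparison, shows `tail(1)` as a loop-free variant; the frame is rebuilt from the data above, and the names `pieces`, `latest` and `latest_tail` are illustrative only.

```python
import pandas as pd

# Example frame rebuilt from the answer's output.
df = pd.DataFrame(
    {
        "cusip": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "a": [238, 234, 221, 254, 245, 234, 223],
        "b": [4256, 4677, 5150, 5150, 6150, 5140, 5120],
        "c": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.to_datetime(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"]
    ),
)
df.index.name = "date"

# Unpack the (key, group) tuple that groupby iteration yields.
pieces = []
for key, group in df.groupby("cusip"):
    # Keep only the row(s) carrying the latest date in this group.
    pieces.append(group[group.index == group.index.max()])
latest = pd.concat(pieces)

# Loop-free variant: after sorting by date, take the last row of each group.
latest_tail = df.sort_index().groupby("cusip").tail(1)
```

Both results keep `date` as the index, so no reset and set round trip is needed.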

User

#2 · Edited 2024-10-01 22:26:10

Here is how I did it

import pandas as pd
df = pd.read_csv('/home/desktop/test.csv')

Convert the date column to datetime
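
The conversion step might look like this sketch. The column name `date_time` is taken from the output shown later in this answer; the use of `pd.to_datetime` and the small three-row frame are assumptions made for illustration.

```python
import pandas as pd

# Illustrative frame; the dates mirror the non-padded DDD rows in the answer's output.
df = pd.DataFrame(
    {
        "date": ["1997-12-4", "1997-12-05", "1992-07-6"],
        "CUSIP": ["DDD", "DDD", "DDD"],
        "COLA": [999, 245, 234],
    }
)

# Assumed step: parse the strings into real timestamps in a new column.
df["date_time"] = pd.to_datetime(df["date"])

# Sorting on strings is lexical, so "1997-12-4" ranks above "1997-12-05".
by_string = df.sort_values("date", ascending=False)

# Sorting on timestamps is chronological.
by_time = df.sort_values("date_time", ascending=False)
```

Sorting on the parsed column puts 1997-12-05 ahead of 1997-12-4 in descending order, which is the chronologically correct result.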

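Sort the DataFrame the way you need, then group it (newer pandas versions use sort_values in place of the old DataFrame.sort)

grouped = df.sort_values(['CUSIP', 'date'], ascending=[True, False]).groupby('CUSIP')

Define what happens during aggregation (this depends on the sort order)

def return_first(pd_series):
    return pd_series.values[0]

Build a dict that applies the same function to every column except the grouping key

agg_dict = {c: return_first for c in df.columns if c != 'CUSIP'}

Final aggregation

df = grouped.agg(agg_dict)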
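
Because the rows are sorted newest first within each CUSIP, the aggregation above simply keeps the first row of every group. A shorter sketch of the same idea follows, assuming a pandas version that provides `sort_values` and a `date` column that is already datetime; the frame is rebuilt from the question's data and the name `latest` is illustrative.

```python
import pandas as pd

# Example data from the question, with dates already parsed.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
             "1997-12-12", "1998-12-12", "1999-12-12"]
        ),
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, 234, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, 6150, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    }
)

# Newest row first within each CUSIP, then keep the first row per CUSIP.
latest = (
    df.sort_values(["CUSIP", "date"], ascending=[True, False])
    .drop_duplicates("CUSIP")
    .set_index("date")
)
```

Unlike `groupby(...).first()`, which takes the first non-null value of each column separately, `drop_duplicates` keeps whole rows intact.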

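Edit: converting the dates to datetime avoids errors like the one below, where sorting on the string date column puts 1997-12-4 ahead of 1997-12-05:

In [12]: df.sort_values(['CUSIP','date'],ascending=[True,False])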
Out[12]: 
         date CUSIP  COLA  COLB      COLC           date_time

6  1999-12-12   CCC   223  5120  3.651450 1999-12-12 00:00:00
5  1998-12-12   CCC   234  5140  3.241450 1998-12-12 00:00:00
8   1997-12-4   DDD   999  9999  9.999999 1997-12-04 00:00:00
9  1997-12-05   DDD   245  6150  3.250000 1997-12-05 00:00:00
7   1992-07-6   DDD   234  4677  3.485577 1992-07-06 00:00:00
