Aggregating over a DataFrame's index (DatetimeIndex) with groupby/apply

>>> winddata
                            sonic_Ux  sonic_Uy  sonic_Uz
TIMESTAMP
2014-04-30 14:13:12.300000  0.322444  2.530129  0.347921
2014-04-30 14:13:12.400000  0.357793  2.571811  0.360840
2014-04-30 14:13:12.500000  0.469529  2.400510  0.193011
2014-04-30 14:13:12.600000  0.298787  2.212599  0.404752
2014-04-30 14:13:12.700000  0.259310  2.054919  0.066324
2014-04-30 14:13:12.800000  0.342952  1.962965  0.070500
2014-04-30 14:13:12.900000  0.434589  2.210533 -0.010147
...                              ...       ...       ...
[4361447 rows x 3 columns]
>>> winddata.dtypes
sonic_Ux    float64
sonic_Uy    float64
sonic_Uz    float64
dtype: object
>>> hhdata = winddata.groupby(TimeGrouper('30T')); hhdata
<pandas.core.groupby.DataFrameGroupBy object at 0xb440790c>

>>> for name, g in hhdata:
...     print name, atan2(g['sonic_Ux'].mean(), g['sonic_Uy'].mean()), ' wd'
...
2014-04-30 14:00:00 0.13861912975  wd
2014-04-30 14:30:00 0.511709085506  wd
2014-04-30 15:00:00 -1.5088990774  wd
2014-04-30 15:30:00 0.13200013186  wd
<<snip>>
>>> def winddir(g):
...     return pd.Series(atan2( np.mean(g['sonic_Ux']), np.mean(g['sonic_Uy']) ), name='wd')
...
>>> hhdata.apply(winddir)
2014-04-30 14:00:00  0    0.138619
2014-04-30 14:30:00  0    0.511709
2014-04-30 15:00:00  0   -1.508899
2014-04-30 15:30:00  0    0.132000
...
2014-05-05 14:00:00  0   -2.551593
2014-05-05 14:30:00  0   -2.523250
2014-05-05 15:00:00  0   -2.698828
Name: wd, Length: 243, dtype: float64
>>> hhdata.apply(winddir).index[0]
(Timestamp('2014-04-30 14:00:00', tz=None), 0)
>>> def winddir(g):
...     return pd.DataFrame({'wd':atan2(g['sonic_Ux'].mean(), g['sonic_Uy'].mean())}, index=[g.name])
...
>>> hhdata.apply(winddir)
                                               wd
2014-04-30 14:00:00 2014-04-30 14:00:00  0.138619
2014-04-30 14:30:00 2014-04-30 14:30:00  0.511709
2014-04-30 15:00:00 2014-04-30 15:00:00 -1.508899
2014-04-30 15:30:00 2014-04-30 15:30:00  0.132000
...
[243 rows x 1 columns]
>>> hhdata.apply(winddir).index[0]
(Timestamp('2014-04-30 14:00:00', tz=None), Timestamp('2014-04-30 14:00:00', tz=None))
>>>
>>> tsfast.groupby(TimeGrouper('30T')).apply(lambda g:
...     Series({'wd': atan2(g.sonic_Ux.mean(), g.sonic_Uy.mean()),
...             'ws': np.sqrt(g.sonic_Ux.mean()**2 + g.sonic_Uy.mean()**2)}))
2014-04-30 14:00:00  wd    0.138619
                     ws    1.304311
2014-04-30 14:30:00  wd    0.511709
                     ws    0.143762
2014-04-30 15:00:00  wd   -1.508899
                     ws    0.856643
...
2014-05-05 14:30:00  wd   -2.523250
                     ws    3.317810
2014-05-05 15:00:00  wd   -2.698828
                     ws    3.279520
Length: 486, dtype: float64
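The sessions above all end up with a two-level index because each group returns its own Series or DataFrame. One way to sidestep that, sketched here as an alternative rather than as the asker's method, is to take the half-hourly component means first and compute direction and speed on the resulting columns. The data below is synthetic (the seed, sample count, and distribution parameters are assumptions); only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: one hour of 10 Hz samples.
# The values are made up; only the column names follow the question.
n = 36_000
rng = np.random.default_rng(0)
index = pd.date_range("2014-04-30 14:13:12", periods=n, freq="100ms", name="TIMESTAMP")
winddata = pd.DataFrame(
    {
        "sonic_Ux": rng.normal(0.3, 1.0, n),
        "sonic_Uy": rng.normal(2.0, 1.0, n),
        "sonic_Uz": rng.normal(0.0, 1.0, n),
    },
    index=index,
)

# Half-hourly means of the horizontal components, labelled by bin start.
means = winddata[["sonic_Ux", "sonic_Uy"]].resample("30min").mean()

# Derived quantities are computed column-wise on the means, so the result
# keeps a plain DatetimeIndex instead of a two-level index.
summary = pd.DataFrame(
    {
        "wd": np.arctan2(means["sonic_Ux"], means["sonic_Uy"]),
        "ws": np.hypot(means["sonic_Ux"], means["sonic_Uy"]),
    }
)
print(summary)
```

This keeps the same argument order as the question, `arctan2(Ux, Uy)`, and returns one row per half hour with `wd` and `ws` as columns.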

>>> winddata.index.name = 'WASINDEX'
>>> data2 = winddata.reset_index()
>>> def to_hh(x): # <-- big hammer
...     ts = x.isoformat()
...     return ts[:14] + ('30:00' if int(ts[14:16]) >= 30 else '00:00')
...
>>> data2['TS'] = data2['WASINDEX'].apply(lambda x: to_hh(x))
>>> wd = data2.groupby('TS').apply(lambda df: Series({'wd': np.arctan2(df.x.mean(), df.y.mean())}))
>>> type(wd)
pandas.core.frame.DataFrame
>>> wd.columns
Index([u'wd'], dtype=object)
>>> wd.index
Index([u'2014-04-30T14:00:00', u'2014-04-30T14:30:00', <<snip>> dtype=object)
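The string-slicing `to_hh` helper above rounds each timestamp down to the half hour but leaves string keys behind. A vectorized sketch of the same rounding, assuming a pandas version that provides `DatetimeIndex.floor`, keeps the keys as timestamps. The four sample rows are hypothetical and chosen so the means are easy to check by hand.

```python
import pandas as pd

# Four hypothetical timestamps straddling a half-hour boundary.
stamps = pd.DatetimeIndex(
    [
        "2014-04-30 14:13:12.3",
        "2014-04-30 14:29:59.9",
        "2014-04-30 14:30:00",
        "2014-04-30 14:45:00.1",
    ]
)
df = pd.DataFrame({"x": [1.0, 3.0, 5.0, 7.0], "y": [2.0, 2.0, 4.0, 4.0]}, index=stamps)

# Round every timestamp down to the start of its 30-minute bin.
half_hours = stamps.floor("30min")

# Group on the floored timestamps; the result index is still a DatetimeIndex.
means = df.groupby(half_hours).mean()
print(means)
```

Because the keys stay as timestamps, the grouped result can be sliced or resampled again without parsing strings.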

1 Answer

User

#1 · Posted on 2024-09-29 20:17:08

In [30]: N = 4361447  # assumed: row count taken from the output below

In [31]: pd.set_option('max_rows',10)

In [32]: winddata = DataFrame({ 'x' : np.random.randn(N), 'y' : np.random.randn(N)+2, 'z' : np.random.randn(N) },pd.date_range('20140430 14:13:12',periods=N,freq='100ms'))

In [33]: winddata
Out[33]: 
                                   x         y         z
2014-04-30 14:13:12        -0.065350  0.567525  2.212534
2014-04-30 14:13:12.100000 -0.436498  2.591799  2.424359
2014-04-30 14:13:12.200000 -1.059038  3.120631 -0.645579
2014-04-30 14:13:12.300000  1.973474  0.630424  0.966405
2014-04-30 14:13:12.400000  0.575082  1.941845 -0.674695
...                              ...       ...       ...
2014-05-05 15:22:16.200000  0.601962  0.027834 -0.101967
2014-05-05 15:22:16.300000  0.741777  1.764745  0.991516
2014-05-05 15:22:16.400000 -0.494253  1.765930  2.493000
2014-05-05 15:22:16.500000 -2.643749  0.671604  0.275096
2014-05-05 15:22:16.600000  0.676698  0.958903  0.946942

[4361447 rows x 3 columns]

In [34]: winddata.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4361447 entries, 2014-04-30 14:13:12 to 2014-05-05 15:22:16.600000
Freq: 100L
Data columns (total 3 columns):
x    float64
y    float64
z    float64
dtypes: float64(3)

In pandas < 0.14.0, use pd.TimeGrouper; pd.Grouper(freq=...) is available from 0.14.0 onward.
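The answer's own code for this step did not survive on this page, so the following is only a sketch of what grouping on the index can look like, not a reconstruction of it. It assumes a pandas version with `pd.Grouper` (`pd.TimeGrouper` has since been removed from pandas) and uses a much smaller `n` than the answer's `N`. The helper name `wind_direction` is hypothetical.

```python
import numpy as np
import pandas as pd

# Smaller synthetic frame with the answer's column names (x, y, z).
n = 36_000
rng = np.random.default_rng(0)
winddata = pd.DataFrame(
    {
        "x": rng.standard_normal(n),
        "y": rng.standard_normal(n) + 2,
        "z": rng.standard_normal(n),
    },
    index=pd.date_range("2014-04-30 14:13:12", periods=n, freq="100ms"),
)


def wind_direction(g):
    """Return a single float per group: the angle of the mean (x, y) vector."""
    return np.arctan2(g["x"].mean(), g["y"].mean())


# Returning a scalar from apply gives a Series keyed by the 30-minute bins.
wd = winddata.groupby(pd.Grouper(freq="30min")).apply(wind_direction)
wd.name = "wd"
print(wd)
```

Returning a scalar rather than a Series or DataFrame from the applied function is what keeps the result on a single-level DatetimeIndex.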

