Detecting and excluding outliers when resampling with the mean

Posted 2024-06-14 02:56:36


I have processed my complex data and, after sorting the values by DateTime, created a DataFrame that looks like this:

                                   Tme    Vtc    Stc
DateTime                                            
2012-07-01 00:00:00.000000000  0.00000   7.06   4.51
2012-07-01 00:00:00.000000000  0.00000   5.47   1.11
2012-07-01 00:00:00.000000000  0.00000   5.81   2.32
2012-07-01 00:00:00.000000000  0.00000   7.20   7.65
2012-07-01 00:00:00.000000000  0.00000  11.63  18.30
2012-07-01 00:00:00.000000000  0.00000   4.97   0.58
2012-07-01 00:00:00.000000000  0.00000   4.93   0.37
2012-07-01 00:00:00.000000000  0.00000   4.78   0.19
2012-07-01 00:00:00.000000000  0.00000   5.62   2.13
2012-07-01 00:00:00.000000000  0.00000   6.07   1.67
2012-07-01 00:00:00.000000000  0.00000   7.29   6.21
2012-07-01 00:00:29.980799999  0.00833   4.97   0.58
2012-07-01 00:00:29.980799999  0.00833   5.62   2.13
2012-07-01 00:00:29.980799999  0.00833   7.19   7.63
2012-07-01 00:00:29.980799999  0.00833  11.63  18.33
2012-07-01 00:00:29.980799999  0.00833   6.07   1.67
2012-07-01 00:00:29.980799999  0.00833   4.77   0.19
2012-07-01 00:00:29.980799999  0.00833   4.94   0.38
2012-07-01 00:00:29.980799999  0.00833   7.07   4.54
2012-07-01 00:00:29.980799999  0.00833   5.82   2.34
2012-07-01 00:00:29.980799999  0.00833   5.47   1.11
2012-07-01 00:00:29.980799999  0.00833   7.28   6.15

My goal is to average the data at each timestamp and resample it every 3 minutes. I can do that with the following code:

m3Reample2 = plotVTEC.resample('3T').agg(dict(Tme='first', Vtc='mean', Stc='mean'))

The resulting DataFrame looks like this:

                          Vtc      Tme       Stc
DateTime                                        
2012-07-01 00:00:00  6.433377  0.00000  4.083636
2012-07-01 00:03:00  6.427455  0.05833  4.085455
2012-07-01 00:06:00  6.428182  0.10000  4.104091
2012-07-01 00:09:00  6.431169  0.15000  4.141039
2012-07-01 00:12:00  6.453818  0.20833  4.233636
2012-07-01 00:15:00  6.484697  0.25000  4.350758
2012-07-01 00:18:00  6.544416  0.30000  4.566623
2012-07-01 00:21:00  6.564231  0.35833  4.580385
2012-07-01 00:24:00  6.459677  0.40000  4.289355
2012-07-01 00:27:00  6.450649  0.45000  4.379091
2012-07-01 00:30:00  6.482727  0.50833  4.515455
2012-07-01 00:33:00  6.501061  0.55000  4.620758
2012-07-01 00:36:00  6.677857  0.60000  5.182738
2012-07-01 00:39:00  6.632500  0.65833  5.084167
2012-07-01 00:42:00  6.598194  0.70000  5.015417
2012-07-01 00:45:00  6.537738  0.75000  4.885595
2012-07-01 00:48:00  6.482000  0.80833  4.772333
2012-07-01 00:51:00  6.424861  0.85000  4.641111
2012-07-01 00:54:00  6.343333  0.90000  4.459286
2012-07-01 00:57:00  6.286167  0.95833  4.334167
2012-07-01 01:00:00  6.230139  1.00000  4.213472

The problem is that for a given datetime, for example 2012-07-01 00:00:00, the Vtc value 11.63 is an outlier, and it skews the mean for that datetime.

How can I exclude such values from the mean calculation?
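
One possible approach, offered as a sketch rather than a definitive answer: pass a custom aggregation function to `agg` that drops values outside Tukey's fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR) within each 3-minute bin before taking the mean. The helper name `iqr_mean` and the factor `k=1.5` are assumptions made for illustration, not something from the original post. The sample frame below reuses only the 11 rows of the first timestamp shown above, and `'3min'` is used in place of `'3T'` because newer pandas versions deprecate the `T` alias.

```python
import pandas as pd


def iqr_mean(values: pd.Series, k: float = 1.5) -> float:
    """Mean of the values lying inside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 is the conventional Tukey factor,
    chosen here as an assumption; tune it for your data.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    inside = values[(values >= q1 - k * iqr) & (values <= q3 + k * iqr)]
    return inside.mean()


# Rows copied from the first timestamp in the question.
idx = pd.DatetimeIndex(["2012-07-01 00:00:00"] * 11, name="DateTime")
plotVTEC = pd.DataFrame(
    {
        "Tme": [0.0] * 11,
        "Vtc": [7.06, 5.47, 5.81, 7.20, 11.63, 4.97, 4.93, 4.78, 5.62, 6.07, 7.29],
        "Stc": [4.51, 1.11, 2.32, 7.65, 18.30, 0.58, 0.37, 0.19, 2.13, 1.67, 6.21],
    },
    index=idx,
)

# Plain mean per 3-minute bin (the outlier is included).
m3_plain = plotVTEC.resample("3min").agg(
    {"Tme": "first", "Vtc": "mean", "Stc": "mean"}
)

# Outlier-filtered mean per 3-minute bin; each column is filtered independently.
m3_robust = plotVTEC.resample("3min").agg(
    {"Tme": "first", "Vtc": iqr_mean, "Stc": iqr_mean}
)

print(m3_plain)
print(m3_robust)
```

Because the filter is applied per column, a row can be dropped from the Vtc mean while still contributing to the Stc mean. If outliers should be removed row-wise, or if a simpler robust statistic is acceptable, using `'median'` instead of a mean is another option worth considering.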

