Filter a numpy array of datetimes by frequency of occurrence

User

Reply #1 · edited 2024-06-26 00:05:27

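Numpy is slower than pandas for this kind of operation, because np.unique sorts its input, which the pandas machinery does not need to do. The pandas version is also more idiomatic.

Pandas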

In [22]: %%timeit
   ....: i = Index(dates)
   ....: i[i.value_counts()>20]
   ....: 
10 loops, best of 3: 78.2 ms per loop

In [23]: i = Index(dates)

In [24]: i[i.value_counts()>20]
Out[24]: 
DatetimeIndex(['2013-06-16 20:40:00', '2013-05-28 03:00:00', '2013-10-31 19:50:00', '2014-06-20 13:00:00', '2013-07-08 21:40:00', '2012-02-26 17:00:00', '2013-01-02 15:40:00', '2012-08-24 02:00:00',
               '2014-10-17 08:20:00', '2012-07-27 20:10:00',
               ...
               '2014-08-07 05:10:00', '2014-05-21 08:10:00', '2014-03-09 12:50:00', '2013-05-10 02:30:00', '2013-04-15 20:20:00', '2012-06-23 05:20:00', '2012-07-06 16:10:00', '2013-02-14 12:20:00',
               '2014-10-27 03:10:00', '2013-09-04 12:00:00'],
              dtype='datetime64[ns]', length=2978, freq=None)

In [25]: len(i[i.value_counts()>20])
Out[25]: 2978
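One thing to watch in the session above: the mask produced by `i.value_counts() > 20` is indexed by the unique timestamps, not by position in `i`, so indexing `i` with it may not select what was intended on current pandas versions. A minimal sketch of filtering the counts themselves, assuming a small random stand-in for `dates` (the names `pool`, `threshold`, `frequent` and `kept` are illustrative, not from the thread):

```python
import numpy as np
import pandas as pd

# Small reproducible stand-in for the 2,000,000-element `dates` array
rng = np.random.default_rng(0)
pool = pd.date_range("2012-01-01", periods=50, freq="10min")
dates = rng.choice(pool.values, size=1000, replace=True)

threshold = 20  # same cutoff as in the thread

# Series mapping each unique timestamp to its number of occurrences
counts = pd.Index(dates).value_counts()

# Unique timestamps that occur more than `threshold` times
frequent = counts[counts > threshold].index

# Every occurrence of a frequent timestamp, in original order
kept = dates[np.isin(dates, frequent.values)]
```

`frequent` holds one entry per qualifying timestamp, while `kept` preserves duplicates; which of the two is wanted depends on how "filter" is read in the question.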

Numpy (from the other solution)

[code block missing from the source]

User

Reply #2 · edited 2024-06-26 00:05:27

Actually, you could try np.unique. As of numpy v1.9, unique can return some extra values, such as unique_indices, unique_inverse and unique_counts.
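A minimal sketch of that idea, assuming numpy 1.9 or later and a small random stand-in for the question's data (`pool`, `filtered_dates` and `kept` are illustrative names):

```python
import numpy as np

# Small reproducible stand-in: 1000 draws from 50 timestamps, 10 minutes apart
rng = np.random.default_rng(0)
pool = np.datetime64("2012-01-01T00:00") + np.arange(50) * np.timedelta64(10, "m")
dates = rng.choice(pool, size=1000, replace=True)

# One sorted pass returns the unique values, the position of each element
# in the unique array, and how often each unique value occurs
unique_dates, inverse, counts = np.unique(
    dates, return_inverse=True, return_counts=True
)

# Unique timestamps that occur more than 20 times
filtered_dates = unique_dates[counts > 20]

# Every occurrence of those timestamps, in original order
kept = dates[counts[inverse] > 20]
```

`counts[inverse]` broadcasts each unique value's count back onto the original array, so the second mask has the same length as `dates`.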

If you want to use pandas, this is fairly simple and probably fairly fast. You can use a groupby filter, something like:

out = df.groupby('timestamp').filter(lambda x: len(x) > 20)
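The one-liner above assumes a DataFrame `df` with a `timestamp` column. A self-contained sketch under that assumption, using a small random stand-in for the data; the `transform("size")` variant is an alternative not mentioned in the thread that avoids calling a Python lambda once per group:

```python
import numpy as np
import pandas as pd

# Small reproducible stand-in for the question's data
rng = np.random.default_rng(0)
pool = pd.date_range("2012-01-01", periods=50, freq="10min")
df = pd.DataFrame({"timestamp": rng.choice(pool.values, size=1000, replace=True)})

# Keep only rows whose timestamp occurs more than 20 times
out = df.groupby("timestamp").filter(lambda g: len(g) > 20)

# Alternative (assumption, not from the thread): broadcast each group's
# size back to its rows and filter with a boolean mask
sizes = df.groupby("timestamp")["timestamp"].transform("size")
out_fast = df[sizes > 20]

# The distinct timestamps that survived the filter
frequent = out["timestamp"].unique()
```

Both versions return the surviving rows with their original index; `frequent` collapses them to one entry per timestamp.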

User

Reply #3 · edited 2024-06-26 00:05:27

Following the suggestions below, I am editing this post to include timings using np.unique. This is the best solution so far.

In [10]: import pandas as pd
         import numpy as np
         from collections import Counter

         #create a fake data set 
         dates = pd.date_range("2012-01-01", "2015-01-01", freq="10min")
         dates = np.random.choice(dates, 2000000, replace=True)

Based on the suggestions below, this is the fastest so far:

[code block missing from the source]

Using Counter, you can build a dictionary of per-item counts and then convert it to a pd.Series for filtering:

In [11]: %%timeit
         foo = pd.Series(Counter(dates))
         filtered_dates = np.array(foo[foo > 20].index)
1 loop, best of 3: 12.3 s per loop

For an array of 2 million items that is not too bad, compared with:

In [12]: dates = list(dates)
         filtered_dates = [e for e in set(dates) if dates.count(e) > 20]

I'm not going to wait for the list comprehension version to finish...
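The gap between the two versions comes from how often the data is scanned: `dates.count(e)` walks the whole list once per distinct element, while Counter tallies everything in a single pass. A minimal sketch on a toy list (the function names and sample data are illustrative, not from the thread):

```python
from collections import Counter


def filter_with_count(items, threshold):
    # list.count rescans the entire list for every distinct element,
    # so the work grows with len(items) * number of distinct values
    return {e for e in set(items) if items.count(e) > threshold}


def filter_with_counter(items, threshold):
    # Counter builds all tallies in one pass over the list
    tally = Counter(items)
    return {e for e, n in tally.items() if n > threshold}


sample = ["a"] * 5 + ["b"] * 2 + ["c"] * 3
slow = filter_with_count(sample, 2)
fast = filter_with_counter(sample, 2)
```

Both functions return the same set; only the amount of work differs, which is what makes the list comprehension impractical at 2 million items.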
