How can I write a for loop that iterates over 15 million records while saving memory?

import pandas as pd
import numpy as np

# sample data
speed = np.random.uniform(0,25,15000000)
data_dict = {'speed': speed}
df = pd.DataFrame(data_dict)

# create a list of 'windows', i.e. subseries of the list
def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

window_size = 10
list_of_win_speeds = GetShiftingWindows(df.speed, window_size)

list_of_max_speeds = []
for x in list_of_win_speeds:
    max_value = max(x)
    list_of_max_speeds.append(max_value)

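2 Answers

User

Answer 1 · edited 2024-09-27 04:24:43

First, you should use a pandas aggregation function rather than iterating over a list and doing the work yourself. It is also not clear what this function is supposed to do: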

def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

What it actually does is build one very large list of windows. Consider using yield instead. With yield, you never hold that whole list in memory.

def GetShiftingWindows(thelist, size):
    for x in range( len(thelist) - size + 1 ):
        yield thelist[x:x+size]
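A minimal sketch of how the generator version might be consumed, assuming a short list of made-up speed values in place of the 15 million element column. Each window is produced, reduced to its maximum, and then discarded, so only one window is alive at a time:

```python
def GetShiftingWindows(thelist, size):
    # Lazily produce each window of length `size`, one per iteration.
    for x in range(len(thelist) - size + 1):
        yield thelist[x:x + size]

# Made-up sample values, for illustration only.
speeds = [3.0, 7.5, 1.2, 9.9, 4.4, 2.1]

# Only the per-window maxima are kept; the windows themselves are not stored.
list_of_max_speeds = [max(window) for window in GetShiftingWindows(speeds, 3)]
print(list_of_max_speeds)  # [7.5, 9.9, 9.9, 9.9]
```

The result list still has one entry per window, but each entry is a single number rather than a whole sub-list.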

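On Python 2, you can squeeze out a few more bytes by using xrange() instead of range(). In Python 3, range() is already lazy and xrange() no longer exists.

The advantage of yield and xrange is that neither stores a list in memory. Instead, each produces a lazily evaluated iterable with a much smaller memory footprint.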
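The answer does not name a specific pandas aggregation. One plausible candidate is `Series.rolling(...).max()`, sketched here on a smaller sample (1,000 rows instead of 15 million, with a fixed seed, both chosen only for illustration). The cross-check against the explicit window loop is there to show that the two approaches agree:

```python
import numpy as np
import pandas as pd

# Smaller stand-in for the question's data (assumption: 1,000 rows, fixed seed).
rng = np.random.default_rng(0)
df = pd.DataFrame({"speed": rng.uniform(0, 25, 1000)})
window_size = 10

# Rolling maximum over each window of 10 consecutive rows.
# The first window_size - 1 results are NaN (incomplete windows), so drop them.
rolling_max = df["speed"].rolling(window_size).max().dropna()

# Cross-check against an explicit loop over the same windows.
loop_max = [
    df["speed"].iloc[x:x + window_size].max()
    for x in range(len(df) - window_size + 1)
]
print(len(rolling_max), np.allclose(rolling_max.to_numpy(), loop_max))
```

With this approach no Python-level list of windows is built at all; the windowing happens inside pandas.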

User

Answer 2 · edited 2024-09-27 04:24:43

As a first step, I would change

return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

into

return ( thelist[x:x+size] for x in range( len(thelist) - size + 1 ) )

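You then get a generator. Your original code builds every sub-list in memory up front, whereas the generator version produces only one sub-list per iteration of the for loop.

If you are on Python 2, you can also change range (which builds the whole list at once) to xrange (which, like a generator, produces one value at a time). In Python 3, range is already lazy and xrange no longer exists.

Finally, you can use islice so that each window is itself a lazy iterator rather than a copied slice: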
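A rough way to see the difference, assuming a plain list of 1,000 integers as stand-in data. The helper names `windows_as_list` and `windows_as_generator` are made up for this comparison. Note that `sys.getsizeof` measures only the container object, not the windows it refers to, so the real gap is larger than the numbers printed:

```python
import sys

def windows_as_list(thelist, size):
    # Eager: every window exists in memory at once.
    return [thelist[x:x + size] for x in range(len(thelist) - size + 1)]

def windows_as_generator(thelist, size):
    # Lazy: a window is created only when the loop asks for it.
    return (thelist[x:x + size] for x in range(len(thelist) - size + 1))

speeds = list(range(1000))  # stand-in data, for illustration only
eager = windows_as_list(speeds, 10)
lazy = windows_as_generator(speeds, 10)

list_container_bytes = sys.getsizeof(eager)
generator_bytes = sys.getsizeof(lazy)
print(list_container_bytes, generator_bytes)

# Both forms yield the same maxima.
eager_maxes = [max(w) for w in eager]
lazy_maxes = [max(w) for w in lazy]
```

The generator's size stays constant no matter how many windows it will eventually produce, while the list grows with the number of windows.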

from itertools import islice

and

return ( islice(thelist, x, x + size) for x in range( len(thelist) - size + 1 ) )
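A small sketch of the islice variant in use, again with made-up values. Two caveats that the answer does not spell out: each islice object can be consumed only once, and islice reaches position x by stepping through the elements before it, so later windows take longer to start on a long input:

```python
from itertools import islice

def GetShiftingWindows(thelist, size):
    # Each window is an islice iterator; no sub-list is copied.
    return (islice(thelist, x, x + size) for x in range(len(thelist) - size + 1))

# Made-up sample values, for illustration only.
speeds = [3.0, 7.5, 1.2, 9.9, 4.4, 2.1]

list_of_max_speeds = [max(window) for window in GetShiftingWindows(speeds, 3)]
print(list_of_max_speeds)  # [7.5, 9.9, 9.9, 9.9]
```

This trades memory for time, so it may be worth measuring against the plain slicing generator on realistic data before adopting it.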
