How can I use a loop in Python to do feature engineering efficiently?

Posted 2024-09-28 17:30:04


I am trying to do the following:

df['SR1'] = df['Open'].pct_change(1)
df['SR2'] = df['Open'].pct_change(2)
df['SR3'] = df['Open'].pct_change(3)
df['SR4'] = df['Open'].pct_change(4)
df['SR5'] = df['Open'].pct_change(5)

df['SR6'] = df['Open'].pct_change(6)
df['SR7'] = df['Open'].pct_change(7)
df['SR8'] = df['Open'].pct_change(8)
df['SR9'] = df['Open'].pct_change(9)
df['SR10'] = df['Open'].pct_change(10)

df['SR11'] = df['Open'].pct_change(11)
df['SR12'] = df['Open'].pct_change(12)
df['SR13'] = df['Open'].pct_change(13)
df['SR14'] = df['Open'].pct_change(14)
df['SR15'] = df['Open'].pct_change(15)

df['SR16'] = df['Open'].pct_change(16)
df['SR17'] = df['Open'].pct_change(17)
df['SR18'] = df['Open'].pct_change(18)
df['SR19'] = df['Open'].pct_change(19)
df['SR20'] = df['Open'].pct_change(20)

df['SR30'] = df['Open'].pct_change(30)
df['SR50'] = df['Open'].pct_change(50)
df['SR70'] = df['Open'].pct_change(70)
df['SR90'] = df['Open'].pct_change(90)

df['SR110'] = df['Open'].pct_change(110)
df['SR130'] = df['Open'].pct_change(130)
df['SR150'] = df['Open'].pct_change(150)
df['SR170'] = df['Open'].pct_change(170)
df['SR190'] = df['Open'].pct_change(190)

df['SR210'] = df['Open'].pct_change(210)
df['SR230'] = df['Open'].pct_change(230)
df['SR250'] = df['Open'].pct_change(250)

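This looks both silly and inefficient. Is there a neat way to write a function that loops over this process? I just can't work out how to pass the number into the parentheses of pct_change().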


3 answers

Perhaps:

for n in numbers:
    df['SR'+str(n)] = df['Open'].pct_change(n)

where numbers contains all the periods you want to process.
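Since the question asks for a function, the loop above could be wrapped roughly as follows. This is only a sketch: the function name add_pct_change_features, its prefix parameter, and the sample DataFrame are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def add_pct_change_features(df, column, periods, prefix="SR"):
    """Return a copy of df with one pct_change column per period (hypothetical helper)."""
    out = df.copy()
    for n in periods:
        # The period is passed to pct_change as an ordinary argument.
        out[f"{prefix}{n}"] = out[column].pct_change(n)
    return out


# The periods from the question: 1..20, then 30..250 in steps of 20.
numbers = list(range(1, 21)) + list(range(30, 270, 20))

# Illustrative data only; any DataFrame with an 'Open' column would do.
df = pd.DataFrame({"Open": [float(x) for x in range(1, 301)]})
df = add_pct_change_features(df, "Open", numbers)
```

Working on a copy leaves the caller's DataFrame untouched; drop the copy if modifying it in place is what you want.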

If efficiency matters, you can avoid the explicit loop by using assign with a dictionary comprehension:

df = df.assign(**{f'SR{n}': df['Open'].pct_change(n)
                  for n in list(range(1, 21)) + list(range(30, 270, 20))})

Or without f-strings:

df = df.assign(**{'SR{}'.format(n): df['Open'].pct_change(n)
                  for n in list(range(1, 21)) + list(range(30, 270, 20))})

Timings

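The dictionary comprehension is slightly faster (about 25 ms versus 28 ms below, a gap comparable to the reported standard deviations).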

import pandas as pd

df = pd.DataFrame({'Open': range(252 * 5)})

%%timeit
df.assign(**{f'SR{n}': df['Open'].pct_change(n)
             for n in list(range(1, 21)) + list(range(30, 270, 20))})
# 25.3 ms ± 2.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for n in list(range(1, 21)) + list(range(30, 270, 20)):
    df[f'SR{n}'] = df['Open'].pct_change(n)
# 28.3 ms ± 3.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
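The %%timeit cells above only run inside IPython or Jupyter. To repeat the comparison in a plain script, something like the following sketch with the standard timeit module should work. The function names with_assign and with_loop are made up here, and the sample data starts at 1 rather than 0 (an assumption, to avoid dividing by zero); actual timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

periods = list(range(1, 21)) + list(range(30, 270, 20))
base = pd.DataFrame({"Open": range(1, 252 * 5 + 1)})


def with_assign():
    # Build every column in a dict comprehension and add them in one call.
    return base.assign(**{f"SR{n}": base["Open"].pct_change(n) for n in periods})


def with_loop():
    # Add the columns one at a time; the copy keeps base unchanged between runs.
    out = base.copy()
    for n in periods:
        out[f"SR{n}"] = out["Open"].pct_change(n)
    return out


# Best of 3 repeats, 10 calls each, reported per call.
t_assign = min(timeit.repeat(with_assign, number=10, repeat=3)) / 10
t_loop = min(timeit.repeat(with_loop, number=10, repeat=3)) / 10
print(f"assign: {t_assign * 1e3:.1f} ms, loop: {t_loop * 1e3:.1f} ms")
```

Both functions produce the same columns, so whichever reads better is a reasonable choice when the timing gap is small.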

Why not a simple for loop?

for n in list(range(1, 21)) + list(range(30, 270, 20)):
    df[f'SR{n}'] = df['Open'].pct_change(n)

Note: f-string notation is only available in Python >= 3.6; the equivalent is 'SR{}'.format(n).
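As a quick check, the ways of building the column name seen in this thread give the same string. The variable names below are just for illustration. Note that a named placeholder such as {n} needs a keyword argument with str.format; passing n positionally raises a KeyError.

```python
n = 7

name_fstring = f"SR{n}"            # Python >= 3.6
name_format = "SR{}".format(n)     # positional placeholder
name_named = "SR{n}".format(n=n)   # named placeholder needs n=...
name_concat = "SR" + str(n)        # plain concatenation

print(name_fstring, name_format, name_named, name_concat)
```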
