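<p>I'm new to pandas. I have a pandas DataFrame of about 500,000 rows, all numeric. I'm using Python 2.x and am currently defining and calling the method shown below. If two adjacent values in series <code>A</code> are equal, it sets the predicted value to the value of series <code>B</code> in the previous row. However, it runs very slowly, producing roughly 5 rows per second, and I'd like a faster way to get the same result.</p>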
<pre><code>import numpy as np

def myModel(df):
    A_series = df['A']
    B_series = df['B']
    seriesLength = A_series.size
    # Make a new empty column in the dataframe to hold the predicted values
    df['predicted_series'] = np.nan
    # Make a new empty column to store whether or not
    # the prediction matches B
    df['wrong_prediction'] = np.nan
    for x in range(1, seriesLength):
        prev_A = A_series[x-1]
        prev_B = B_series[x-1]
        # Set the predicted value to equal B if A has two equal values in a row
        if A_series[x] == prev_A:
            # Note: predicted_series[x] has not been written yet (it is NaN),
            # so this comparison is always False and the else branch runs
            if df.loc[x, 'predicted_series'] > 0:
                df.loc[x, 'predicted_series'] = df.loc[x-1, 'predicted_series']
            else:
                df.loc[x, 'predicted_series'] = prev_B
    return df
</code></pre>
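<p>Much of the per-row cost in a loop like this comes from going through pandas indexing (<code>df['col'][x]</code> or <code>df.loc[x, 'col']</code>) on every iteration. A minimal sketch, assuming columns named <code>A</code> and <code>B</code> and keeping exactly the same branch structure as <code>myModel</code>, moves the loop onto plain NumPy arrays and writes the result back once at the end. The function name <code>my_model_arrays</code> is hypothetical. The same pattern remains usable if the condition is meant to depend on the previous prediction, which would make the loop genuinely sequential and harder to vectorize.</p>

```python
import numpy as np
import pandas as pd

def my_model_arrays(df):
    """Hypothetical helper: same logic as myModel, but loops over NumPy arrays."""
    a = np.asarray(df['A'].values)
    b = np.asarray(df['B'].values, dtype=float)
    pred = np.full(len(a), np.nan)
    for x in range(1, len(a)):
        if a[x] == a[x - 1]:
            # Same check as the original: pred[x] is still NaN here,
            # so NaN > 0 is False and the else branch is taken.
            if pred[x] > 0:
                pred[x] = pred[x - 1]
            else:
                pred[x] = b[x - 1]
    out = df.copy()
    # Assign whole columns once instead of writing cell by cell
    out['predicted_series'] = pred
    out['wrong_prediction'] = np.nan
    return out
```

<p>Indexing a NumPy array with an integer is far cheaper than a pandas <code>.loc</code> lookup, so this usually helps a lot even though it is still a Python-level loop.</p>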
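<p>Is there a way to vectorize this or otherwise make it run faster? At about 5 rows per second, 500,000 rows would take roughly 100,000 seconds, which is more than a day. Does it really have to take this long? 500,000 rows doesn't seem like it should be that much of a problem for my program.</p>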
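<p>Because the <code>&gt; 0</code> check in <code>myModel</code> reads a value that is still NaN, the loop as written reduces to this rule: for each row <code>x</code>, if <code>A[x] == A[x-1]</code>, set the prediction to <code>B[x-1]</code>; otherwise leave it NaN. Under that reading, a minimal vectorized sketch uses <code>Series.shift</code> to line each row up with the previous one and <code>Series.where</code> to keep only the rows where <code>A</code> repeats. The function name <code>my_model_vectorized</code> is hypothetical, and this assumes the loop's current behavior (not some other intended behavior) is what you want to reproduce.</p>

```python
import numpy as np
import pandas as pd

def my_model_vectorized(df):
    """Hypothetical helper: vectorized equivalent of myModel as written."""
    out = df.copy()
    # True where A equals the value in the previous row (row 0 compares to NaN -> False)
    same_as_prev = out['A'].eq(out['A'].shift(1))
    # Previous row's B where A repeats, NaN everywhere else
    out['predicted_series'] = out['B'].shift(1).where(same_as_prev)
    out['wrong_prediction'] = np.nan
    return out
```

<p>Both operations run over whole columns in compiled code, so on 500,000 rows this should take well under a second rather than hours.</p>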