In Pandas, when I `join`, the joined data is misaligned relative to the original DataFrame:
import os
import pandas as pd
import statsmodels.formula.api as sm
import numpy as np
import matplotlib.pyplot as plt
flu_train = pd.read_csv('FluTrain.csv')
# From: https://courses.edx.org/c4x/MITx/15.071x/asset/FluTrain.csv
cols = ['Ystart', 'Mstart', 'Dstart', 'Yend', 'Mend', 'Dend']
flu_train = flu_train.join(pd.DataFrame(flu_train.Week.str.findall(r'\d+').tolist(), dtype=np.int64, columns=cols))
flu_trend_1 = sm.ols('np.log(ILI) ~ Queries', flu_train).fit()
flu_test = pd.read_csv('FluTest.csv')
# From: https://courses.edx.org/c4x/MITx/15.071x/asset/FluTest.csv
flu_test = flu_test.join(pd.DataFrame(flu_test.Week.str.findall(r'\d+').tolist(), dtype=np.int64, columns=cols))
flu_test = flu_test.join(pd.DataFrame(np.exp(flu_trend_1.predict(flu_test)), columns=['ILIPred1']))
flu_train['ILIShift2'] = flu_train.ILI.shift(2)
flu_trend_2 = sm.ols('np.log(ILI) ~ Queries + np.log(ILIShift2)', flu_train).fit()
flu_test['ILIShift2'] = flu_test.ILI.shift(2)
# Note that this does not work in a simplified example
# See -- http://stackoverflow.com/q/22457880/
flu_test[:2].ILIShift2 = list(flu_train[-2:].ILI)
# This SHIFTS the joined column "up" two rows, losing the first two values of ILIPred2 and making the last 2 'NaN'
flu_test = flu_test.join(pd.DataFrame(np.exp(flu_trend_2.predict(flu_test)), columns=['ILIPred2']))
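The last statement shifts the joined column "up" two rows, losing the first two values of ILIPred2 and making the last two values NaN. I want the joined column to be aligned with all the other columns.

Why does this happen, and how can I prevent it?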
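The comment in the code notes that `flu_test[:2].ILIShift2 = ...` does not work in a simplified example. That line is a chained assignment: it writes into a slice of `flu_test`, and pandas does not guarantee that such a write reaches `flu_test` itself. If the first two values of `ILIShift2` stay NaN, those rows have incomplete inputs, which would be consistent with the model returning fewer predictions than there are rows. A minimal sketch, on hypothetical toy frames `train` and `test`, of the same fill done with a single `.loc` call:

```python
import pandas as pd

# Hypothetical stand-ins for flu_train and flu_test (default RangeIndex 0..3).
train = pd.DataFrame({"ILI": [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({"ILI": [5.0, 6.0, 7.0, 8.0]})

# Two-week lag: the first two rows of the new column are NaN.
test["ILIShift2"] = test["ILI"].shift(2)

# Fill rows 0 and 1 with the last two training values in one .loc call.
# .loc slices by label and includes the end label, so 0:1 selects two rows.
# .to_numpy() strips train's index (2, 3) so the values are assigned by position
# instead of being aligned by label (which would produce NaN here).
test.loc[0:1, "ILIShift2"] = train["ILI"].iloc[-2:].to_numpy()

print(test)
```

On the real data the equivalent line would be `flu_test.loc[:1, 'ILIShift2'] = flu_train['ILI'].iloc[-2:].to_numpy()`, assuming `flu_test` keeps its default integer index.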
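The DataFrame you are joining (`pd.DataFrame(np.exp(flu_trend_2.predict(flu_test)), columns=['ILIPred2'])`) has an index from 0 to 49, while `flu_test`, which you join it to, has an index from 0 to 51. `join` matches rows by index label, not by position, so the 50 predictions are placed on rows 0 to 49, and where the indices don't match (50 and 51) you get `NaN`, as I would expect.

If you want to force the joined column to line up with the bottom of the main DataFrame, assign it by position with `iloc`, offsetting the start row by a `row_shift` variable equal to the difference in length between the two frames.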
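A minimal sketch of that approach on toy data (the columns `x`, `lag`, `pred` and all values are hypothetical). It first reproduces the misaligned `join`, then shows two ways to avoid it: bottom-aligning with `iloc` and `row_shift`, and giving the predictions the index labels of the rows they were computed for. It assumes the model drops exactly the rows whose inputs are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "lag" is missing in the first two rows, so a model that
# drops incomplete rows returns only four predictions (for rows 2..5).
main = pd.DataFrame({"x": [10, 20, 30, 40, 50, 60],
                     "lag": [np.nan, np.nan, 1.0, 2.0, 3.0, 4.0]})
preds = np.array([0.3, 0.4, 0.5, 0.6])

# The problem: a bare array wrapped in a DataFrame gets a new RangeIndex 0..3,
# and join matches on index labels, so the values land on rows 0..3.
naive = main.join(pd.DataFrame(preds, columns=["pred"]))

# Option 1: bottom-align by position.
# row_shift is the number of leading rows that received no prediction.
row_shift = len(main) - len(preds)
fixed = main.copy()
fixed["pred"] = np.nan
fixed.iloc[row_shift:, fixed.columns.get_loc("pred")] = preds

# Option 2: label the predictions with the rows they came from, then join.
used_rows = main.dropna(subset=["lag"]).index
fixed2 = main.join(pd.DataFrame({"pred": preds}, index=used_rows))

print(naive, fixed, fixed2, sep="\n\n")
```

Option 1 is only correct when every missing prediction sits at the top of the frame, which holds in the question because only the first two rows of `ILIShift2` are NaN. Option 2 does not depend on where the incomplete rows are.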