为什么我的Pandas要加入移动连接的数据行?

2024-10-04 11:22:37 发布

您现在位置:Python中文网/ 问答频道 /正文

在Pandas中,当Ijoin时,连接的数据相对于原始数据帧是不对齐的:

import os
import pandas as pd
import statsmodels.formula.api as sm
import numpy as np
import matplotlib.pyplot as plt

flu_train = pd.read_csv('FluTrain.csv')
# From: https://courses.edx.org/c4x/MITx/15.071x/asset/FluTrain.csv
cols = ['Ystart', 'Mstart', 'Dstart', 'Yend', 'Mend', 'Dend']
flu_train = flu_train.join(pd.DataFrame(flu_train.Week.str.findall('\d+').tolist(), dtype=np.int64, columns=cols))

flu_trend_1 = sm.ols('log(ILI) ~ Queries', flu_train).fit()

flu_test = pd.read_csv('FluTest.csv')
# From: https://courses.edx.org/c4x/MITx/15.071x/asset/FluTest.csv
flu_test = flu_test.join(pd.DataFrame(flu_test.Week.str.findall('\d+').tolist(), dtype=np.int64, columns=cols))

flu_test = flu_test.join(pd.DataFrame(exp(flu_trend_1.predict(flu_test)), columns=['ILIPred1'] ))
flu_train['ILIShift2'] = flu_train.ILI.shift(2)

flu_trend_2 = sm.ols('log(ILI) ~ Queries + log(ILIShift2)', flu_train).fit()
flu_test['ILIShift2'] = flu_test.ILI.shift(2)

# Note that this does not work in a simplified example
# See -- http://stackoverflow.com/q/22457880/
flu_test[:2].ILIShift2 = list(flu_train[-2:].ILI)

# This SHIFTS the joined column "up" two rows, loosing the first two values of ILIPred2 and making the last 2 'NaN'
flu_test = flu_test.join(pd.DataFrame(exp(flu_trend_2.predict(flu_test)), columns=['ILIPred2']))

最后一个语句将连接列“上移”两行,丢失ILIPred2的前两个值,并使最后2个值为“NaN”。我希望连接的列与所有其他列对齐。在

为什么会发生这种情况?我如何预防?在


Tags: columnscsvtestimportdataframeasnptrain
1条回答
网友
1楼 · 发布于 2024-10-04 11:22:37

此联接的数据帧(pd.DataFrame(np.exp(flu_trend_2.predict(flu_test)), columns=['ILIPred2']))的索引从0到49。在

您将它加入flu_test,它的索引为0到51。在

所以,如果这些指数不匹配(50和51),你就会得到NaN,我希望如此。在

如果要强制联接列位于主数据帧的底部,可以执行以下操作(注意使用ilocrow_shift变量):

import os
import pandas as pd
import statsmodels.formula.api as sm
import numpy as np
import matplotlib.pyplot as plt

row_shift = 2

flu_train = pd.read_csv('https://courses.edx.org/c4x/MITx/15.071x/asset/FluTrain.csv')
cols = ['Ystart', 'Mstart', 'Dstart', 'Yend', 'Mend', 'Dend']
flu_train = flu_train.join(pd.DataFrame(flu_train.Week.str.findall('\d+').tolist(), dtype=np.int64, columns=cols))

flu_trend_1 = sm.ols('np.log(ILI) ~ Queries', flu_train).fit()

flu_test = pd.read_csv('https://courses.edx.org/c4x/MITx/15.071x/asset/FluTest.csv')
flu_test = flu_test.join(pd.DataFrame(flu_test.Week.str.findall('\d+').tolist(), dtype=np.int64, columns=cols))

flu_test = flu_test.join(pd.DataFrame(np.exp(flu_trend_1.predict(flu_test)), columns=['ILIPred1'] ))
flu_train['ILIShift2'] = flu_train.ILI.shift(row_shift)

flu_trend_2 = sm.ols('np.log(ILI) ~ Queries + np.log(ILIShift2)', flu_train).fit()
flu_test['ILIShift2'] = flu_test.ILI.shift(row_shift)

# Note that this does not work in a simplified example
# See   http://stackoverflow.com/q/22457880/
flu_test.iloc[:2].ILIShift2 = list(flu_train.iloc[-row_shift:].ILI)

joiner = pd.DataFrame(np.exp(flu_trend_2.predict(flu_test)), columns=['ILIPred2'], index=flu_test.index[row_shift:])
flu_test.join(joiner)

这给了我:

^{pr2}$

相关问题 更多 >