Pandas ordinary linear regression based on dt YearWeekNumber (as of 2018)?

+---------------------+-------------+--------------------+--------------------+
| Date                | YearWeekNum | Dependent_Variable | Bonus_Grouping_Int |
+---------------------+-------------+--------------------+--------------------+
| 2017-07-01 00:12:07 | 2017-Wk26   | 35.4               | 1                  |
| 2017-07-01 00:12:07 | 2017-Wk26   | 33.3               | 2                  |
| 2018-01-05 25:12:07 | 2018-Wk0    | 28.2               | 1                  |
| 2018-01-05 25:12:07 | 2018-Wk0    | 24.2               | 2                  |
+---------------------+-------------+--------------------+--------------------+

1 answer

User

#1 · Posted 2024-09-26 18:20:02

Here is the "solution" I was able to work out:

First, I only wanted weeks 1-52, excluding week 0 and week 53.

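# %U numbers weeks from Sunday, so it can yield week 00 and week 53
df['YearWeekNum'] = df['Date'].dt.strftime('%Y-Wk%U')
# Fold the edge weeks into their neighbours (years are hard-coded for this data set)
df.loc[df['YearWeekNum'].str.contains('Wk53'), 'YearWeekNum'] = '2017-Wk52'
df.loc[df['YearWeekNum'].str.contains('Wk00'), 'YearWeekNum'] = '2018-Wk01'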
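To see why week 00 shows up at all, here is a minimal sketch on two toy dates. The dates are assumed for illustration (loosely based on the question's table, with a valid hour), and `labels_before` / `labels_after` are hypothetical names used only to compare the labels.

```python
import pandas as pd

# Toy dates (assumed for illustration): one mid-year, one in the first days of January
df = pd.DataFrame({'Date': pd.to_datetime(['2017-07-01 00:12:07', '2018-01-05 00:12:07'])})

# %U counts weeks starting on Sunday; days before the year's first Sunday fall in week 00
df['YearWeekNum'] = df['Date'].dt.strftime('%Y-Wk%U')
labels_before = df['YearWeekNum'].tolist()

# Same relabelling as in the answer: fold week 00 into week 01
df.loc[df['YearWeekNum'].str.contains('Wk00'), 'YearWeekNum'] = '2018-Wk01'
labels_after = df['YearWeekNum'].tolist()

print(labels_before)
print(labels_after)
```

2018-01-05 is a Friday before the first Sunday of 2018, which is why it lands in week 00 before the relabelling.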

Then I created a column that uses dt.to_period to group all dates, in order, by week of the year:

(The code block for this step is missing from the source.)
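The snippet for this step did not survive, so what follows is only a guess at its shape, not the author's code. It assumes the column is called `time_period` (the name used in the later steps) and that the weekly frequency alias `'W'` was passed to `dt.to_period`.

```python
import pandas as pd

# Toy dates, assumed for illustration
df = pd.DataFrame({'Date': pd.to_datetime(['2017-07-01 00:12:07', '2018-01-05 00:12:07'])})

# Assumed reconstruction: 'W' yields weekly periods that run Monday through Sunday
df['time_period'] = df['Date'].dt.to_period('W')

print(df)
```

Note that these periods run Monday to Sunday, while the `%U` labels above count weeks from Sunday, so the two groupings may not line up exactly at week boundaries.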

This next part is a bit roundabout. First, build an ordered collection of the weekly time periods:

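# Maps each weekly period to its consecutive position (filled in by the loop in the next step)
dictionary_of_time_periods = dict()
# Unique periods, sorted chronologically
set_of_periods = set(df['time_period'])
ordered_list_of_set = sorted(set_of_periods)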

Second, create a dictionary in which the chronologically ordered time periods are numbered consecutively:

# The following loop creates a dictionary of each time period (weeks by default)
# which is used to create a consecutive sequence (1..n) for each week.
# This dictionary is passed into the "apply_order" function which adds the column
# to the DataFrame
for index_key, this_per in enumerate(ordered_list_of_set, start=1):
    dictionary_of_time_periods[this_per] = index_key

Finally, add a new column to the DataFrame in which each data point gets its number (1 to n) from the ordered dictionary:

df['ordered_nums'] = df.apply(lambda to_column: apply_order(to_column['time_period'], dictionary_of_time_periods),
                              axis=1)

where the function apply_order (which has to be defined before the line above runs) is just a dictionary lookup:

def apply_order(df_like, dictionary_of_timeframes):
    return dictionary_of_timeframes[df_like]
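The three steps above (sort the unique periods, number them, look each row up) can probably be condensed with `Series.map`. This is a sketch of an alternative, not the answer's own code; the toy dates and the name `period_to_num` are assumptions.

```python
import pandas as pd

# Toy data, assumed for illustration: two rows in each of two different weeks
dates = pd.to_datetime(['2017-07-01', '2017-07-01', '2018-01-05', '2018-01-05'])
df = pd.DataFrame({'time_period': dates.to_period('W')})

# Number the distinct periods 1..n in chronological order
ordered_periods = sorted(df['time_period'].unique())
period_to_num = {period: num for num, period in enumerate(ordered_periods, start=1)}

# A plain dictionary lookup per row, without a row-wise apply
df['ordered_nums'] = df['time_period'].map(period_to_num)

print(df)
```

One consequence of this numbering, in either form: only weeks that actually occur in the data get a number, so a gap of several empty weeks is collapsed into a single step on the x axis.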

Then, for the linear regression:

import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

plt.style.use('ggplot')
regression_result = smf.ols(formula='Dependent_Variable ~ ordered_nums', data=df).fit()
print(regression_result.summary())
print(regression_result.params)

# Look the coefficients up by label rather than by position
regression_intercept = regression_result.params['Intercept']
regression_slope = regression_result.params['ordered_nums']

# x values 0..n+1, so the line extends one step past the first and last week
n_points = len(set(df['YearWeekNum']))
plot_x_array = list(range(0, n_points + 2))

# Assumes no figure exists yet; skip this line if fig and ax are already defined
fig, ax = plt.subplots()
ols_regression_y_hat = [regression_slope * i + regression_intercept for i in plot_x_array]
ax.plot(plot_x_array, ols_regression_y_hat, c='xkcd:violet', label='Linear Regression')
fig.legend()
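For readers who want to see what the fitted slope and intercept amount to, here is a small sketch of the closed-form simple OLS estimate in plain NumPy. It uses the four `Dependent_Variable` values from the question's table and assumes the two weeks were numbered 1 and 2 by the steps above; it is an illustration, not a replacement for the statsmodels fit.

```python
import numpy as np

# Assumed week numbers (1 and 2) paired with the four values from the question's table
x = np.array([1, 1, 2, 2], dtype=float)
y = np.array([35.4, 33.3, 28.2, 24.2])

# Simple OLS: slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Cross-check against NumPy's least-squares polynomial fit of degree 1
polyfit_slope, polyfit_intercept = np.polyfit(x, y, deg=1)

print(slope, intercept)
print(polyfit_slope, polyfit_intercept)
```

With only two distinct x values the fitted line simply passes through the mean of each week, which is a handy sanity check on the numbering.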

I hope this helps someone!
