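I'm trying to predict the number of reservations from a dataset with only a few features. The features are a mix of categorical and continuous variables.
My dataset has about 917 observations. The dependent variable looks like this: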
array([ 1, 7, 17, 2, 2, 13, 8, 11, 9, 4, 4, 3, 5, 2, 5, 7, 3,
12, 9, 13, 5, 2, 11, 13, 14, 19, 9, 11, 3, 6, 7, 10, 1, 6,
5, 10, 8, 5, 4, 3, 2, 10, 10, 10, 8, 13, 16, 6, 4, 6, 3,
11, 10, 1, 18, 7, 2, 12, 17, 4, 2, 19, 3, 4, 17, 13, 10, 2,
10, 1, 3, 4, 20, 3, 2, 1, 3, 5, 8, 8, 4, 3, 13, 3, 3,
5, 4, 17, 7, 6, 10, 5, 3, 9, 9, 8, 1, 5, 17, 5, 10, 9,
2, 7, 13, 2, 9, 1, 15, 13, 10, 4, 2, 4, 5, 4, 3, 3, 10,
4, 7, 5, 13, 12, 7, 5, 6, 9, 5, 11, 7, 1, 4, 12, 4, 3,
11, 1, 4, 4, 3, 7, 4, 11, 4, 1, 9, 2, 10, 10, 3, 4, 4,
3, 2, 7, 10, 7, 6, 1, 3, 19, 9, 3, 8, 20, 1, 12, 9, 13,
13, 2, 9, 4, 9, 2, 5, 6, 18, 3, 6, 8, 6, 4, 5, 13, 4,
8, 9, 5, 4, 8, 5, 2, 1, 6, 8, 3, 6, 4, 2, 6, 11, 5,
1, 5, 1, 5, 11, 11, 9, 3, 12, 2, 2, 9, 19, 7, 13, 13, 9,
2, 1, 1, 4, 3, 4, 9, 1, 25, 12, 8, 5, 18, 3, 1, 6, 17,
7, 4, 6, 9, 8, 10, 3, 8, 12, 5, 4, 4, 1, 9, 21, 4, 3,
3, 7, 13, 5, 12, 8, 8, 6, 3, 6, 7, 5, 3, 7, 3, 14, 3,
5, 2, 14, 16, 3, 8, 6, 13, 9, 3, 5, 4, 9, 4, 12, 12, 4,
9, 8, 11, 5, 13, 3, 2, 5, 4, 2, 1, 8, 8, 18, 11, 2, 5,
13, 4, 1, 2, 4, 1, 2, 2, 12, 2, 6, 19, 7, 20, 2, 10, 2,
9, 12, 9, 8, 1, 4, 8, 8, 12, 4, 8, 1, 3, 6, 9, 4, 3,
8, 2, 7, 15, 6, 5, 10, 6, 4, 3, 12, 5, 4, 13, 7, 2, 8,
5, 2, 4, 3, 14, 12, 3, 4, 3, 2, 15, 6, 14, 12, 11, 9, 5,
5, 7, 11, 10, 7, 9, 9, 7, 11, 5, 11, 3, 2, 5, 17, 5, 2,
6, 1, 10, 3, 13, 19, 5, 1, 3, 5, 3, 5, 6, 3, 9, 8, 2,
3, 2, 3, 7, 4, 9, 5, 1, 6, 14, 4, 8, 17, 13, 7, 1, 4,
5, 10, 5, 6, 2, 12, 5, 9, 3, 9, 9, 1, 5, 1, 2, 2, 5,
1, 4, 4, 13, 4, 25, 9, 10, 4, 3, 9, 13, 13, 2, 9, 2, 12,
4, 1, 20, 9, 10, 2, 5, 4, 10, 2, 6, 1, 7, 7, 7, 4, 8,
4, 3, 4, 13, 8, 3, 13, 12, 19, 9, 3, 2, 6, 7, 13, 8, 16,
7, 3, 11, 4, 10, 9, 12, 2, 8, 5, 2, 3, 4, 2, 1, 11, 5,
4, 2, 8, 12, 7, 5, 7, 7, 4, 6, 18, 2, 1, 6, 15, 11, 2,
5, 8, 3, 5, 9, 11, 5, 8, 6, 20, 1, 10, 3, 7, 1, 3, 5,
4, 4, 10, 11, 6, 1, 5, 4, 1, 2, 10, 4, 4, 11, 20, 5, 3,
2, 7, 8, 2, 10, 5, 1, 18, 5, 10, 5, 3, 8, 15, 2, 1, 14,
10, 7, 3, 5, 9, 3, 4, 21, 14, 1, 2, 1, 2, 4, 11, 9, 7,
6, 9, 18, 4, 6, 18, 12, 12, 4, 6, 3, 3, 9, 5, 12, 15, 3,
7, 3, 7, 4, 2, 15, 14, 7, 10, 5, 5, 5, 9, 3, 6, 3, 1,
11, 1, 5, 25, 8, 2, 24, 1, 12, 1, 6, 8, 5, 13, 4, 3, 3,
13, 4, 4, 18, 7, 13, 2, 8, 3, 4, 9, 2, 13, 12, 4, 5, 10,
9, 15, 1, 8, 8, 15, 10, 1, 9, 2, 2, 2, 2, 3, 6, 17, 7,
5, 5, 6, 12, 1, 8, 3, 1, 11, 4, 7, 8, 15, 6, 11, 9, 9,
13, 2, 3, 5, 3, 5, 12, 4, 4, 8, 7, 12, 2, 2, 4, 4, 12,
8, 11, 10, 6, 5, 1, 4, 2, 7, 3, 5, 15, 12, 12, 2, 9, 7,
4, 4, 5, 15, 5, 8, 13, 7, 2, 8, 12, 2, 13, 6, 24, 14, 3,
4, 1, 2, 8, 7, 5, 12, 8, 2, 6, 3, 7, 5, 2, 7, 3, 3,
1, 9, 9, 3, 12, 3, 2, 11, 11, 6, 3, 9, 12, 4, 8, 7, 5,
2, 10, 19, 1, 1, 10, 6, 2, 4, 2, 4, 4, 3, 7, 13, 9, 6,
2, 2, 2, 5, 13, 12, 2, 13, 12, 11, 10, 5, 8, 8, 15, 12, 3,
3, 9, 4, 6, 13, 15, 4, 7, 1, 12, 10, 9, 7, 3, 7, 4, 9,
2, 10, 2, 11, 10, 14, 3, 13, 8, 3, 12, 11, 10, 7, 5, 3, 3,
11, 3, 13, 9, 10, 20, 7, 12, 3, 6, 6, 18, 3, 10, 11, 10, 5,
6, 11, 4, 6, 7, 9, 13, 1, 14, 14, 13, 4, 3, 8, 5, 7, 14,
13, 13, 12, 8, 11, 12, 9, 8, 9, 4, 5, 4, 7, 5, 2, 3, 1,
7, 2, 1, 13, 5, 19, 9, 6, 9, 7])
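The Poisson question further down can be checked numerically alongside the histogram. A Poisson variable has variance equal to its mean, so the variance-to-mean ratio of the counts is a rough hint about whether plain Poisson regression is plausible or whether the counts are overdispersed. Below is a minimal sketch that uses only the standard library. `dispersion_index` is a hypothetical helper name, and reading "well above 1" as overdispersion is a rule of thumb, not a formal test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean for a sequence of counts.

    For Poisson data this is close to 1; values well above 1 suggest
    overdispersion (e.g. a negative binomial model may fit better).
    Requires at least two observations.
    """
    m = mean(counts)
    return variance(counts) / m
```

Usage would be something like `dispersion_index(df["reservartions"].tolist())` on the full column.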
When I plot a histogram of the dependent variable, I get this (image not included):
So I applied a log transform to reduce some of the skew:
y = np.log(df["reservartions"].values)
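Here is one caveat about modelling log(y), sketched with toy numbers that are not the real data. A model fit to np.log(y) predicts the mean of log y. Exponentiating that prediction gives a geometric-mean-type estimate, which sits below the arithmetic mean count whenever the counts vary (Jensen's inequality). np.log is only defined here as long as no count is 0. np.log1p and np.expm1 are the usual safe pair if zeros can occur.

```python
import numpy as np

counts = np.array([1, 2, 4])  # hypothetical toy counts, not the real data
y_log = np.log(counts)        # defined because every count is >= 1

# Back-transforming the mean on the log scale yields the geometric mean,
# which is below the arithmetic mean whenever the counts are not all equal.
naive_back = np.exp(y_log.mean())  # geometric mean: 2.0 for these counts
arith_mean = counts.mean()         # arithmetic mean: 7/3

# log1p/expm1 round-trip safely even if a count were 0.
y_log1p = np.log1p(counts)
recovered = np.expm1(y_log1p)
```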
The distribution now looks like this (image not included):
Some of the features:
type  actual_price  recommended_price  num_videos  image_ava  text_length
1     67.85         59                 5           0          7
0     100.70        53                 5           0          224
0     74.00         74                 4           1          21
0     135.00        75                 1           0          184
0     59.36         53                 2           1          31
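Because actual_price and recommended_price are highly correlated, I created a feature for the difference between them and dropped both original price columns.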
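The feature step just described can be sketched with pandas on the five sample rows from the table. Two choices here are assumptions: the difference is taken as actual minus recommended, and the column names are copied from the table. Keeping only the difference discards the overall price level, so keeping one of the two prices next to `price_diff` is an alternative worth comparing.

```python
import pandas as pd

# The five sample rows from the feature table; a real run uses the full data.
df = pd.DataFrame({
    "type": [1, 0, 0, 0, 0],
    "actual_price": [67.85, 100.70, 74.00, 135.00, 59.36],
    "recommended_price": [59, 53, 74, 75, 53],
    "num_videos": [5, 5, 4, 1, 2],
    "image_ava": [0, 0, 1, 0, 1],
    "text_length": [7, 224, 21, 184, 31],
})

# Pearson correlation between the two prices (meaningful only on the full data).
price_corr = df["actual_price"].corr(df["recommended_price"])

# Replace the two correlated price columns with their difference.
df["price_diff"] = df["actual_price"] - df["recommended_price"]
df = df.drop(columns=["actual_price", "recommended_price"])
```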
But both linear regression and random forest regression give very poor results, with an R² of 0.12, so the model clearly neither fits nor predicts well.
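R² compares a model's squared error against always predicting the mean, so a constant-mean model scores 0. It also depends on the scale it is measured on. If one model is fit to np.log(y) and another to the raw counts, score both on the same scale for a fair comparison. The numpy sketch below shows this. `r2` and `r2_both_scales` are hypothetical helper names; scikit-learn's `r2_score` computes the same 1 - SS_res / SS_tot quantity.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def r2_both_scales(counts, log_pred):
    """R^2 of log-scale predictions, on the log scale and on the count scale."""
    counts = np.asarray(counts, dtype=float)
    log_pred = np.asarray(log_pred, dtype=float)
    return r2(np.log(counts), log_pred), r2(counts, np.exp(log_pred))
```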
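My dependent variable is always positive (it is a count of reservations). Is linear regression still appropriate, or should I use Poisson regression? And does the log transform make sense?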
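The Poisson option can be sketched as follows. A Poisson GLM with a log link models log E[y | x] as linear in the features. It respects the positivity of the counts without transforming y, and its predictions are expected counts directly. In practice you would use a library such as statsmodels `GLM` with `sm.families.Poisson()` or scikit-learn's `PoissonRegressor`. The hand-rolled iteratively reweighted least squares (IRLS) fit below only illustrates what such a fit computes. `fit_poisson_irls` is a hypothetical name. X is assumed to already contain an intercept column, with categorical features encoded as numbers.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=50, tol=1e-8):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares.

    X: (n, p) design matrix including an intercept column.
    y: (n,) non-negative counts. Returns the coefficient vector (log scale).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta           # linear predictor = log of expected count
        mu = np.exp(eta)         # expected count
        z = eta + (y - mu) / mu  # working response for the log link
        XtW = X.T * mu           # X^T diag(mu) via broadcasting; weights = mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Coefficients are on the log scale, so exp(beta[j]) is the multiplicative change in expected reservations per unit increase in feature j. If the variance of the counts is well above their mean, a negative binomial model (also available in statsmodels) is the usual next step.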