Choosing the right regression model

Posted 2024-09-30 16:29:02


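I am trying to predict the number of reservations from a dataset with only a few features. The features are a mix of categorical and continuous variables.

The dependent variable looks like this (my dataset has about 917 observations):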

array([ 1,  7, 17,  2,  2, 13,  8, 11,  9,  4,  4,  3,  5,  2,  5,  7,  3,
       12,  9, 13,  5,  2, 11, 13, 14, 19,  9, 11,  3,  6,  7, 10,  1,  6,
        5, 10,  8,  5,  4,  3,  2, 10, 10, 10,  8, 13, 16,  6,  4,  6,  3,
       11, 10,  1, 18,  7,  2, 12, 17,  4,  2, 19,  3,  4, 17, 13, 10,  2,
       10,  1,  3,  4, 20,  3,  2,  1,  3,  5,  8,  8,  4,  3, 13,  3,  3,
        5,  4, 17,  7,  6, 10,  5,  3,  9,  9,  8,  1,  5, 17,  5, 10,  9,
        2,  7, 13,  2,  9,  1, 15, 13, 10,  4,  2,  4,  5,  4,  3,  3, 10,
        4,  7,  5, 13, 12,  7,  5,  6,  9,  5, 11,  7,  1,  4, 12,  4,  3,
       11,  1,  4,  4,  3,  7,  4, 11,  4,  1,  9,  2, 10, 10,  3,  4,  4,
        3,  2,  7, 10,  7,  6,  1,  3, 19,  9,  3,  8, 20,  1, 12,  9, 13,
       13,  2,  9,  4,  9,  2,  5,  6, 18,  3,  6,  8,  6,  4,  5, 13,  4,
        8,  9,  5,  4,  8,  5,  2,  1,  6,  8,  3,  6,  4,  2,  6, 11,  5,
        1,  5,  1,  5, 11, 11,  9,  3, 12,  2,  2,  9, 19,  7, 13, 13,  9,
        2,  1,  1,  4,  3,  4,  9,  1, 25, 12,  8,  5, 18,  3,  1,  6, 17,
        7,  4,  6,  9,  8, 10,  3,  8, 12,  5,  4,  4,  1,  9, 21,  4,  3,
        3,  7, 13,  5, 12,  8,  8,  6,  3,  6,  7,  5,  3,  7,  3, 14,  3,
        5,  2, 14, 16,  3,  8,  6, 13,  9,  3,  5,  4,  9,  4, 12, 12,  4,
        9,  8, 11,  5, 13,  3,  2,  5,  4,  2,  1,  8,  8, 18, 11,  2,  5,
       13,  4,  1,  2,  4,  1,  2,  2, 12,  2,  6, 19,  7, 20,  2, 10,  2,
        9, 12,  9,  8,  1,  4,  8,  8, 12,  4,  8,  1,  3,  6,  9,  4,  3,
        8,  2,  7, 15,  6,  5, 10,  6,  4,  3, 12,  5,  4, 13,  7,  2,  8,
        5,  2,  4,  3, 14, 12,  3,  4,  3,  2, 15,  6, 14, 12, 11,  9,  5,
        5,  7, 11, 10,  7,  9,  9,  7, 11,  5, 11,  3,  2,  5, 17,  5,  2,
        6,  1, 10,  3, 13, 19,  5,  1,  3,  5,  3,  5,  6,  3,  9,  8,  2,
        3,  2,  3,  7,  4,  9,  5,  1,  6, 14,  4,  8, 17, 13,  7,  1,  4,
        5, 10,  5,  6,  2, 12,  5,  9,  3,  9,  9,  1,  5,  1,  2,  2,  5,
        1,  4,  4, 13,  4, 25,  9, 10,  4,  3,  9, 13, 13,  2,  9,  2, 12,
        4,  1, 20,  9, 10,  2,  5,  4, 10,  2,  6,  1,  7,  7,  7,  4,  8,
        4,  3,  4, 13,  8,  3, 13, 12, 19,  9,  3,  2,  6,  7, 13,  8, 16,
        7,  3, 11,  4, 10,  9, 12,  2,  8,  5,  2,  3,  4,  2,  1, 11,  5,
        4,  2,  8, 12,  7,  5,  7,  7,  4,  6, 18,  2,  1,  6, 15, 11,  2,
        5,  8,  3,  5,  9, 11,  5,  8,  6, 20,  1, 10,  3,  7,  1,  3,  5,
        4,  4, 10, 11,  6,  1,  5,  4,  1,  2, 10,  4,  4, 11, 20,  5,  3,
        2,  7,  8,  2, 10,  5,  1, 18,  5, 10,  5,  3,  8, 15,  2,  1, 14,
       10,  7,  3,  5,  9,  3,  4, 21, 14,  1,  2,  1,  2,  4, 11,  9,  7,
        6,  9, 18,  4,  6, 18, 12, 12,  4,  6,  3,  3,  9,  5, 12, 15,  3,
        7,  3,  7,  4,  2, 15, 14,  7, 10,  5,  5,  5,  9,  3,  6,  3,  1,
       11,  1,  5, 25,  8,  2, 24,  1, 12,  1,  6,  8,  5, 13,  4,  3,  3,
       13,  4,  4, 18,  7, 13,  2,  8,  3,  4,  9,  2, 13, 12,  4,  5, 10,
        9, 15,  1,  8,  8, 15, 10,  1,  9,  2,  2,  2,  2,  3,  6, 17,  7,
        5,  5,  6, 12,  1,  8,  3,  1, 11,  4,  7,  8, 15,  6, 11,  9,  9,
       13,  2,  3,  5,  3,  5, 12,  4,  4,  8,  7, 12,  2,  2,  4,  4, 12,
        8, 11, 10,  6,  5,  1,  4,  2,  7,  3,  5, 15, 12, 12,  2,  9,  7,
        4,  4,  5, 15,  5,  8, 13,  7,  2,  8, 12,  2, 13,  6, 24, 14,  3,
        4,  1,  2,  8,  7,  5, 12,  8,  2,  6,  3,  7,  5,  2,  7,  3,  3,
        1,  9,  9,  3, 12,  3,  2, 11, 11,  6,  3,  9, 12,  4,  8,  7,  5,
        2, 10, 19,  1,  1, 10,  6,  2,  4,  2,  4,  4,  3,  7, 13,  9,  6,
        2,  2,  2,  5, 13, 12,  2, 13, 12, 11, 10,  5,  8,  8, 15, 12,  3,
        3,  9,  4,  6, 13, 15,  4,  7,  1, 12, 10,  9,  7,  3,  7,  4,  9,
        2, 10,  2, 11, 10, 14,  3, 13,  8,  3, 12, 11, 10,  7,  5,  3,  3,
       11,  3, 13,  9, 10, 20,  7, 12,  3,  6,  6, 18,  3, 10, 11, 10,  5,
        6, 11,  4,  6,  7,  9, 13,  1, 14, 14, 13,  4,  3,  8,  5,  7, 14,
       13, 13, 12,  8, 11, 12,  9,  8,  9,  4,  5,  4,  7,  5,  2,  3,  1,
        7,  2,  1, 13,  5, 19,  9,  6,  9,  7])

When I plot a histogram of the dependent variable, I get this:

[Figure: histogram of the dependent variable]

So I applied a log transform to reduce some of the skew:

y = np.log(df["reservartions"].values)
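To make the transform concrete, here is a minimal sketch that applies it to the first eight values of the array shown above. The names `y_raw`, `y_log`, and `y_back` are illustrative and not from my actual code. The log is well defined here because every value shown is at least 1; if zeros were possible, `np.log1p` would be the usual alternative.

```python
import numpy as np

# First eight values of the dependent variable shown above.
y_raw = np.array([1, 7, 17, 2, 2, 13, 8, 11])

# Natural log; safe because every count here is >= 1.
y_log = np.log(y_raw)

# Predictions made on the log scale must be mapped back with exp
# before they can be read as reservation counts.
y_back = np.exp(y_log)

print(y_log.round(3))
print(y_back.round(3))
```

Note that a model fitted to `y_log` predicts on the log scale, so any error metric computed there is not directly comparable to one computed on the raw counts.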

The distribution now looks like this:

[Figure: histogram of the log-transformed dependent variable]

Some of the features:

type  actual_price  recommended_price  num_videos  image_ava  text_length
1     67.85         59                 5           0          7
0     100.70        53                 5           0          224
0     74.00         74                 4           1          21
0     135.00        75                 1           0          184
0     59.36         53                 2           1          31

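Because actual_price and recommended_price are highly correlated, I created a new feature holding the difference between the two and dropped both actual_price and recommended_price.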
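A rough sketch of that step, using only the five sample rows listed above. The column name `price_diff` and the sign convention (actual minus recommended) are assumptions made for illustration.

```python
import pandas as pd

# The five sample rows from the feature table above.
df = pd.DataFrame({
    "type": [1, 0, 0, 0, 0],
    "actual_price": [67.85, 100.70, 74.00, 135.00, 59.36],
    "recommended_price": [59, 53, 74, 75, 53],
    "num_videos": [5, 5, 4, 1, 2],
    "image_ava": [0, 0, 1, 0, 1],
    "text_length": [7, 224, 21, 184, 31],
})

# Hypothetical feature name; the direction of the subtraction is an assumption.
df["price_diff"] = df["actual_price"] - df["recommended_price"]

# Drop the two correlated price columns, keeping only their difference.
features = df.drop(columns=["actual_price", "recommended_price"])

print(features)
```

One consequence of this choice is that the absolute price level is no longer visible to the model; only the gap between the two prices remains.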

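But after fitting either a linear regression or a random forest regressor, I get very poor results, with an R2 of 0.12.

This suggests that the model neither fits nor predicts well.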
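For reference, this is how I understand the R2 figure: the fraction of variance explained relative to always predicting the mean. The sketch below computes it by hand with NumPy on the first eight target values shown above; the helper name `r_squared` is hypothetical and this is not the code that produced the 0.12.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# First eight values of the dependent variable shown above.
y_true = np.array([1, 7, 17, 2, 2, 13, 8, 11])

# A model that always predicts the mean scores 0 by construction.
baseline = np.full(len(y_true), y_true.mean())

print(r_squared(y_true, baseline))  # mean-only baseline
print(r_squared(y_true, y_true))    # perfect predictions
```

Under this definition, 0.12 means the model explains about 12% of the variance that a mean-only baseline leaves unexplained.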

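My dependent variable is strictly positive (it is a count). Is linear regression still appropriate? Should I use Poisson regression instead? Does the log transform make sense?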

