Classification with Vowpal Wabbit produces a lot of NaN predictions

Posted 2024-06-28 11:17:23

I'm new to this, so I may be missing something obvious.

I have training data in CSV, which I split 80% for training and 20% for testing. It contains 62 features (x0-x61) and 7 classes in total (0-6).

x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,y
190436e528,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,0.4376549941058605,5624b8f759,152af2cb2f,91bb549494,e33c63cf35,1178.0,cc69cbe29a,617a4ad3f9,e8a040423a,c82c3dbd33,ee3501282b,199ce7c484,5f17dedd5c,5c5025bd0a,9aba4d7f51,24.94393348850157,-0.8146595838365664,-0.7083080633874904,1.5,-0.5124221809900756,-0.7339666422629345,0.3333333333333333,14.837727864583336,11.0,0.0,24.0,0.0,0.0,1.0,29.0,0.0,3.0,11.0,4.42,0.15,0.161,0.2,1.0,1.0,1.0,1.0,1.0,0.52,0.5329999999999999,0.835,-0.5865396521883026,0.6724356815192951,0.0,0.6060606060606061,0.12121212121212124,0.21212121212121213,0.060606060606060615,0.0,33.0,3
a4c3095b75,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,0.4809765809625592,7e5c97705a,e071d01df5,91bb549494,e33c63cf35,5777.0,6e40247e69,617a4ad3f9,4b9480aa42,e84655292c,527b6ca8cc,dd9c9e0da2,17c99905b6,0fc56ea1f0,9aba4d7f51,31.08028213771883,-0.3717867728837517,-0.3676156090868885,1.6666666666666663,0.2713072335472944,0.013112469951855535,17.333333333333325,1713.439127604167,33.0,0.0,6.0,1.0,0.6666666666666665,8.0,108.0,1.0,4.0,86.0,1.58,0.05,2.032,2.4,0.348,0.762,0.55,0.392,0.489,0.517,1.0,0.642,0.9609909328437232,0.7909897767145201,0.020161290322580645,0.6451612903225806,0.25806451612903225,0.03629032258064517,0.04032258064516129,0.0,248.0,3
aa2f3cd34a,16a14a2d17,06330986ed,ca63304de0,b7584c2d52,1746600cb0,1,1,-2.9847503675733384,f67f142e40,c28b638881,91bb549494,e33c63cf35,-1.0,fe8fb80553,617a4ad3f9,718c61545b,c26d08129a,cac4fc8eaf,199ce7c484,60299bc448,76ba8f7080,9aba4d7f51,41.40215922501433,-0.043850620710912905,-0.043227755140810106,3.5,0.19464028583619075,-0.2926973864217809,11.333333333333332,732.8046875,106.0,0.0,14.0,0.0,0.0,1.0,21.0,2.0,3.0,14.0,7.17,0.24,0.645,0.6,0.25,0.5,0.5,0.0,0.773,0.899,0.0,0.0,-0.0818699491854678,0.6639345368601952,0.0,0.0,0.0,0.0,1.0,0.0,1.0,4
bfff7d2d9e,16a14a2d17,06330986ed,ca63304de0,a62168d626,1746600cb0,1,1,0.6542629283893542,7b1f0ca4c1,1d42d0c490,669ea3d319,b38690945d,1602.0,6e40247e69,617a4ad3f9,718c61545b,d3dc404c37,7263b01813,dd9c9e0da2,17c99905b6,2cc3e04172,9aba4d7f51,32.11392568242685,0.2843684594325347,0.23249501198439226,5.0,-0.19979368911718315,0.3743375351985674,1101.0,0.44580078125,16.0,0.16666666666666666,6.0,1.0,0.5,5.0,209.0,3.0,2.0,43.0,12.08,0.4,2.613,2.8,0.5,0.556,0.875,0.612,0.064,0.0,0.435,0.785,0.5158309700290646,-0.1150902907744278,0.05945945945945946,0.8,0.06486486486486487,0.045045045045045036,0.014414414414414416,0.016216216216216217,555.0,2
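
The 80/20 split mentioned above could be sketched roughly as follows. This is only an illustration: the helper name `split_rows`, the fixed seed, and shuffling before the cut are assumptions, not details from the post.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper: the seed and the shuffle are assumptions
    made for this sketch.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

The header line would be read separately (for example with `csv.DictReader`) so that it is not shuffled in with the data rows.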

I used phraug2/csv2vw to convert the CSV to Vowpal Wabbit format. The converted data looks like this:

^{pr2}$
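
For reference, a conversion to VW's `label | name:value ...` text format might look something like the sketch below. It is not the phraug2 code. The function name `row_to_vw`, the explicit `categorical` argument, and the shift from 0-based to 1-based labels (VW's `--oaa k` expects labels in 1..k) are assumptions made for this illustration.

```python
import math


def row_to_vw(row, label_col="y", categorical=()):
    """Convert one CSV row (a dict of strings) into a VW line for --oaa.

    Assumptions for this sketch: labels are 0-based integers and are
    shifted to 1-based; columns listed in `categorical` become
    name_value indicator features; every other column is numeric.
    Values are assumed not to contain spaces, ':' or '|'.
    """
    label = int(row[label_col]) + 1
    parts = []
    for name, raw in row.items():
        if name == label_col:
            continue
        if name in categorical:
            # Indicator feature with an implicit value of 1.
            parts.append(f"{name}_{raw}")
            continue
        value = float(raw)
        if not math.isfinite(value):
            raise ValueError(f"non-finite value in column {name}: {raw!r}")
        parts.append(f"{name}:{value}")
    return f"{label} | " + " ".join(parts)
```

Categorical columns are named explicitly rather than detected by trying `float()`, because an ID string such as `12e5` parses as a number and would silently become a very large feature value.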

Then I tried multiclass classification, building a one-against-all model:

vw ./train_my.text -f predictor.vw --oaa 7 --passes 5 --cache_file cache 

However, I get a lot of NaN prediction warnings:

NAN prediction in example 21643, forcing 0.000000
NAN prediction in example 21643, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21644, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21705, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21707, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21735, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21790, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21794, forcing 0.000000
NAN prediction in example 21796, forcing 0.000000
NAN prediction in example 21796, forcing 0.000000
NAN prediction in example 21796, forcing 0.000000
NAN prediction in example 21796, forcing 0.000000
NAN prediction in example 21796, forcing 0.000000
NAN prediction in example 21796, forcing 0.000000
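
One way to look for the input lines behind warnings like these is to scan the VW file for feature values that are non-finite or extreme in magnitude. The sketch below assumes plain `name:value` tokens after the first `|`. The function name and both thresholds are arbitrary choices for illustration, not limits defined by VW.

```python
import math


def suspicious_features(vw_line, low=1e-10, high=1e6):
    """Return (name, value) pairs whose value is non-finite or extreme.

    The thresholds are illustrative defaults, not VW limits.
    """
    flagged = []
    _, _, features = vw_line.partition("|")
    for token in features.split():
        name, sep, raw = token.rpartition(":")
        if not sep:
            continue  # no explicit value, e.g. a categorical indicator
        try:
            value = float(raw)
        except ValueError:
            continue  # text after ':' is not a number; ignore it
        magnitude = abs(value)
        if not math.isfinite(value) or magnitude > high or 0 < magnitude < low:
            flagged.append((name, value))
    return flagged
```

Running this over each line of the training file and printing the line number for any non-empty result gives a short list of examples to inspect by hand.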

The average loss suggests that the model cannot really predict anything:

number of examples per pass = 36063
passes used = 4
weighted example sum = 144252.000000
weighted label sum = 0.000000
average loss = 0.801797 h
total feature number = 7598612

What am I doing wrong?


1 answer
User
#1 · posted 2024-06-28 11:17:23

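This is a result of training in Vowpal Wabbit with features that carry explicit values, for example x1:12334234 or x1:1e-30. If you remove the values from those features, or scale them, the problem goes away. You may also need to adjust the values across variables for logistic regression.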
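
The scaling suggested here could be done in many ways. As one hedged sketch, min-max scaling of the numeric columns before writing the VW file might look like this. The function names are hypothetical, and min-max is just one choice (standardizing to zero mean and unit variance is another).

```python
def fit_min_max(rows, columns):
    """Collect (min, max) per numeric column from the training rows.

    `rows` is a list of dicts whose values are already floats.
    """
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def scale_row(row, ranges):
    """Return a copy of row with each ranged column rescaled.

    Training rows land in [0, 1]; values outside the training range
    fall outside that interval. Constant columns map to 0.0.
    """
    scaled = dict(row)
    for col, (lo, hi) in ranges.items():
        span = hi - lo
        scaled[col] = 0.0 if span == 0 else (row[col] - lo) / span
    return scaled
```

The ranges are computed on the training split only and then reused for the test split, so both files are scaled the same way.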
