Reading a dataset from a txt file in Python 3

Posted 2024-05-21 00:26:38


I have a dataset like this:

43907 120 101

11,31,65,67 0:0.380880 1:0.494080 2:0.540010 3:0.422930 4:0.158320 5:0.326980 6:0.390860 7:0.527120 8:0.254050 9:0.223730 10:0.040290 11:0.141130 12:0.112250 13:0.263170 14:0.147020 15:0.472410 16:0.592610 17:0.653140 18:0.499870 19:0.196520 20:0.403890 21:0.482400 22:0.619220 23:0.320350 24:0.281250 25:0.054750 26:0.180460 27:0.139960 28:0.319930 29:0.181220 30:0.364290 31:0.407210 32:0.368930 33:0.427660 34:0.211390 35:0.364340 36:0.370710 37:0.409110 38:0.289300 39:0.243050 40:0.063120 41:0.193590 42:0.158760 43:0.316050 44:0.197410 45:0.656170 46:0.678760 47:0.650830 48:0.674640 49:0.492430 50:0.623890 51:0.610620 52:0.678220 53:0.574770 54:0.523070 55:0.206800 56:0.496290 57:0.429220 58:0.586610 59:0.471550 60:0.284480 61:0.432470 62:0.498070 63:0.408140 64:0.102710 65:0.303030 66:0.309500 67:0.444860 68:0.191730 69:0.174890 70:0.034140 71:0.153100 72:0.068320 73:0.217020 74:0.099690 75:0.409860 76:0.561920 77:0.612030 78:0.514470 79:0.146020 80:0.398810 81:0.383290 82:0.548490 83:0.282940 84:0.252710 85:0.051010 86:0.223110 87:0.098110 88:0.299670 89:0.144870 90:0.308490 91:0.358480 92:0.352080 93:0.394690 94:0.157510 95:0.339370 96:0.321560 97:0.341370 98:0.247970 99:0.206070 100:0.061000 101:0.216790 102:0.112390 103:0.273650 104:0.152740 105:0.598080 106:0.621690 107:0.607210 108:0.644020 109:0.394950 110:0.593650 111:0.551530 112:0.574390 113:0.511030 114:0.464000 115:0.202030 116:0.492340 117:0.317980 118:0.547810 119:0.393780

31,33,67 0:0.449570 1:0.460490 2:0.453470 3:0.410780 4:0.231760 5:0.402150 6:0.349590 7:0.536460 8:0.318120 9:0.301620 10:0.063840 11:0.220340 12:0.184360 13:0.309230 14:0.216980 15:0.513320 16:0.517750 17:0.529540 18:0.479400 19:0.268830 20:0.464330 21:0.411790 22:0.633740 23:0.362320 24:0.354890 25:0.078480 26:0.260790 27:0.220420 28:0.356290 29:0.253430 30:0.399230 31:0.371270 32:0.337540 33:0.399480 34:0.272790 35:0.414420 36:0.335390 37:0.414630 38:0.328620 39:0.296320 40:0.088510 41:0.264240 42:0.221650 43:0.350630 44:0.256610 45:0.662580 46:0.592860 47:0.565150 48:0.626380 49:0.560600 50:0.669770 51:0.567070 52:0.673730 53:0.566180 54:0.560820 55:0.300700 56:0.564590 57:0.507360 58:0.618470 59:0.521170 60:0.357100 61:0.435480 62:0.505530 63:0.444140 64:0.147280 65:0.368310 66:0.305340 67:0.501230 68:0.241660 69:0.233360 70:0.049390 71:0.215940 72:0.103650 73:0.271220 74:0.146740 75:0.416700 76:0.496200 77:0.586400 78:0.504660 79:0.178360 80:0.425060 81:0.366600 82:0.568510 83:0.284050 84:0.282370 85:0.063300 86:0.260140 87:0.127270 88:0.319830 89:0.179630 90:0.349800 91:0.351150 92:0.358620 93:0.409720 94:0.196110 95:0.380290 96:0.313520 97:0.378220 98:0.275040 99:0.248510 100:0.076540 101:0.266020 102:0.145370 103:0.311140 104:0.192090 105:0.618950 106:0.597790 107:0.601750 108:0.646850 109:0.414880 110:0.627460 111:0.539560 112:0.638610 113:0.496370 114:0.480990 115:0.199590 116:0.535080 117:0.323830 118:0.571490 119:0.397560

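The first line says there are 43907 rows of data in total, with 120 dimensions and 101 possible classes. Each following row starts with a comma-separated list of labels, followed by index:value pairs. How can I read this kind of dataset in Python into two lists, trainX = [] and trainY = []?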

trainX[0]

expected output: 0.494080, 0.540010, 0.422930 .........., 0.393780

trainY[0]

expected output: 11,31,65,67

Thanks a lot.


1 answer
User
#1 · Posted 2024-05-21 00:26:38
with open('file.txt') as file:
    # Header: number of rows, number of dimensions, number of classes
    head = file.readline()
    n, dim, classes = map(int, head.split())
    print(n, dim, classes)

    train_y = []
    train_x = []

    for line in file:
        line = line.strip()
        if line:  # skip blank lines
            data = line.split()
            labels = data[0]  # comma-separated labels, kept as a string
            print('labels:', labels)
            train_y.append(labels)

            data = data[1:]
            data = [el.split(':')[1] for el in data]  # remove index
            data = [float(el) for el in data]  # convert to float
            print('data', len(data), data)
            train_x.append(data)

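Output (for the two sample rows above, value lists abbreviated with ...):

43907 120 101
labels: 11,31,65,67
data 120 [0.38088, 0.49408, 0.54001, ..., 0.39378]
labels: 31,33,67
data 120 [0.44957, 0.46049, 0.45347, ..., 0.39756]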
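If integer labels are more convenient than the raw label string, a variant along these lines may help. It is only a sketch under some assumptions: `read_dataset` is a hypothetical helper name, the sample data is made up, every row is assumed to start with at least one label, and every index is assumed to be smaller than the dimension count given in the header. Values are placed by their stated index, so a row that omits some indices still gets a full-length list.

```python
import io


def read_dataset(stream):
    """Parse the header, then one 'labels idx:val idx:val ...' row per line."""
    # Header order follows the sample file: rows, dimensions, classes.
    n_rows, n_dims, n_classes = map(int, stream.readline().split())

    train_x, train_y = [], []
    for line in stream:
        parts = line.split()
        if not parts:  # skip blank lines between rows
            continue
        # First token is the comma-separated label list, e.g. "11,31,65,67".
        train_y.append([int(label) for label in parts[0].split(",")])
        # Place each value at its stated index instead of relying on order.
        row = [0.0] * n_dims
        for token in parts[1:]:
            index, value = token.split(":")
            row[int(index)] = float(value)
        train_x.append(row)
    return (n_rows, n_dims, n_classes), train_x, train_y


# Tiny made-up sample in the same layout (2 rows, 3 dimensions, 101 classes).
sample = io.StringIO(
    "2 3 101\n"
    "\n"
    "11,31 0:0.5 1:0.25 2:0.125\n"
    "\n"
    "67 0:1.0 2:0.75\n"
)
header, train_x, train_y = read_dataset(sample)
print(header)      # (2, 3, 101)
print(train_x[0])  # [0.5, 0.25, 0.125]
print(train_y[0])  # [11, 31]
```

To read the real file, the same function can be given an open file object, for example `with open('file.txt') as f: header, train_x, train_y = read_dataset(f)`.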
