Training a classifier with Weka in Java is too slow

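I am building a classifier with Weka, and my dataset is sparse (text data). I need to build the feature vectors myself rather than use Weka's utility classes for converting text documents into feature vectors. The problem is that training any classifier is very slow, even though the number of features and samples is small.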

I wrote a test case with artificial sparse feature vectors to show how slow it is. You can run it:

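// Imports for the Weka 3.6-style API used below (FastVector, concrete Instance class)
import java.util.Date;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.functions.SimpleLogistic;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

public static void test() throws Exception {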
        System.out.println( "Started test ... " + new Date() );

        Classifier clf = new SimpleLogistic();
        int numberOfFeatures = 2000;
        int numberOfSamples = 6000;
        Random rnd = new Random(0);

        //Define dataset
        FastVector attributes = new FastVector(numberOfFeatures + 1);
        for (int i = 0; i < numberOfFeatures; i++) {
            // Numeric attribute named after its index
            attributes.addElement( new Attribute(Integer.toString(i)) );
        }

        FastVector classes = new FastVector( 2 );
        classes.addElement( "Positive" );
        classes.addElement( "Negative" );

        attributes.addElement( new Attribute( "class", classes ) );
        Instances data = new Instances("", attributes, 100);
        data.setClassIndex(data.numAttributes()-1);

        //Create artificial sparse feature vectors for the positive class
        for ( int i = 0; i < numberOfSamples/2; i++ ) {
            double[] vec = new double[numberOfFeatures + 1];
            vec[rnd.nextInt(numberOfFeatures)] = 1;
            vec[rnd.nextInt(numberOfFeatures)] = 1;
            vec[rnd.nextInt(numberOfFeatures)] = 1;
            vec[rnd.nextInt(numberOfFeatures)] = 1;

            Instance instance = new Instance(1.0, vec);
            instance.setDataset(data);
            Instance sparseInstance = new SparseInstance(instance);
            sparseInstance.setDataset(data);
            sparseInstance.setClassValue("Positive");
            data.add(sparseInstance);
        }

        //Create artificial sparse feature vectors for the negative class
        for ( int i = 0; i < numberOfSamples/2; i++ ) {
            double[] vec = new double[numberOfFeatures + 1];
            vec[rnd.nextInt(numberOfFeatures)] = 1;
            vec[rnd.nextInt(numberOfFeatures)] = 1;
            vec[rnd.nextInt(numberOfFeatures)] = 1;
            vec[rnd.nextInt(numberOfFeatures)] = 1;

            Instance instance = new Instance(1.0, vec);
            instance.setDataset(data);
            Instance sparseInstance = new SparseInstance(instance);
            sparseInstance.setDataset(data);
            sparseInstance.setClassValue("Negative");
            data.add(sparseInstance);
        }
        System.out.println( "Building classifier ... " );
        clf.buildClassifier(data);
        System.out.println( new Date() );
    }
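
The test above allocates a dense array of numberOfFeatures + 1 doubles for every sample and only then converts it to a SparseInstance. A minimal sketch of building the sorted index and value arrays directly is shown here, in plain Java so it runs without Weka. The class name SparseVectorSketch and both helper methods are hypothetical, and whether this changes the training time at all is not established here; it only avoids the intermediate dense array.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.TreeSet;

final class SparseVectorSketch {

    /** Returns the sorted, de-duplicated indices of the active features of one sample. */
    public static int[] randomActiveIndices(Random rnd, int numberOfFeatures, int draws) {
        // TreeSet keeps the indices sorted and drops repeated draws,
        // mirroring how repeated writes to the dense array coincide.
        TreeSet<Integer> picked = new TreeSet<>();
        for (int k = 0; k < draws; k++) {
            picked.add(rnd.nextInt(numberOfFeatures));
        }
        int[] indices = new int[picked.size()];
        int pos = 0;
        for (int index : picked) {
            indices[pos++] = index;
        }
        return indices;
    }

    /** Returns one value per index; every active feature is 1.0, as in the test. */
    public static double[] onesFor(int[] indices) {
        double[] values = new double[indices.length];
        Arrays.fill(values, 1.0);
        return values;
    }

    public static void main(String[] args) {
        Random rnd = new Random(0);
        int numberOfFeatures = 2000;

        int[] indices = randomActiveIndices(rnd, numberOfFeatures, 4);
        double[] values = onesFor(indices);

        // With Weka on the classpath, these arrays could be passed to
        //   new SparseInstance(1.0, values, indices, numberOfFeatures + 1)
        // followed by setDataset(data) and setClassValue(...) as in the test.
        System.out.println("indices = " + Arrays.toString(indices));
        System.out.println("values  = " + Arrays.toString(values));
    }
}
```

The indices are kept in ascending order because the SparseInstance constructor that takes an index array expects them sorted.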

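I'm not sure whether there is something I should do to make it faster! This makes no sense to me, because gradient descent should run quickly. I also tried a MultilayerPerceptron classifier with one hidden layer, one hidden unit, and one epoch, but it was very slow as well.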
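
To put a number on "slow" rather than comparing two printed dates, the training call can be wrapped in a small timer. This is a minimal sketch: the class TrainingTimer and the method timeMillis are hypothetical names, and the Thread.sleep call is only a stand-in for clf.buildClassifier(data) so that the example runs without Weka.

```java
import java.util.concurrent.Callable;

final class TrainingTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws Exception {
        long elapsed = timeMillis(() -> {
            // Stand-in for the real work, e.g. clf.buildClassifier(data)
            Thread.sleep(50);
            return null;
        });
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

Timing different classifiers or dataset sizes with the same helper makes it easier to see how the build time grows with the number of features and samples.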

Edit:

I tried the same idea as the test case, but with scikit-learn, and it is very fast! Here it is:

import random

import numpy as np
from sklearn import linear_model

numberOfFeatures = 2000
numberOfSamples = 6000

# Dense feature matrix and label vector, initialised to zero
X = np.zeros((numberOfSamples, numberOfFeatures))
y = np.zeros(numberOfSamples)

# Set 8 randomly chosen features to 1 in each sample (draws may coincide)
for i in range(numberOfSamples):
    for _ in range(8):
        X[i][random.randint(0, numberOfFeatures - 1)] = 1

# Label the first 100 samples as the positive class
for i in range(100):
    y[i] = 1

clf = linear_model.LogisticRegression()
print('fitting')
clf.fit(X, y)

print('done!')
