Machine learning with Weka: how do I predict new, unseen instances from Java code?


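I wrote Weka Java code that trains 4 classifiers. I saved the classifier models and want to use them to predict new, unseen instances (think of someone who wants to test whether a tweet is positive or negative).

I applied the StringToWordVector filter to the training data. To avoid the "Src and Dest differ in # of attributes" error, I use the code below to train the filter on the training data and then apply the filter to the new instance, to try to predict whether the new instance is positive or negative. I just can't get it to work.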

    Classifier cls = (Classifier) weka.core.SerializationHelper.read("models/myModel.model"); // read one of the trained classifiers


    BufferedReader datafile = readDataFile("Tweets/tone1.ARFF"); //read training data

    Instances data = new Instances(datafile);
    data.setClassIndex(data.numAttributes() - 1);

    Filter filter = new StringToWordVector(50);//keep 50 words
    filter.setInputFormat(data);
    Instances filteredData = Filter.useFilter(data, filter);

    // rebuild classifier
    cls.buildClassifier(filteredData);


    String testInstance= "Text that I want to use as an unseen instance and predict whether it's positive or negative";
    System.out.println(">create test instance"); 
    FastVector attributes = new FastVector(2); 
    attributes.addElement(new Attribute("text", (FastVector) null)); 


    // Add class attribute. 
    FastVector classValues = new FastVector(2); 
    classValues.addElement("Negative"); 
    classValues.addElement("Positive"); 

    attributes.addElement(new Attribute("Tone", classValues)); 
    // Create dataset with initial capacity of 100, and set index of class. 
    Instances tests = new Instances("test istance", attributes, 100); 
    tests.setClassIndex(tests.numAttributes() - 1); 

    Instance test = new Instance(2); 
    // Set value for message attribute 
    Attribute messageAtt = tests.attribute("text"); 
    test.setValue(messageAtt, messageAtt.addStringValue(testInstance)); 

    test.setDataset(tests); 

    Filter filter2 = new StringToWordVector(50);
    filter2.setInputFormat(tests);
    Instances filteredTests = Filter.useFilter(tests, filter2);

    System.out.println(">train Test filter using training data"); 
    Standardize sfilter = new Standardize(); //Match the number of attributes between src and dest.
    sfilter.setInputFormat(filteredData);  // initializing the filter with training set 
    filteredTests = Filter.useFilter(filteredData, sfilter);    // create new test set

    ArffSaver saver = new ArffSaver(); // save test data to ARFF file
    saver.setInstances(filteredTests);
    File unseenFile = new File("Tweets/unseen.ARFF");
    saver.setFile(unseenFile);
    saver.writeBatch();
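
In the code above, `filter2` is a fresh StringToWordVector initialized on the single test instance, so it builds its own word list rather than reusing the one learned from the training data. One common way to end up with mismatched attribute counts is exactly this kind of separate fitting. The sketch below is plain Java, not Weka code, and every name in it (`VocabularySketch`, `fitVocabulary`, `vectorize`) is hypothetical; it only illustrates why a vocabulary fitted once on the training texts and reused for new text keeps the column count stable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration (not Weka code) of fitting a vocabulary once and reusing it. */
class VocabularySketch {

    /** Builds a word -> column index map from the given texts only. */
    static Map<String, Integer> fitVocabulary(List<String> texts) {
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        for (String text : texts) {
            for (String word : text.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    vocabulary.putIfAbsent(word, vocabulary.size());
                }
            }
        }
        return vocabulary;
    }

    /** Counts the words of a text in the columns of an already fitted vocabulary. Unknown words are dropped. */
    static int[] vectorize(String text, Map<String, Integer> vocabulary) {
        int[] counts = new int[vocabulary.size()];
        for (String word : text.toLowerCase().split("\\s+")) {
            Integer column = vocabulary.get(word);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("good great day", "bad awful day");
        Map<String, Integer> vocabulary = fitVocabulary(train);
        String unseen = "what a great great movie";

        // Reusing the training vocabulary keeps the column count equal to the training data.
        int[] reused = vectorize(unseen, vocabulary);
        // Fitting a fresh vocabulary on the unseen text alone gives a different column count.
        int freshColumns = fitVocabulary(Arrays.asList(unseen)).size();

        System.out.println("training columns: " + vocabulary.size());
        System.out.println("reused columns:   " + reused.length);
        System.out.println("fresh columns:    " + freshColumns);
    }
}
```

In Weka terms, the analogous move is to call `setInputFormat` once with the training data and pass that same filter object to `Filter.useFilter` for both the training set and the new instances.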

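When I try to standardize the input data using the filtered training data, I get a new ARFF file (unseen.ARFF), but it contains 2000 instances (the same number as the training data), and most of the values are negative. I don't understand why, or how to remove these instances.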
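
Two things in the code above would explain this. First, the call `Filter.useFilter(filteredData, sfilter)` passes the training set, not `filteredTests`, so the result has the 2000 training rows. Second, standardization rescales each column to zero mean and unit variance, and a word column that is mostly zero has a mean above zero, so every zero entry becomes negative. The plain-Java sketch below (not Weka code; `StandardizeSketch`, `fit`, and `apply` are hypothetical names, and the use of the sample standard deviation is an assumption of the sketch) shows the arithmetic on one column.

```java
/** Plain-Java illustration (not Weka code) of z-score standardization fitted on training data. */
class StandardizeSketch {

    /** Returns {mean, sample standard deviation} of one training column. */
    static double[] fit(double[] column) {
        double sum = 0;
        for (double value : column) {
            sum += value;
        }
        double mean = sum / column.length;

        double squaredDeviations = 0;
        for (double value : column) {
            squaredDeviations += (value - mean) * (value - mean);
        }
        double std = Math.sqrt(squaredDeviations / (column.length - 1));
        return new double[] {mean, std};
    }

    /** Rescales a value with statistics that were fitted on the training column. */
    static double apply(double value, double[] stats) {
        return (value - stats[0]) / stats[1];
    }

    public static void main(String[] args) {
        // A word that appears in only one of five training documents.
        double[] wordColumn = {0, 0, 0, 0, 5};
        double[] stats = fit(wordColumn);

        System.out.println("mean = " + stats[0] + ", std = " + stats[1]);
        // Zeros lie below the mean, so they standardize to a negative number.
        System.out.println("0 -> " + apply(0, stats));
        System.out.println("5 -> " + apply(5, stats));
    }
}
```

If standardization is wanted at all, the same pattern as with the word filter applies: fit it on the training data once, then apply it to the new instance rather than to the training set again.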

    System.out.println(">Evaluation"); //without the following 2 lines I get ArrayIndexOutOfBoundException.
    filteredData.setClassIndex(filteredData.numAttributes() - 1);
    filteredTests.setClassIndex(filteredTests.numAttributes() - 1);

    Evaluation eval = new Evaluation(filteredData); 
    eval.evaluateModel(cls, filteredTests); 
    System.out.println(eval.toSummaryString("\nResults\n======\n", false));

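I want the printed evaluation to show, for example, the percentage to which this instance is positive or negative, but I get the following result. I also expect to see 1 instance, not 2000. Any help on how to do this would be great.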

> Results
======

Correlation coefficient                  0.0285
Mean absolute error                      0.8765
Root mean squared error                  1.2185
Relative absolute error                409.4123 %
Root relative squared error            121.8754 %
Total Number of Instances             2000

Thanks.

    import java.io.BufferedReader;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.ObjectOutputStream;
    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.lazy.IBk;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;

    // The method below goes inside a class of your choice.

    /**
     * This method performs classification of unseen instances.
     * It starts by training a model using a selection of classifiers, then classifies new unlabeled instances.
     */
    public static void predict() throws Exception {
        // Start by providing the paths for your training and testing ARFF files.
        // Make sure both files have the same structure and the exact same classes in the header.

        // initialise classifier
        Classifier classifier = null;

        System.out.println("read training arff");
        Instances train = new Instances(new BufferedReader(new FileReader("Train.arff")));
        // in my case the class was the first attribute, thus zero; otherwise it's the number of attributes - 1
        train.setClassIndex(0);

        System.out.println("read testing arff");
        Instances unlabeled = new Instances(new BufferedReader(new FileReader("Test.arff")));
        unlabeled.setClassIndex(0);

        // training using a collection of classifiers (NaiveBayes, SMO (AKA SVM), KNN and decision trees)
        String[] algorithms = {"nb", "smo", "knn", "j48"};
        for (int w = 0; w < algorithms.length; w++) {
            if (algorithms[w].equals("nb")) classifier = new NaiveBayes();
            if (algorithms[w].equals("smo")) classifier = new SMO();
            if (algorithms[w].equals("knn")) classifier = new IBk();
            if (algorithms[w].equals("j48")) classifier = new J48();

            System.out.println("==========================================================================");
            System.out.println("training using " + algorithms[w] + " classifier");

            Evaluation eval = new Evaluation(train);
            // perform 10 fold cross validation
            eval.crossValidateModel(classifier, train, 10, new Random(1));
            String output = eval.toSummaryString();
            System.out.println(output);
            String classDetails = eval.toClassDetailsString();
            System.out.println(classDetails);

            classifier.buildClassifier(train);
        }

        // Note: after the loop, classifier refers to the last algorithm trained (j48),
        // so only that model labels the instances and gets saved below.
        Instances labeled = new Instances(unlabeled);

        // label instances (use the trained classifier to classify new unseen instances)
        for (int i = 0; i < unlabeled.numInstances(); i++) {
            double clsLabel = classifier.classifyInstance(unlabeled.instance(i));
            labeled.instance(i).setClassValue(clsLabel);
            System.out.println(clsLabel + " -> " + unlabeled.classAttribute().value((int) clsLabel));
        }

        // save the model for future use
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("myModel.dat"));
        out.writeObject(classifier);
        out.close();
        System.out.println("===== Saved model =====");
    }
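
The method above prints only the predicted label and assumes Test.arff already has the same attributes as Train.arff. For a single raw string with a per-class percentage, one option is to wrap the word filter and the classifier in a `FilteredClassifier` and call `distributionForInstance`. The following is a minimal sketch, assuming Weka 3.7 or later (it uses `DenseInstance` and `ArrayList<Attribute>` instead of the older `FastVector` API seen in the question); the class name, helper names, and toy tweets are hypothetical, and NaiveBayes is just a stand-in for whichever classifier is preferred.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.StringToWordVector;

/** Sketch (assumes Weka 3.7+): classify one raw string with class probabilities. */
class SingleTweetSketch {

    /** Creates a row for the given dataset; a null tone leaves the class value missing. */
    static Instance row(Instances dataset, String text, String tone) {
        Instance instance = new DenseInstance(2); // all values start as missing
        instance.setDataset(dataset);
        instance.setValue(0, text);
        if (tone != null) {
            instance.setValue(1, tone);
        }
        return instance;
    }

    /** Builds a tiny made-up training set: string attribute "text", nominal class "Tone". */
    static Instances toyTrainingData() {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("text", (List<String>) null)); // null value list = string attribute
        attributes.add(new Attribute("Tone", Arrays.asList("Negative", "Positive")));
        Instances data = new Instances("tweets", attributes, 4);
        data.setClassIndex(1);
        data.add(row(data, "what a great happy day", "Positive"));
        data.add(row(data, "great movie love it", "Positive"));
        data.add(row(data, "awful sad terrible day", "Negative"));
        data.add(row(data, "terrible movie hate it", "Negative"));
        return data;
    }

    /** Trains the word filter and the classifier together, on raw (unfiltered) training data. */
    static FilteredClassifier train(Instances rawTrainingData) throws Exception {
        StringToWordVector filter = new StringToWordVector();
        filter.setWordsToKeep(50);
        FilteredClassifier classifier = new FilteredClassifier();
        classifier.setFilter(filter);
        classifier.setClassifier(new NaiveBayes());
        classifier.buildClassifier(rawTrainingData);
        return classifier;
    }

    /** Returns one probability per class value for a single unseen text. */
    static double[] predict(FilteredClassifier classifier, Instances header, String text) throws Exception {
        return classifier.distributionForInstance(row(header, text, null));
    }

    public static void main(String[] args) throws Exception {
        Instances trainingData = toyTrainingData();
        FilteredClassifier classifier = train(trainingData);
        double[] distribution = predict(classifier, trainingData, "such a great day");
        for (int i = 0; i < distribution.length; i++) {
            System.out.println(trainingData.classAttribute().value(i) + ": " + (100 * distribution[i]) + " %");
        }
    }
}
```

Because the filter lives inside the `FilteredClassifier`, saving that one object (for example with `weka.core.SerializationHelper.write`) keeps the learned word list together with the model, so a reloaded model can be given raw text again. It also sidesteps resetting the class index after filtering: StringToWordVector appends the word attributes after the existing ones, so pointing the class index at the last attribute of the filtered data likely selects a numeric word attribute, which would be consistent with the regression-style summary shown in the question.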
