I am unable to bucketize the given double values in Java


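For a few days now I have been trying to write a Bayes classifier in Java. For this I downloaded the Iris dataset (https://www.kaggle.com/uciml/iris). I wrote code to load the dataset (I converted the .csv file to .txt), and it worked fine. Because this is a Bayesian classifier, I then needed bucketized data. I came up with a bucketing logic, implemented it, and it worked fine when tested on its own. I made the necessary changes to the loading code, but when I ran it again, it printed -1 for every value. When I tested the bucketing logic by itself again, it worked correctly! Please help. My bucketing logic returns the number of the bucket in which the given value lies.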

Code for loading the dataset:

import java.io.BufferedReader;
import java.io.FileReader;

public class BayesianClassifier
{
    //------------------working variables---------------------

    private static OrderedMap<int [], String> dataset = loadDataset(); //method for loading the dataset. refer below.
    private static int maxBuckets = 5;     //these values are
    private static double bucketSize = 1.0;//just for testing
    private static double min = 4.0;       //whether my bucketing logic
    private static double max = 7.0;       //works fine on the data

    //------------------working methods-----------------------

    private static OrderedMap<int [], String> loadDataset() {
        OrderedMap<int [], String> dataset = new OrderedMap<int [], String>(); //I needed a map structure which stores entries in the same
                                                                           // order as they are put. I coded it myself because there are
                                                                           // no classes like that in Java.

        try {
            BufferedReader reader = new BufferedReader(new FileReader("F:\\File Transport Directory\\Bayesian\\Iris - Copy.txt"));

            String line = reader.readLine(); //I skip the first line in the file because I do not need it.
            String [] fullRow = line.split("\t"); //for initializing the size of a record.
            int [] neededRow = new int[fullRow.length - 2]; //I do not need the first value (ID) and the last value(category).
                                                        // I will store category as the value in my OrderedMap.

            while((line = reader.readLine()) != null) {
                fullRow = line.split("\t");

                for(int i = 1 ; i < fullRow.length - 1 ; i++) {
                    double value = Double.parseDouble(fullRow[i]); //parse the value as double because it is in String format
                    neededRow[i - 1] = mapToBucket(value, bucketSize, min, max, maxBuckets); //bucketize
                }

                String category = fullRow[fullRow.length - 1]; //category name to be stored
                Entry<int [], String> toPut = new Entry<int [], String>(neededRow, category); //I also coded an Entry class because
                                                                                          // in Java it is an interface.

                dataset.put(toPut);
            }
            reader.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
        return dataset;
    }

    private static void printDataset() {
        Entry<int [], String> en;

        for(int i = 0 ; i < dataset.size() ; i++) {
            en = dataset.entryAt(i);
            Arr.printArrLine(en.key());
            System.out.println("class : " + en.value() + "\n");
        }
    }

    //------------------main-----------------------------------

    public static void main(String [] args) {
        printDataset(); //I print the dataset
    }
}
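
A possible explanation for the -1 output, assuming the static fields are declared in exactly the order shown above: Java runs static field initializers in textual order, so `dataset = loadDataset()` executes before `maxBuckets`, `bucketSize`, `min` and `max` have been assigned, and the method sees their default values (0 and 0.0). The sketch below reproduces that effect in isolation. The class name `StaticInitOrderDemo` and the helper `bucketNow` are hypothetical; `mapToBucket` is copied from the question.

```java
class StaticInitOrderDemo {
    // Runs first: the four settings below still hold their defaults (0 and 0.0).
    static int early = bucketNow(5.1);

    // Same declarations as in the question, placed after the first initializer.
    private static int maxBuckets = 5;
    private static double bucketSize = 1.0;
    private static double min = 4.0;
    private static double max = 7.0;

    // Runs last: all four settings have been assigned by now.
    static int late = bucketNow(5.1);

    // Hypothetical helper that reads the static settings at call time.
    static int bucketNow(double value) {
        return mapToBucket(value, bucketSize, min, max, maxBuckets);
    }

    // Bucketing method copied from the question.
    private static int mapToBucket(double value, double bucketSize, double min, double max, int maxBuckets) {
        int noLimitBucket = (int) ((int) Math.ceil(value / bucketSize) - min);

        if (value < min) return 0;
        else if (value > max) return maxBuckets - 1; // with max = 0.0 and maxBuckets = 0 this is -1
        else if (noLimitBucket > maxBuckets - 1) return maxBuckets - 1;
        else if (noLimitBucket < 0) return 0;
        else return noLimitBucket;
    }

    public static void main(String[] args) {
        System.out.println("early = " + early); // prints "early = -1"
        System.out.println("late  = " + late);  // prints "late  = 2"
    }
}
```

If this is indeed the cause, moving the `dataset` declaration below the four settings, calling `loadDataset()` from `main`, or declaring the settings `static final` with constant values should each avoid the problem.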

Code for bucketing:

private static int mapToBucket(double value, double bucketSize, double min, double max, int maxBuckets) {
    int noLimitBucket = (int) ((int)Math.ceil(value / bucketSize) - min); //The logic I came up with for bucketing

    if(value < min) return 0;
    else if(value > max) return maxBuckets - 1;
    else if(noLimitBucket > maxBuckets - 1) return maxBuckets - 1;
    else if(noLimitBucket < 0) return 0;
    else return noLimitBucket;
}
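
For comparison, here is a minimal sketch of a common equal-width formulation. It is not the formula from the question: it subtracts `min` before dividing by `bucketSize`, so bucket 0 starts at `min` and the indices differ from those of `mapToBucket`. The class and method names are hypothetical, and the argument check is an added assumption that makes unset parameters fail loudly.

```java
class EqualWidthBuckets {
    // Hypothetical helper: bucket i covers [min + i * bucketSize, min + (i + 1) * bucketSize).
    static int bucketOf(double value, double min, double bucketSize, int maxBuckets) {
        if (bucketSize <= 0.0 || maxBuckets <= 0) {
            // Reject unset or invalid settings instead of returning an index such as -1.
            throw new IllegalArgumentException("bucketSize and maxBuckets must be positive");
        }
        int raw = (int) Math.floor((value - min) / bucketSize);
        // Clamp out-of-range values into the first or last bucket.
        return Math.max(0, Math.min(maxBuckets - 1, raw));
    }

    public static void main(String[] args) {
        System.out.println(bucketOf(5.1, 4.0, 1.0, 5)); // 1
        System.out.println(bucketOf(3.0, 4.0, 1.0, 5)); // 0 (below min, clamped)
        System.out.println(bucketOf(9.9, 4.0, 1.0, 5)); // 4 (above range, clamped)
    }
}
```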

OrderedMap class:

import java.util.HashSet;
import java.util.Vector;

public class OrderedMap<K, V>
{
    private Vector<Entry<K, V>> dataset = new Vector<Entry<K, V>>();
    private HashSet<Integer> keyHashCodes = new HashSet<Integer>();
    private HashSet<Integer> valueHashCodes = new HashSet<Integer>();

    public void put(Entry<K, V> toPut) {
        if(containsKey(toPut.key())) {
            keyHashCodes.remove(toPut.hashCode());

            for(int i = 0 ; i < dataset.size() ; i++) {
                if(dataset.elementAt(i).equals(toPut)) {
                    dataset.remove(i);
                    break;
                }
            }
        }
        dataset.add(toPut);
        keyHashCodes.add(toPut.key().hashCode());
        valueHashCodes.add(toPut.value().hashCode());
    }

    public V remove(K key) {
        V value = null;

        if(containsKey(key)) {
            for(int i = 0 ; i < dataset.size() ; i++) {
                if(dataset.elementAt(i).key().equals(key)) {
                    value = dataset.elementAt(i).value();
                    dataset.remove(i);
                    break;
                }
            }
        }
        return value;
    }

    public Entry<K, V> entryAt(int index) {
        return dataset.elementAt(index);
    }

    public boolean containsKey(K key) {
        if(keyHashCodes.contains(key.hashCode())) return true;
        else return false;
    }

    public boolean containsValue(V value) {
        if(valueHashCodes.contains(value.hashCode())) return true;
        else return false;
    }

    public int size() {
        return dataset.size();
    }
}

Entry class:

public class Entry<K, V> 
{
    private K key;
    private V value;

    Entry(K k, V v) {
        key = k;
        value = v;
    }

    public K key() {
        return key;
    }

    public V value() {
        return value;
    }
}
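
As a side note on the custom map: the standard library's `java.util.LinkedHashMap` iterates entries in insertion order. The sketch below shows how it could stand in for `OrderedMap`. The class name, the `build` helper and the sample rows are hypothetical. It also copies each row before storing it, because `loadDataset()` reuses a single `neededRow` array, which would make every stored key refer to the same object.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedHashMapSketch {
    // Hypothetical helper: pairs each bucketized row with its label, keeping insertion order.
    static Map<int[], String> build(int[][] rows, String[] labels) {
        Map<int[], String> dataset = new LinkedHashMap<>();
        for (int i = 0; i < rows.length; i++) {
            // Arrays use identity-based equals/hashCode, so each fresh copy is a distinct key,
            // even when two rows hold the same bucket numbers.
            dataset.put(rows[i].clone(), labels[i]);
        }
        return dataset;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the real dataset.
        int[][] rows = { {2, 0, 0, 0}, {3, 0, 1, 0}, {2, 0, 0, 0} };
        String[] labels = { "Iris-setosa", "Iris-versicolor", "Iris-setosa" };

        for (Map.Entry<int[], String> e : build(rows, labels).entrySet()) {
            System.out.println(Arrays.toString(e.getKey()) + " class : " + e.getValue());
        }
    }
}
```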
