Unable to bucketize given double values in Java
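For a few days I have been trying to write a Bayes classifier in Java. For that, I downloaded the Iris dataset (https://www.kaggle.com/uciml/iris). I wrote the code for loading the dataset (I converted the .csv file to .txt), and it worked fine. Then, because it is a Bayesian classifier, I needed bucketized data. I came up with a bucketing logic, implemented it, and it worked fine when tested on its own.
I then made the necessary changes to the loading code. But when I ran the code again, it printed -1 for all values. When I tested my bucketing logic again separately, it still worked fine! Please help. My bucketing logic returns the number of the bucket in which a given value lies.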
Code for loading the dataset:
import java.io.BufferedReader;
import java.io.FileReader;

public class BayesianClassifier
{
    //------------------working variables---------------------
    private static OrderedMap<int [], String> dataset = loadDataset(); // method for loading the dataset, see below

    // These values are just for testing whether my bucketing logic works on the data.
    private static int maxBuckets = 5;
    private static double bucketSize = 1.0;
    private static double min = 4.0;
    private static double max = 7.0;

    //------------------working methods-----------------------
    private static OrderedMap<int [], String> loadDataset() {
        // I needed a map structure which stores entries in the same order as they are put.
        // I coded it myself because there are no classes like that in Java.
        OrderedMap<int [], String> dataset = new OrderedMap<int [], String>();
        try {
            BufferedReader reader = new BufferedReader(new FileReader("F:\\File Transport Directory\\Bayesian\\Iris - Copy.txt"));
            String line = reader.readLine(); // I skip the first line in the file because I do not need it.
            String [] fullRow = line.split("\t"); // for initializing the size of a record
            // I do not need the first value (ID) and the last value (category).
            // I will store the category as the value in my OrderedMap.
            int [] neededRow = new int[fullRow.length - 2];
            while((line = reader.readLine()) != null) {
                fullRow = line.split("\t");
                for(int i = 1 ; i < fullRow.length - 1 ; i++) {
                    double value = Double.parseDouble(fullRow[i]); // parse the value as double because it is in String format
                    neededRow[i - 1] = mapToBucket(value, bucketSize, min, max, maxBuckets); // bucketize
                }
                String category = fullRow[fullRow.length - 1]; // category name to be stored
                // I also coded an Entry class because in Java it is an interface.
                Entry<int [], String> toPut = new Entry<int [], String>(neededRow, category);
                dataset.put(toPut);
            }
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return dataset;
    }

    private static void printDataset() {
        Entry<int [], String> en;
        for(int i = 0 ; i < dataset.size() ; i++) {
            en = dataset.entryAt(i);
            Arr.printArrLine(en.key());
            System.out.println("class : " + en.value() + "\n");
        }
    }

    //------------------main-----------------------------------
    public static void main(String [] args) {
        printDataset(); // I print the dataset
    }
}
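A side note on the loader above, offered as a sketch rather than a diagnosis: `neededRow` is allocated once, before the loop, and the same array is passed to every `Entry`. If so, every stored key would refer to one array, and all printed rows would show the values of the last line read. The hypothetical `RowCopyDemo` class below contrasts a shared buffer with a fresh array per row; its names and sample numbers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the code in the question.
class RowCopyDemo {
    // Reuses one array for every row, as the loader in the question appears to do.
    // Assumes a non-empty, rectangular source.
    static List<int[]> loadShared(int[][] source) {
        List<int[]> rows = new ArrayList<>();
        int[] buffer = new int[source[0].length];
        for (int[] src : source) {
            for (int i = 0; i < src.length; i++) {
                buffer[i] = src[i];
            }
            rows.add(buffer); // stores the same reference each time
        }
        return rows;
    }

    // Allocates a new array per row, so each stored row keeps its own values.
    static List<int[]> loadFresh(int[][] source) {
        List<int[]> rows = new ArrayList<>();
        for (int[] src : source) {
            rows.add(Arrays.copyOf(src, src.length));
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] source = { {1, 2}, {3, 4} };
        System.out.println(Arrays.toString(loadShared(source).get(0))); // [3, 4]
        System.out.println(Arrays.toString(loadFresh(source).get(0)));  // [1, 2]
    }
}
```

In the loader, the equivalent change would be to create `neededRow` inside the `while` loop instead of before it.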
Bucketing code:
private static int mapToBucket(double value, double bucketSize, double min, double max, int maxBuckets) {
    int noLimitBucket = (int) ((int) Math.ceil(value / bucketSize) - min); // the logic I came up with for bucketing
    if(value < min) return 0;
    else if(value > max) return maxBuckets - 1;
    else if(noLimitBucket > maxBuckets - 1) return maxBuckets - 1;
    else if(noLimitBucket < 0) return 0;
    else return noLimitBucket;
}
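One possible cause worth checking, assuming `mapToBucket` lives inside `BayesianClassifier`: Java runs static field initializers in textual order, and `dataset = loadDataset()` is declared before `maxBuckets`, `bucketSize`, `min`, and `max`. During that call the four fields would still hold their default values (0 and 0.0), in which case the `value > max` branch returns `maxBuckets - 1`, which is -1. The hypothetical `InitOrderDemo` class below reproduces that ordering with the same bucketing logic; the class name, `early`, and `computeBucket` are illustrative only.

```java
// Hypothetical demo class; not part of the code in the question.
class InitOrderDemo {
    // Runs first: computeBucket() reads the four fields below before their
    // initializers have executed, so it sees 0, 0.0, 0.0 and 0.0.
    static int early = computeBucket(5.1);

    static int maxBuckets = 5;
    static double bucketSize = 1.0;
    static double min = 4.0;
    static double max = 7.0;

    static int computeBucket(double value) {
        return mapToBucket(value, bucketSize, min, max, maxBuckets);
    }

    // Same logic as the mapToBucket method in the question.
    static int mapToBucket(double value, double bucketSize, double min, double max, int maxBuckets) {
        int noLimitBucket = (int) ((int) Math.ceil(value / bucketSize) - min);
        if (value < min) return 0;
        else if (value > max) return maxBuckets - 1;
        else if (noLimitBucket > maxBuckets - 1) return maxBuckets - 1;
        else if (noLimitBucket < 0) return 0;
        else return noLimitBucket;
    }

    public static void main(String[] args) {
        System.out.println("during static init: " + early);              // -1
        System.out.println("after static init:  " + computeBucket(5.1)); // 2
    }
}
```

If this is what happens in the real program, moving the `dataset` declaration below the four fields, or calling `loadDataset()` from `main`, should be enough to get real bucket numbers. It would also explain why the logic works when tested separately with explicit arguments.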
OrderedMap class:
import java.util.HashSet;
import java.util.Vector;

public class OrderedMap<K, V>
{
    private Vector<Entry<K, V>> dataset = new Vector<Entry<K, V>>();
    private HashSet<Integer> keyHashCodes = new HashSet<Integer>();
    private HashSet<Integer> valueHashCodes = new HashSet<Integer>();

    public void put(Entry<K, V> toPut) {
        if(containsKey(toPut.key())) {
            keyHashCodes.remove(toPut.hashCode());
            for(int i = 0 ; i < dataset.size() ; i++) {
                if(dataset.elementAt(i).equals(toPut)) {
                    dataset.remove(i);
                    break;
                }
            }
        }
        dataset.add(toPut);
        keyHashCodes.add(toPut.key().hashCode());
        valueHashCodes.add(toPut.value().hashCode());
    }

    public V remove(K key) {
        V value = null;
        if(containsKey(key)) {
            for(int i = 0 ; i < dataset.size() ; i++) {
                if(dataset.elementAt(i).key().equals(key)) {
                    value = dataset.elementAt(i).value();
                    dataset.remove(i);
                    break;
                }
            }
        }
        return value;
    }

    public Entry<K, V> entryAt(int index) {
        return dataset.elementAt(index);
    }

    public boolean containsKey(K key) {
        if(keyHashCodes.contains(key.hashCode())) return true;
        else return false;
    }

    public boolean containsValue(V value) {
        if(valueHashCodes.contains(value.hashCode())) return true;
        else return false;
    }

    public int size() {
        return dataset.size();
    }
}
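For comparison, the standard library does provide `java.util.LinkedHashMap`, which iterates in insertion order. A related point: arrays inherit `hashCode` and `equals` from `Object`, so two `int[]` with identical contents count as different keys, while a `List<Integer>` compares by content. The hypothetical `InsertionOrderDemo` class below sketches both points; the bucket values in it are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of the code in the question.
class InsertionOrderDemo {
    // Builds a small map; LinkedHashMap iterates in the order keys were first inserted.
    static Map<List<Integer>, String> build() {
        Map<List<Integer>, String> rows = new LinkedHashMap<>();
        rows.put(Arrays.asList(2, 0, 1, 0), "Iris-virginica"); // made-up bucket values
        rows.put(Arrays.asList(1, 0, 0, 0), "Iris-setosa");    // made-up bucket values
        return rows;
    }

    // Returns the map's values in iteration order.
    static List<String> valuesInOrder() {
        return new ArrayList<>(build().values());
    }

    public static void main(String[] args) {
        System.out.println(valuesInOrder()); // [Iris-virginica, Iris-setosa]

        int[] a = {1, 2};
        int[] b = {1, 2};
        System.out.println(a.equals(b));         // false: arrays compare by identity
        System.out.println(Arrays.equals(a, b)); // true: compares contents
        System.out.println(Arrays.asList(1, 2).equals(Arrays.asList(1, 2))); // true
    }
}
```

One caveat with any map keyed by feature values: two flowers with the same bucketized features would collapse into a single entry, so a plain list of rows may suit a training set better.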
Entry class:
public class Entry<K, V>
{
    private K key;
    private V value;

    Entry(K k, V v) {
        key = k;
        value = v;
    }

    public K key() {
        return key;
    }

    public V value() {
        return value;
    }
}