Finding the most frequent words in 3 files with java -Xmx32m

I need to develop an application that finds the words with the highest number of occurrences across 3 txt files. Note that the words in each file are sorted in ascending order, and a word can appear in more than one file. The application has to be run with the -Xmx32m flag (a maximum heap size of 32 MB), like this:

java -Xmx32m TopKWordOcurrences

I have implemented the application with a HashMap and a linked list, but as soon as I run it with -Xmx32m it fails with an out-of-memory error. I would be grateful for any help.

You can find the code below:

package topkwordsocurrences;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.*;

public class TopKWordOcurrences {

    public static void getKWordOcurrences() {
        Scanner firstReader, secondReader, thirdReader;
        Map<String, Integer> uniqueWords = new LinkedHashMap<>();
        String line;
        try {
            PrintWriter out = new PrintWriter(".\\src\\topkwordsocurrences\\out.txt");
            firstReader = new Scanner(new FileReader(".\\src\\topkwordsocurrences\\f1.txt"));
            secondReader = new Scanner(new FileReader(".\\src\\topkwordsocurrences\\f2.txt"));
            thirdReader = new Scanner(new FileReader(".\\src\\topkwordsocurrences\\f3.txt"));

            while (firstReader.hasNextLine()) {
                line = firstReader.nextLine();
                if (uniqueWords.containsKey(line)) {
                    // if the key is already in the map, increment its count
                    uniqueWords.put(line, uniqueWords.getOrDefault(line, 0) + 1);
                } else {
                    // otherwise, add it with a count of 1
                    uniqueWords.put(line, 1);
                }
            }

            while (secondReader.hasNextLine()) {
                line = secondReader.nextLine();
                if (uniqueWords.containsKey(line)) {
                    // if the key is already in the map, increment its count
                    uniqueWords.put(line, uniqueWords.getOrDefault(line, 0) + 1);
                } else {
                    // otherwise, add it with a count of 1
                    uniqueWords.put(line, 1);
                }
            }

            while (thirdReader.hasNextLine()) {
                line = thirdReader.nextLine();
                if (uniqueWords.containsKey(line)) {
                    // if the key is already in the map, increment its count
                    uniqueWords.put(line, uniqueWords.getOrDefault(line, 0) + 1);
                } else {
                    // otherwise, add it with a count of 1
                    uniqueWords.put(line, 1);
                }
            }

            // convert the map entries into a list
            List<Map.Entry<String, Integer>> list = new ArrayList<>(uniqueWords.entrySet());
            // sort the entries by count in descending order
            // (for ascending order, return o1.getValue().compareTo(o2.getValue()) instead)
            Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
                public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                    return o2.getValue().compareTo(o1.getValue());
                }
            });


            // take the first 5 entries (guarding against fewer than 5 unique words)
            List<Map.Entry<String, Integer>> top5 = list.subList(0, Math.min(5, list.size()));

            for (Map.Entry<String, Integer> i : top5) {
                System.out.println(i.getKey());
            }
            
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        getKWordOcurrences();
    }

}

1 Answer

  1. Answer #1

    There are several ways to reduce your program's memory consumption.

    First, you can close each file as soon as you have finished reading it. You currently open 3 readers and never close them until the program ends. You can do better:

    String[] fileNames = new String[]{
            ".\\src\\topkwordsocurrences\\f1.txt",
            ".\\src\\topkwordsocurrences\\f2.txt",
            ".\\src\\topkwordsocurrences\\f3.txt"
    };
     
    for (String fileName : fileNames) {
        try (Scanner s = new Scanner(new FileReader(fileName))) {
            while (s.hasNextLine()) {
                processWord(s.nextLine());
            }
        } // file is closed when this line is reached, even if exceptions thrown
    }
    

    Note that the approach above also removes a lot of duplication: you implement processWord, and the logic for opening a file, only once.
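    For completeness, here is a minimal sketch of what processWord could look like. The name comes from the snippet above, but the body is an assumption, not the answerer's code; it uses a HashMap, anticipating the next paragraph's point about map choice:

    import java.util.HashMap;
    import java.util.Map;

    class WordCounts {
        // one shared count map: word -> number of occurrences seen so far
        static final Map<String, Integer> uniqueWords = new HashMap<>();

        // hypothetical helper assumed by the loop above
        static void processWord(String word) {
            // merge() increments an existing count, or inserts 1 for a new word
            uniqueWords.merge(word, 1, Integer::sum);
        }
    }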

    You use a LinkedHashMap to store the words. How many unique words do your files contain? If it is ordinary text, they should fit within the low memory limit without any problem. If, on the other hand, the words are machine-generated, no map implementation will be able to hold them all in memory. In between, other map implementations may need less memory: a HashMap, for example, does not preserve insertion order, but needs somewhat less space per entry than a LinkedHashMap. Using Scanner also carries overhead (it performs read-ahead, at a small memory cost, to improve performance), so if memory is really tight, reading with a plain Reader may do the trick.
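    As a sketch of that last suggestion (assuming one word per line, as in the question's files, and reusing the hypothetical processWord from above):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class ReaderBasedCount {
        // count words from a one-word-per-line file with a plain BufferedReader,
        // avoiding Scanner's parsing and read-ahead machinery
        static void countFile(String fileName) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
                String word;
                while ((word = reader.readLine()) != null) {
                    WordCounts.processWord(word); // hypothetical helper from the sketch above
                }
            }
        }
    }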

    If you have many distinct words and still run out of memory despite these changes, you can avoid the map entirely: you are only looking for the most frequent word (or the K most frequent words), so you only need to keep the current top-K words and their frequencies. Because each file is sorted in ascending order, all occurrences of a word are adjacent, and you can total each word's count across the files in a single merge pass.

    In pseudocode:

    open all scanners
    for each scanner, 
       count all occurrences of its 1st word
    until all scanners are empty:
       sort last-read-words from scanners
       merge counts for the lexicographically smallest word (if found in several scanners)
       check to see if smallest word should enter into top-k list
       for each scanner which had smallest word,
          count occurrences for its next word
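
    One way that pseudocode might translate into Java is sketched below. It is not the answerer's code: the WordStream helper, the file names, and k = 5 are all illustrative, and it assumes one word per line with each file sorted ascending, as the question states.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class MergeTopK {

        // Reads one sorted word stream and exposes the current word with its count.
        static class WordStream implements AutoCloseable {
            private final BufferedReader reader;
            String word;            // current word, null when the file is exhausted
            int count;              // occurrences of the current word in this file
            private String pending; // first line of the next word group

            WordStream(String fileName) throws IOException {
                reader = new BufferedReader(new FileReader(fileName));
                pending = reader.readLine();
                advance();
            }

            // Count all adjacent occurrences of the next word; the file is
            // sorted, so equal words are contiguous.
            void advance() throws IOException {
                word = pending;
                count = 0;
                if (word == null) return;
                String line = pending;
                while (line != null && line.equals(word)) {
                    count++;
                    line = reader.readLine();
                }
                pending = line;
            }

            public void close() throws IOException { reader.close(); }
        }

        record WordCount(String word, int count) {} // Java 16+ record

        public static void main(String[] args) throws IOException {
            String[] fileNames = {"f1.txt", "f2.txt", "f3.txt"}; // illustrative paths
            int k = 5;
            List<WordCount> topK = new ArrayList<>();
            List<WordStream> streams = new ArrayList<>();
            for (String name : fileNames) {
                streams.add(new WordStream(name));
            }

            while (true) {
                // find the lexicographically smallest current word across all streams
                String smallest = null;
                for (WordStream s : streams) {
                    if (s.word != null && (smallest == null || s.word.compareTo(smallest) < 0)) {
                        smallest = s.word;
                    }
                }
                if (smallest == null) break; // every stream is exhausted

                // merge counts for that word and advance the streams that held it
                int total = 0;
                for (WordStream s : streams) {
                    if (smallest.equals(s.word)) {
                        total += s.count;
                        s.advance();
                    }
                }

                // keep only the k most frequent words seen so far
                topK.add(new WordCount(smallest, total));
                topK.sort((a, b) -> Integer.compare(b.count(), a.count()));
                if (topK.size() > k) topK.remove(topK.size() - 1);
            }

            for (WordStream s : streams) s.close();
            for (WordCount wc : topK) System.out.println(wc.word() + " " + wc.count());
        }
    }

    Regardless of how many distinct words the files contain, this holds at most one pending word per file plus the k-entry list in memory, which is what lets it fit under -Xmx32m.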