Finding the most frequent words in 3 files with java -Xmx32m

I need to develop an application that finds the words with the highest number of occurrences across 3 txt files. Note that the words in each file are sorted in ascending order, and a word can appear in more than one file. The application has to be run with the -Xmx32m flag (a maximum heap size of 32 MB), like this:

java -Xmx32m TopKWordOcurrences

I have implemented the application with a HashMap and a linked list, but as soon as I run it with -Xmx32m it fails with an out-of-memory error. I would be grateful for any help.

You can find the code below:

package topkwordsocurrences;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.*;

public class TopKWordOcurrences {

    public static void getKWordOcurrences() {
        Scanner firstReader, secondReader, thirdReader;
        Map<String, Integer> uniqueWords = new LinkedHashMap<>();
        String line;
        try {
            PrintWriter out = new PrintWriter(".\\src\\topkwordsocurrences\\out.txt");
            firstReader = new Scanner(new FileReader(".\\src\\topkwordsocurrences\\f1.txt"));
            secondReader = new Scanner(new FileReader(".\\src\\topkwordsocurrences\\f2.txt"));
            thirdReader = new Scanner(new FileReader(".\\src\\topkwordsocurrences\\f3.txt"));

            while (firstReader.hasNextLine()) {
                line = firstReader.nextLine();
                if (uniqueWords.containsKey(line)) {
                    // if the key is already in the map, increment its count
                    uniqueWords.put(line, uniqueWords.getOrDefault(line, 0) + 1);
                } else {
                    // otherwise, add it with a count of 1
                    uniqueWords.put(line, 1);
                }
            }

            while (secondReader.hasNextLine()) {
                line = secondReader.nextLine();
                if (uniqueWords.containsKey(line)) {
                    // if the key is already in the map, increment its count
                    uniqueWords.put(line, uniqueWords.getOrDefault(line, 0) + 1);
                } else {
                    // otherwise, add it with a count of 1
                    uniqueWords.put(line, 1);
                }
            }

            while (thirdReader.hasNextLine()) {
                line = thirdReader.nextLine();
                if (uniqueWords.containsKey(line)) {
                    // if the key is already in the map, increment its count
                    uniqueWords.put(line, uniqueWords.getOrDefault(line, 0) + 1);
                } else {
                    // otherwise, add it with a count of 1
                    uniqueWords.put(line, 1);
                }
            }

            // convert the map entries into a list
            List<Map.Entry<String, Integer>> list = new ArrayList<>(uniqueWords.entrySet());
            // sort the entries by count in descending order
            // (for ascending order, return o1.getValue().compareTo(o2.getValue()) instead)
            Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
                public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                    return o2.getValue().compareTo(o1.getValue());
                }
            });


            // take the first 5 entries (guarding against fewer than 5 unique words)
            List<Map.Entry<String, Integer>> top5 = list.subList(0, Math.min(5, list.size()));

            for (Map.Entry<String, Integer> i : top5) {
                System.out.println(i.getKey());
            }
            
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        getKWordOcurrences();
    }

}

1 Answer

  1. Answer #1

    There are several ways to reduce your program's memory consumption.

    First, you can close each file as soon as you have finished reading it. You currently open 3 readers and never close them until the program ends. You can do better:

    String[] fileNames = new String[]{
            ".\\src\\topkwordsocurrences\\f1.txt",
            ".\\src\\topkwordsocurrences\\f2.txt",
            ".\\src\\topkwordsocurrences\\f3.txt"
    };
     
    for (String fileName : fileNames) {
        try (Scanner s = new Scanner(new FileReader(fileName))) {
            while (s.hasNextLine()) {
                processWord(s.nextLine());
            }
        } // file is closed when this line is reached, even if exceptions thrown
    }
    

    Note that the approach above also removes a lot of duplication: you implement processWord, and the logic for opening a file, only once.
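    For completeness, here is a minimal sketch of what processWord could look like. The name comes from the snippet above, but the body is an assumption, not the answerer's code; it uses a HashMap, anticipating the next paragraph's point about map choice:

    import java.util.HashMap;
    import java.util.Map;

    class WordCounts {
        // one shared count map: word -> number of occurrences seen so far
        static final Map<String, Integer> uniqueWords = new HashMap<>();

        // hypothetical helper assumed by the loop above
        static void processWord(String word) {
            // merge() increments an existing count, or inserts 1 for a new word
            uniqueWords.merge(word, 1, Integer::sum);
        }
    }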

    You use a LinkedHashMap to store the words. How many unique words do your files contain? If it is ordinary text, they should fit within the low memory limit without any problem. If, on the other hand, the words are machine-generated, no map implementation will be able to hold them all in memory. In between, other map implementations may need less memory: a HashMap, for example, does not preserve insertion order, but needs somewhat less space per entry than a LinkedHashMap. Using Scanner also carries overhead (it performs read-ahead, at a small memory cost, to improve performance), so if memory is really tight, reading with a plain Reader may do the trick.
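    As a sketch of that last suggestion (assuming one word per line, as in the question's files, and reusing the hypothetical processWord from above):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class ReaderBasedCount {
        // count words from a one-word-per-line file with a plain BufferedReader,
        // avoiding Scanner's parsing and read-ahead machinery
        static void countFile(String fileName) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
                String word;
                while ((word = reader.readLine()) != null) {
                    WordCounts.processWord(word); // hypothetical helper from the sketch above
                }
            }
        }
    }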

    If you have many distinct words and still run out of memory despite these changes, you can avoid the map entirely: you are only looking for the most frequent word (or the K most frequent words), so you only need to keep the current top-K words and their frequencies. Because each file is sorted in ascending order, all occurrences of a word are adjacent, and you can total each word's count across the files in a single merge pass.

    In pseudocode:

    open all scanners
    for each scanner, 
       count all occurrences of its 1st word
    until all scanners are empty:
       sort last-read-words from scanners
       merge counts for the lexicographically smallest word (if found in several scanners)
       check to see if smallest word should enter into top-k list
       for each scanner which had smallest word,
          count occurrences for its next word
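
    One way that pseudocode might translate into Java is sketched below. It is not the answerer's code: the WordStream helper, the file names, and k = 5 are all illustrative, and it assumes one word per line with each file sorted ascending, as the question states.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class MergeTopK {

        // Reads one sorted word stream and exposes the current word with its count.
        static class WordStream implements AutoCloseable {
            private final BufferedReader reader;
            String word;            // current word, null when the file is exhausted
            int count;              // occurrences of the current word in this file
            private String pending; // first line of the next word group

            WordStream(String fileName) throws IOException {
                reader = new BufferedReader(new FileReader(fileName));
                pending = reader.readLine();
                advance();
            }

            // Count all adjacent occurrences of the next word; the file is
            // sorted, so equal words are contiguous.
            void advance() throws IOException {
                word = pending;
                count = 0;
                if (word == null) return;
                String line = pending;
                while (line != null && line.equals(word)) {
                    count++;
                    line = reader.readLine();
                }
                pending = line;
            }

            public void close() throws IOException { reader.close(); }
        }

        record WordCount(String word, int count) {} // Java 16+ record

        public static void main(String[] args) throws IOException {
            String[] fileNames = {"f1.txt", "f2.txt", "f3.txt"}; // illustrative paths
            int k = 5;
            List<WordCount> topK = new ArrayList<>();
            List<WordStream> streams = new ArrayList<>();
            for (String name : fileNames) {
                streams.add(new WordStream(name));
            }

            while (true) {
                // find the lexicographically smallest current word across all streams
                String smallest = null;
                for (WordStream s : streams) {
                    if (s.word != null && (smallest == null || s.word.compareTo(smallest) < 0)) {
                        smallest = s.word;
                    }
                }
                if (smallest == null) break; // every stream is exhausted

                // merge counts for that word and advance the streams that held it
                int total = 0;
                for (WordStream s : streams) {
                    if (smallest.equals(s.word)) {
                        total += s.count;
                        s.advance();
                    }
                }

                // keep only the k most frequent words seen so far
                topK.add(new WordCount(smallest, total));
                topK.sort((a, b) -> Integer.compare(b.count(), a.count()));
                if (topK.size() > k) topK.remove(topK.size() - 1);
            }

            for (WordStream s : streams) s.close();
            for (WordCount wc : topK) System.out.println(wc.word() + " " + wc.count());
        }
    }

    Regardless of how many distinct words the files contain, this holds at most one pending word per file plus the k-entry list in memory, which is what lets it fit under -Xmx32m.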