Java out of memory error in my program

I wrote a program that does some data processing on a list of objects (800 of them at most). The work done on this list mainly consists of the following:

  1. Lots of SQL queries
  2. Processing the queried data
  3. Grouping and matching
  4. Writing the results to a CSV file

All of this worked fine, but the data processing part and the size of the SQL data keep growing by the day, and the program has started running out of memory and crashing frequently.

To avoid this, I decided to chop the big list into several smaller chunks and do the same work on those smaller lists (clearing and nulling the current small list before moving on to the next one), hoping that this would solve the problem. But it did not help at all; the program still runs out of memory.

The program does not run out of memory during the first iteration of the for loop, but during the second or third.

Am I clearing and nulling all the lists and objects in the for loop correctly, so that memory is freed up for the next iteration?

How can I solve this? I have put the code below.

Any suggestions/solutions would be greatly appreciated.

Thanks in advance. Cheers

List<someObject> unchoppedList = new ArrayList<someObject>();
for (String pb : listOfNames) {
    someObject tccw = null;
    tccw = new someObject(...);
    unchoppedList.add(tccw);
}
Collections.shuffle(unchoppedList);
List<List<someObject>> master = null;
if (unchoppedList.size() > 0 && unchoppedList.size() <= 175) {
    master = chopped(unchoppedList, 1);
} else if (unchoppedList.size() > 175 && unchoppedList.size() <= 355) {
    master = chopped(unchoppedList, 2);
} else if (unchoppedList.size() > 355 && unchoppedList.size() <= 535) {
    master = chopped(unchoppedList, 3);
} else if (unchoppedList.size() > 535 && unchoppedList.size() <= 800) {
    master = chopped(unchoppedList, 4);
}

for (int i = 0; i < master.size(); i++) {
    List<someObject> m = master.get(i);
    System.gc(); // I inserted this statement to force GC
    executor1 = Executors.newFixedThreadPool(Configuration.getNumberOfProcessors());
    generalList = new ArrayList<ProductBean>();
    try {
        m.parallelStream().forEach(work -> {
            try {
                generalList.addAll(executor1.submit(work).get());
                work = null;
            } catch (Exception e) {
                logError(e);
            }
        });
    } catch (Exception e) {
        logError(e);
    }
    executor1.shutdown();
    executor1.awaitTermination(30, TimeUnit.SECONDS);
    m.clear();
    m = null;
    executor1 = null;

    //once the general list is produced the program randomly matches some "good" products to highly similar "not-so-good" products
    List<ProductBean> controlList = new ArrayList<ProductBean>();
    List<ProductBean> tempKaseList = new ArrayList<ProductBean>();
    for (ProductBean kase : generalList) {
        if (kase.getGoodStatus() == 0 && kase.getBadStatus() == 1) {
            controlList.add(kase);
        } else if (kase.getGoodStatus() == 1 && kase.getBadStatus() == 0) {
            tempKaseList.add(kase);
        }
    }
    generalList = new ArrayList<ProductBean>(tempKaseList);
    tempKaseList.clear();
    tempKaseList = null;

    Collections.shuffle(generalList);
    Collections.shuffle(controlList);
    final List<List<ProductBean>> compliCases = chopped(generalList, 3);
    final List<List<ProductBean>> compliControls = chopped(controlList, 3);
    generalList.clear();
    controlList.clear();
    generalList = null;
    controlList = null;

    final List<ProductBean> remainingCases = Collections.synchronizedList(new ArrayList<ProductBean>());
    IntStream.range(0, compliCases.size()).parallel().forEach(i -> {
        compliCases.get(i).forEach(c -> {
            TheRandomMatchWorker tRMW = new TheRandomMatchWorker(compliControls.get(i), c);
            List<String[]> reportData = tRMW.generateReport();
            writeToCSVFile(reportData);
            // if the program cannot find required number of products to match it is added to a new list to look for matching candidates elsewhere
            if (tRMW.getTheKase().isEverythingMathced == false) {
                remainingCases.add(tRMW.getTheKase());
            }
            compliControls.get(i).removeAll(tRMW.getTheMatchedControls());
            tRMW = null;
            stuff.clear();
        });
    });

    controlList = new ArrayList<ProductBean>();
    for (List<ProductBean> c10 : compliControls) {
        controlList.addAll(c10);
    }
    compliCases.clear();
    compliControls.clear();

    //last sweep where the program for last time tries to match some "good" products to highly similar "not-so-good" products
    try {
        for (ProductBean kase : remainingCases) {
            if (kase.getNoOfContrls() < ccv.getNoofctrl()) {
                TheRandomMatchWorker tRMW = new TheRandomMatchWorker(controlList, kase );
                List<String[]> reportData = tRMW.generateReport();
                writeToCSVFile(reportData);
                if (tRMW.getTheKase().isEverythingMathced == false) {
                    remainingCases.add(tRMW.getTheKase());
                }
                compliControls.get(i).removeAll(tRMW.getTheMatchedControls());
                tRMW = null;
                stuff.clear();
            }
        }
    } catch (Exception e) {
        logError(e);
    }

    remainingCases.clear();
    controlList.clear();
    controlList = null;
    master.get(i).clear();
    master.set(i, null);
    System.gc();
}
master.clear();
master = null;

Here is the chopped method:

static <T> List<List<T>> chopped(List<T> list, final int L) {
    // splits list into L consecutive sublists of roughly equal size
    List<List<T>> parts = new ArrayList<List<T>>();
    final int N = list.size();
    int y = N / L, m = 0, c = y; // y: base sublist size; m, c: bounds of the current sublist
    int r = c * L;               // r: number of elements covered by L sublists of size y
    for (int i = 1; i <= L; i++) {
        if (i == L) {
            c += (N - r); // the last sublist absorbs the remainder
        }
        parts.add(new ArrayList<T>(list.subList(m, c)));
        m = c;
        c += y;
    }
    return parts;
}
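
For example, it splits a 10-element list into 3 parts of sizes 3, 3 and 4 (the remainder goes to the last part):

List<List<Integer>> parts = chopped(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3);
System.out.println(parts); // prints [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]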

Here is the requested stack trace:

java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at Controller.MasterStudyController.lambda$1(MasterStudyController.java:212)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.postgresql.core.Encoding.decode(Encoding.java:204)
    at org.postgresql.core.Encoding.decode(Encoding.java:215)
    at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:1913)
    at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:2484)
    at Controller.someObject.findControls(someObject.java:214)
    at Controller.someObject.call(someObject.java:81)
    at Controller.someObject.call(someObject.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[19:13:35][ERROR] Jarvis: Exception:
java.util.concurrent.ExecutionException: java.lang.AssertionError: Failed generating bytecode for <eval>:-1
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at Controller.MasterStudyController.lambda$1(MasterStudyController.java:212)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.lang.AssertionError: Failed generating bytecode for <eval>:-1
    at jdk.nashorn.internal.codegen.CompilationPhase$BytecodeGenerationPhase.transform(CompilationPhase.java:431)
    at jdk.nashorn.internal.codegen.CompilationPhase.apply(CompilationPhase.java:624)
    at jdk.nashorn.internal.codegen.Compiler.compile(Compiler.java:655)
    at jdk.nashorn.internal.runtime.Context.compile(Context.java:1317)
    at jdk.nashorn.internal.runtime.Context.compileScript(Context.java:1251)
    at jdk.nashorn.internal.runtime.Context.compileScript(Context.java:627)
    at jdk.nashorn.api.scripting.NashornScriptEngine.compileImpl(NashornScriptEngine.java:535)
    at jdk.nashorn.api.scripting.NashornScriptEngine.compileImpl(NashornScriptEngine.java:524)
    at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402)
    at jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
    at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
    at Controller.someObject.findCases(someObject.java:108)
    at Controller.someObject.call(someObject.java:72)
    at Controller.someObject.call(someObject.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[19:13:52][ERROR] Jarvis: Exception:
[19:51:41][ERROR] Jarvis: Exception:
org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2157)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
    at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:117)
    at Controller.someObject.lookForSomething(someObject.java:763)
    at Controller.someObject.call(someObject.java:70)
    at Controller.someObject.call(someObject.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

1 Answer

  1. Answer #1

    OK, 48GB of memory for the JVM is quite a lot (I assume you are talking about heap space, so -Xmx48G). We are clearly dealing with large data sets here, which of course complicates things, because it is not easy to create a minimal reproducible example.

    The first thing I would do is get a better picture of what is consuming all that memory. You can use the following options to make Java produce a heap dump when it runs out of memory:

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
    

    This will create a java_pid<pid>.hprof file in /tmp when the program crashes with an OutOfMemoryError.
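
    For example, if you launch the program from a jar (the jar name here is only a placeholder for your own):

    java -Xmx48G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar yourProgram.jar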

    You can then try to analyze that dump with tooling, although its sheer size will pose a challenge. Simply opening it in MAT, for example, will most likely not work, but there are ways to run parts of MAT from the command line, possibly remotely on a beefy server.

    There are articles describing how to analyze large heap dumps; in short, the instructions boil down to:

    • Download and install MAT
    • Configure MAT's memory settings according to what is available for the analysis (obviously, the more the better)
    • The installation should contain a ParseHeapDump.sh script that can be used to run some analyses and prepare index/report files. Note that this will of course take a long time:

      ./ParseHeapDump.sh /path/to/your.hprof
      ./ParseHeapDump.sh /path/to/your.hprof org.eclipse.mat.api:suspects
      ./ParseHeapDump.sh /path/to/your.hprof org.eclipse.mat.api:overview
      ./ParseHeapDump.sh /path/to/your.hprof org.eclipse.mat.api:top_components
      

    You should then be able to open the generated reports with MAT and hopefully make some sense of them.


    In your comments you said that most of the memory is used by lists of someObject, and you suspect that those objects are not being freed.

    Based on the code you posted, the someObject instances are indeed not freed, because they remain reachable through unchoppedList: that list is never cleared in the code you posted, so the calls to m.clear() have virtually no effect on the memory used; all of those objects are still referenced elsewhere.

    So the solution may be as simple as adding a single unchoppedList.clear(); after the master list has been populated:

    List<List<someObject>> master = null;
    // let's also get rid of the hardcoded numbers of sublists
    int maxListSize = 175;
    int nbSublists = (unchoppedList.size() + maxListSize - 1) / maxListSize; // obtain rounded up integer division
    master = chopped(unchoppedList, nbSublists);
    // important: clear the unchoppedList so it doesn't keep references to *all* SomeObject
    unchoppedList.clear();
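
    With 800 objects, for example, this gives (800 + 174) / 175 = 5 sublists of 160 objects each, instead of the 4 sublists of 200 that the original if/else cascade would have produced.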
    

    As for the other comments about the non-thread-safe use of ArrayList, I have to agree with the others that it is generally a bad idea.

    To address the most obvious one: I don't even see a good reason to use a parallelStream when submitting the work to the executor. Using a normal, sequential stream would restore thread safety (and thus remove a potential source of problems); see the sketch after the list below.

    Note that if this change has any effect on performance at all, I believe it may even be positive:

    • The lambda expression is very simple and therefore executes very quickly; even in theory, the maximum benefit a parallel stream could bring seems marginal
    • Each sequentially processed item spawns a new thread until the executor reaches its maximum, so all cores will be busy almost instantly anyway
    • Using a parallel stream incurs significant overhead of its own, and in this case its threads also have to compete with the executor threads for CPU time
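
    Here is a minimal sketch of that sequential alternative, reusing the names from your code and assuming someObject implements Callable<List<ProductBean>> (which executor1.submit(work).get() implies); submitting everything first and only then collecting the results keeps all executor threads busy:

    executor1 = Executors.newFixedThreadPool(Configuration.getNumberOfProcessors());
    generalList = new ArrayList<ProductBean>();
    // queue all tasks up front so every executor thread becomes busy immediately
    List<Future<List<ProductBean>>> futures = new ArrayList<>();
    for (someObject work : m) {
        futures.add(executor1.submit(work));
    }
    // collect the results on this single thread: generalList is only ever
    // modified here, so no synchronization is needed
    for (Future<List<ProductBean>> future : futures) {
        try {
            generalList.addAll(future.get());
        } catch (Exception e) {
            logError(e);
        }
    }
    executor1.shutdown();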

    Beyond that, there may be other concurrency issues at play; without the complete program it is hard to assess, but your writeToCSVFile(reportData); calls, for example, also look potentially problematic.
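
    If writeToCSVFile appends to one shared file, a simple safeguard is to serialize access to it. A minimal sketch, assuming a shared PrintWriter field (both csvLock and csvOut are placeholder names, not from your code):

    private final Object csvLock = new Object();

    void writeToCSVFile(List<String[]> reportData) {
        // only one thread at a time may append, so rows from
        // concurrently generated reports cannot interleave
        synchronized (csvLock) {
            for (String[] row : reportData) {
                csvOut.println(String.join(",", row)); // naive join; a real implementation should escape delimiters
            }
            csvOut.flush();
        }
    }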