Best way to print a large data file in Java


I'm trying to output a large amount of data to a file. Right now, I'm trying the following:

`byte[][] hands`, with dimensions 2.5 billion x 7

I have a series of nested for loops:

```
for ...
  for ...
    for ...
      hands[i][j] = blah
```

Then, at the end, I output all the entries of the array `hands`.

The alternative is to not use the memory at all and write each entry as it is produced: for ... for ... for ... `println(blah)`.

But that seems like it would be slow, since it would be printing constantly.

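Is the first approach the best? Would some intermediate approach be better, such as storing k entries and then printing them? If so, what is a good value for k?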

Edit: here is the code:

```java
package tables;

import general.Config;
import general.Constants;

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Outputs canonical river hands
public class OutputRiverCanonicalHands3 implements Config, Constants{

    public static void main(String[] args) throws IOException {
        int half_river = (int)(NUM_RIVER_HANDS/2);
        boolean[] river_seen_index_1 = new boolean[half_river];
        boolean[] river_seen_index_2 = new boolean[(int)(NUM_RIVER_HANDS - half_river)];
        System.out.println("DONE DECLARING RIVER SEEN");
        byte hole11, hole12, board1, board2, board3, board4, board5;
        long river_index;

        byte[][] turnHands = new byte[NUM_TURN_HANDS][6];
        System.out.println("DONE DECLARING TURN");
        BufferedReader br = new BufferedReader(new FileReader(RIVER_TURN_INDICES_FILE2));
        int count = 0;
        while (br.ready()) {
            StringTokenizer str = new StringTokenizer(br.readLine());
            str.nextToken();
            for (int i = 0; i < turnHands[count].length; ++i)
                turnHands[count][i] = Byte.parseByte(str.nextToken());
            ++count;
        }
        br.close();
        System.out.println("DONE READING TURN");

        DataOutputStream dos = new DataOutputStream(new FileOutputStream(RIVER_CANONICAL_HANDS_FILE3));
        byte[][] hands = new byte[half_river][7];
        System.out.println("DONE DECLARING RIVER ARRAY");

        long startTime = System.currentTimeMillis();
        int arrayIndex;
        for (int i = 0; i < turnHands.length; ++i) {
            if (i % 100000 == 0) {
                long elapsedTime = System.currentTimeMillis() - startTime;
                System.out.println(i + " " + elapsedTime);
            }
            hole11 = turnHands[i][0];
            hole12 = turnHands[i][1];
            board1 = turnHands[i][2];
            board2 = turnHands[i][3];
            board3 = turnHands[i][4];
            board4 = turnHands[i][5];
            for (board5 = 0; board5 < DECK_SIZE; ++board5) {
                if (board5 == hole11 || board5 == hole12
                        || board5 == board1 || board5 == board2 || board5 == board3 || board5 == board4)
                    continue;

                river_index = ComputeIndicesTight.compute_river_index(hole11, hole12, board1, board2, board3, board4, board5);
                if (river_index < half_river && river_seen_index_1[(int)river_index])
                    continue;
                if (river_index >= half_river && river_seen_index_2[(int)(river_index - half_river)])
                    continue;
                if (river_index < half_river) {
                    arrayIndex = (int)river_index;
                    river_seen_index_1[arrayIndex] = true;
                    hands[arrayIndex][0] = hole11;
                    hands[arrayIndex][1] = hole12;
                    hands[arrayIndex][2] = board1;
                    hands[arrayIndex][3] = board2;
                    hands[arrayIndex][4] = board3;
                    hands[arrayIndex][5] = board4;
                    hands[arrayIndex][6] = board5;
                }
                else if (river_index == half_river) {
                    System.out.println("HALFWAY THERE");
                    for (int j = 0; j < hands.length; ++j)
                        for (int k = 0; k < 7; ++k)
                            dos.writeByte(hands[j][k]);
                    hands = new byte[(int)(NUM_RIVER_HANDS - half_river)][7];
                    System.out.println("DONE PRINTING HALFWAY!");
                }
                if (river_index >= half_river) {
                    arrayIndex = (int)(river_index - half_river);
                    river_seen_index_2[arrayIndex] = true;
                    hands[arrayIndex][0] = hole11;
                    hands[arrayIndex][1] = hole12;
                    hands[arrayIndex][2] = board1;
                    hands[arrayIndex][3] = board2;
                    hands[arrayIndex][4] = board3;
                    hands[arrayIndex][5] = board4;
                    hands[arrayIndex][6] = board5;
                }
            }
        }
        for (int j = 0; j < hands.length; ++j)
            for (int k = 0; k < 7; ++k)
                dos.writeByte(hands[j][k]);

        dos.close();
    }
}
```

# Answer 1

(As I suspected...)

There is a very simple explanation for the output performance problem in your code. This line:

```java
DataOutputStream dos = new DataOutputStream(
       new FileOutputStream(RIVER_CANONICAL_HANDS_FILE3));
```

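is creating a stream that writes directly to the file without any buffering. Every `write` you perform results in a `write` system call, and that is expensive. Simply adding a `BufferedOutputStream` to the output pipeline should give you much better performance: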

```java
DataOutputStream dos = new DataOutputStream(
       new BufferedOutputStream(
               new FileOutputStream(RIVER_CANONICAL_HANDS_FILE3)));
```
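As a sketch of how the buffered pipeline relates to the question, the hypothetical class below streams each 7-byte hand straight to disk instead of first filling a huge array. The buffer effectively implements the "store k entries, then print them" idea: with the 64 KiB buffer chosen here (an arbitrary assumption), roughly 9,362 hands' worth of bytes accumulate in memory before each real write system call. The class name, buffer size, and placeholder card values are all illustrative assumptions, not part of the original code.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example: streams fixed-size 7-byte records through a buffer
// instead of holding all of them in memory first.
class StreamingHandsWriter {

    // Assumed buffer size (64 KiB); BufferedOutputStream's default is smaller.
    static final int BUFFER_SIZE = 1 << 16;

    // Writes `count` placeholder hands to `path`, one 7-byte record per hand.
    static void writeHands(String path, int count) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path), BUFFER_SIZE))) {
            byte[] hand = new byte[7];
            for (int i = 0; i < count; ++i) {
                for (int k = 0; k < hand.length; ++k) {
                    hand[k] = (byte) ((i + k) % 52); // placeholder card values
                }
                // Copied into the in-memory buffer; a system call happens only
                // when the buffer fills up or the stream is closed.
                dos.write(hand);
            }
        } // try-with-resources flushes the buffer and closes the file
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("hands", ".bin");
        writeHands(out.toString(), 1_000);
        System.out.println(Files.size(out)); // prints 7000 (7 bytes per hand)
        Files.delete(out);
    }
}
```

Because the real program has to deduplicate by `river_index` before writing, streaming like this only applies directly if hands can be emitted in index order; otherwise the in-memory array is still needed and the buffer only speeds up the final dump.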

> I figured writing the data in binary would save some space, since the file will be so large.

It won't. The space used will be exactly the same as if you wrote the `byte` values directly to the `FileOutputStream`.
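One way to check this claim is to write the same hands both ways into in-memory streams and compare the results: `DataOutputStream.writeByte` emits exactly one byte per call, so its output is byte-for-byte identical to writing the raw bytes to the underlying stream. The class name below is made up, and `ByteArrayOutputStream` is used only so the sketch needs no file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical check: DataOutputStream.writeByte vs. plain OutputStream.write.
class SameSizeCheck {

    // Writes every byte through DataOutputStream.writeByte, as in the question.
    static byte[] viaDataOutputStream(byte[][] hands) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            for (byte[] hand : hands)
                for (byte b : hand)
                    dos.writeByte(b); // writes the low 8 bits: one byte per call
        }
        return bytes.toByteArray();
    }

    // Writes the same rows as raw bytes with no DataOutputStream involved.
    static byte[] viaPlainStream(byte[][] hands) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (byte[] hand : hands)
            bytes.write(hand, 0, hand.length);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[][] hands = { {0, 1, 2, 3, 4, 5, 6}, {51, 50, 49, 48, 47, 46, 45} };
        byte[] a = viaDataOutputStream(hands);
        byte[] b = viaPlainStream(hands);
        System.out.println(a.length + " " + b.length + " " + Arrays.equals(a, b));
        // prints "14 14 true"
    }
}
```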

In fact, if that is the only reason you are using `DataOutputStream`, you would be better off leaving it out and writing the `hands` data like this:

```java
    dos.write(hands[j]);
```

... that is, use the `OutputStream.write(byte[])` method and get rid of the innermost write loop. (But adding the `BufferedOutputStream` will make a much bigger difference!)
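Putting both suggestions together, a minimal sketch of the final output step might drop `DataOutputStream` entirely and hand each 7-byte row to a `BufferedOutputStream` in a single call. The class `RowWriter` and method `writeRows` are hypothetical names; `hands` stands in for the question's array (or one half of it), here shrunk to a toy size.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes each row of a byte[][] as raw bytes.
class RowWriter {

    // Writes every row of `hands` to `out`; no per-byte inner loop.
    static void writeRows(OutputStream out, byte[][] hands) throws IOException {
        for (byte[] hand : hands)
            out.write(hand); // OutputStream.write(byte[]) writes hand.length bytes
    }

    public static void main(String[] args) throws IOException {
        byte[][] hands = new byte[1_000][7]; // toy size, all zeros
        Path file = Files.createTempFile("hands", ".bin");
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream(file.toFile()))) {
            writeRows(out, hands);
        } // closing flushes the buffer
        System.out.println(Files.size(file)); // prints 7000
        Files.delete(file);
    }
}
```

Taking an `OutputStream` parameter rather than a file name keeps the helper usable with any stream, including an in-memory one for testing.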
