Java: splitting a JSONArray into smaller JSONArrays

I have a situation where an org.json.JSONArray object becomes very large, which eventually leads to latency and other problems. So we decided to split the JSONArray into smaller chunks. For example, if the JSONArray looks like this:

- [{"alt_party_id_type":"xyz","first_name":"child1ss","status":"1","dob":"2014-10-02 00:00:00.0","last_name":"childSs"},
{"alt_party_id_type":"xyz","first_name":"suga","status":"1","dob":"2014-11-05 00:00:00.0","last_name":"test"},
{"alt_party_id_type":"xyz","first_name":"test4a","status":"1","dob":"2000-11-05 00:00:00.0","last_name":"test4s"},
{"alt_party_id_type":"xyz","first_name":"demo56","status":"0","dob":"2000-11-04 00:00:00.0","last_name":"Demo5"},{"alt_party_id_type":"xyz","first_name":"testsss","status":"1","dob":"1900-01-01 00:00:00.0","last_name":"testssssssssss"},{"alt_party_id_type":"xyz","first_name":"Demo1234","status":"0","dob":"2014-11-21 00:00:00.0","last_name":"Demo1"},{"alt_party_id_type":"xyz","first_name":"demo2433","status":"1","dob":"2014-11-13 00:00:00.0","last_name":"demo222"},{"alt_party_id_type":"xyz","first_name":"demo333","status":"0","dob":"2014-11-12 00:00:00.0","last_name":"demo344"},{"alt_party_id_type":"xyz","first_name":"Student","status":"1","dob":"2001-12-03 00:00:00.0","last_name":"StudentTest"}]

Then I need help dividing the JSONArray into three JSONArrays:

- [{"alt_party_id_type":"xyz","first_name":"child1ss","status":"1","dob":"2014-10-02 00:00:00.0","last_name":"childSs"}, {"alt_party_id_type":"xyz","first_name":"suga","status":"1","dob":"2014-11-05 00:00:00.0","last_name":"test"}, {"alt_party_id_type":"xyz","first_name":"test4a","status":"1","dob":"2000-11-05 00:00:00.0","last_name":"test4s"}]


 - [{"alt_party_id_type":"xyz","first_name":"demo56","status":"0","dob":"2000-11-04 00:00:00.0","last_name":"Demo5"}, {"alt_party_id_type":"xyz","first_name":"testsss","status":"1","dob":"1900-01-01 00:00:00.0","last_name":"testssssssssss"}, {"alt_party_id_type":"xyz","first_name":"Demo1234","status":"0","dob":"2014-11-21 00:00:00.0","last_name":"Demo1"}] 


 - [{"alt_party_id_type":"xyz","first_name":"demo2433","status":"1","dob":"2014-11-13 00:00:00.0","last_name":"demo222"}, {"alt_party_id_type":"xyz","first_name":"demo333","status":"0","dob":"2014-11-12 00:00:00.0","last_name":"demo344"}, {"alt_party_id_type":"xyz","first_name":"Student","status":"1","dob":"2001-12-03 00:00:00.0","last_name":"StudentTest"}]

Can anyone help me? I have tried many options, but all of them failed.


1 Answer

  1. # Answer 1

    When processing huge input files, you should use a streaming approach rather than loading the whole document into memory, in order to reduce the memory footprint, avoid OutOfMemoryError, and make it possible to start processing while the input is still being read. JSONArray has little support for streaming, so I suggest looking into Jackson's streaming API, GSON streaming, or something similar.
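
    For illustration, here is a minimal sketch of what the Jackson streaming approach could look like (assuming jackson-databind is on the classpath; the class name, batch size, and output file names are arbitrary choices for this example):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    
    // Hypothetical sketch of the Jackson streaming approach mentioned above.
    public class JacksonJsonSplit {
    
        private static final int BATCH_SIZE = 10;
    
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            List<JsonNode> batch = new ArrayList<>();
            int outputIndex = 0;
            try (JsonParser parser = mapper.getFactory().createParser(new File(args[0]))) {
                if (parser.nextToken() != JsonToken.START_ARRAY) {
                    throw new IllegalStateException("Expected start of JSON array");
                }
                // Read one element at a time; only the current batch is held in memory.
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    batch.add(mapper.readTree(parser));
                    if (batch.size() == BATCH_SIZE) {
                        mapper.writeValue(new File("split-" + outputIndex++ + ".json"), batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    mapper.writeValue(new File("split-" + outputIndex + ".json"), batch);
                }
            }
        }
    }

    Like the JSONTokener version below, this keeps at most BATCH_SIZE elements in memory at any given time.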

    That said, if you insist on using JSONArray, you can piece together a streaming approach by using JSONTokener. Below is an example program that streams the input file and creates separate JSON documents, each containing at most 10 elements.

    import java.io.*;
    import java.util.*;
    import org.json.*;
    
    public class JsonSplit {
    
        private static final int BATCH_SIZE = 10;
    
        public static void flushFile(List<Object> objects, int d) throws Exception {
            try (FileOutputStream output = new FileOutputStream("split-" + d
                + ".json");
                    Writer writer = new OutputStreamWriter(output, "UTF-8")) {
                JSONArray jsonArray = new JSONArray(objects);
                jsonArray.write(writer);
            }
        }
    
        public static void main(String[] args) throws Exception {
            int outputIndex = 0;
            try (InputStream input = new BufferedInputStream(
                    new FileInputStream(args[0]))) {
                JSONTokener tokener = new JSONTokener(input);
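                // JSONTokener pulls characters from the stream on demand, so the
                // whole document is never loaded into memory at once.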
                if (tokener.nextClean() != '[') {
                    throw tokener.syntaxError("Expected start of JSON array");
                }
                List<Object> jsonObjects = new ArrayList<>();
                while (tokener.nextClean() != ']') {
                    // Back up one character, it's part of the next value.
                    tokener.back();
                    // Read the next value in the array.
                    jsonObjects.add(tokener.nextValue());
                    // Flush if max objects per file has been reached.
                    if (jsonObjects.size() == BATCH_SIZE) {
                        flushFile(jsonObjects, outputIndex);
                        jsonObjects.clear();
                        outputIndex++;
                    }
                    // Read and discard commas between array elements.
                    if (tokener.nextClean() != ',') {
                        tokener.back();
                    }
                }
                if (!jsonObjects.isEmpty()) {
                    flushFile(jsonObjects, outputIndex);
                }
                // Verify that end of input is reached.
                if (tokener.nextClean() != 0) {
                    throw tokener.syntaxError("Expected end of document");
                }
            }
    
        }
    
    }
    

    To see why a streaming approach is needed for large files, download or create a large JSON file and then try running a simple implementation that does not stream. Here is a Perl command that creates a JSON array with 1,000,000 elements and a file size of about 16 MB:

    perl -le 'print "["; for (1..1_000_000) {print "," unless $_ == 1; print "{\"id\": " . int(rand(1_000_000)) . "}";} print "]"' > input_huge.json
    

    If you run JsonSplit on this input, it runs quickly with a small memory footprint and produces 100,000 files, each containing 10 elements. Moreover, it starts producing output files immediately after it starts.

    In contrast, if you run the following JsonSplitNaive program, which reads the whole JSON document at once, it will visibly do nothing for a long time and then terminate with an OutOfMemoryError.

    import java.io.*;
    import java.util.*;
    import org.json.*;
    
    public class JsonSplitNaive {
    
        /*
         * Naive version - do not use, will fail with OutOfMemoryError for
         * huge inputs.
         */
    
        private static final int BATCH_SIZE = 10;
    
        public static void flushFile(List<Object> objects, int d) throws Exception {
            try (FileOutputStream output = new FileOutputStream("split-" + d
                + ".json");
                    Writer writer = new OutputStreamWriter(output, "UTF-8")) {
                JSONArray jsonArray = new JSONArray(objects);
                jsonArray.write(writer);
            }
        }
    
        public static void main(String[] args) throws Exception {
            int outputIndex = 0;
            try (InputStream input = new BufferedInputStream(
                    new FileInputStream(args[0]))) {
                List<Object> jsonObjects = new ArrayList<>();
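                // Unlike JsonSplit, this parses and holds the entire array in
                // memory before any output is written.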
                JSONArray jsonArray = new JSONArray(new JSONTokener(input));
                for (int i = 0; i < jsonArray.length(); i++) {
                    jsonObjects.add(jsonArray.get(i));
                    // Flush if max objects per file has been reached.
                    if (jsonObjects.size() == BATCH_SIZE) {
                        flushFile(jsonObjects, outputIndex);
                        jsonObjects.clear();
                        outputIndex++;
                    }
                }
                if (!jsonObjects.isEmpty()) {
                    flushFile(jsonObjects, outputIndex);
                }
            }
        }
    
    }