JSON file size, parsing complex JSON

Parsing Very Large JSON Files

1. Requirement

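Recently, a project required importing a JSON file larger than 50 GB into Elasticsearch (ES). I tried both plain line-by-line file reading and stream reading with JSONReader, but the file was simply too large and neither approach solved the problem.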

2. Solution

The data structure to be parsed looks like this:

{"nameList":[{"name":"zhangsan"},{"name":"lisi"}],"ageList":[{"age1":"18"},{"age2":"12"}],"list":[{"a":"xxx","b":"zzz"}]}

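The structure is simple, but each JSON array contains so many JSON objects that loading the data into memory, whether by stream reading or by line-by-line reading, causes an out-of-memory error.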

In the end, I adopted a solution based on JsonToken from the Jackson streaming API.

import org.codehaus.jackson.*;
import org.codehaus.jackson.map.*;
import java.io.File;

public class ParseJsonSample {
    public static void main(String[] args) throws Exception {
        JsonFactory f = new MappingJsonFactory();
        JsonParser jp = f.createJsonParser(new File(args[0]));

        JsonToken current = jp.nextToken();
        if (current != JsonToken.START_OBJECT) {
            System.out.println("Error: root should be object: quitting.");
            return;
        }

        while (jp.nextToken() != JsonToken.END_OBJECT) {
            String fieldName = jp.getCurrentName();
            // Move from the field name to the field value.
            current = jp.nextToken();
            // "records", "field1" and "field2" are the sample's names; replace them
            // with the real array and field names (for example "nameList" and "name").
            if (fieldName.equals("records")) {
                if (current == JsonToken.START_ARRAY) {
                    // For each of the records in the array
                    while (jp.nextToken() != JsonToken.END_ARRAY) {
                        // Read the record into a tree model; this moves the
                        // parsing position to the end of the record.
                        JsonNode node = jp.readValueAsTree();
                        // Now we have random access to everything in the object.
                        System.out.println("field1: " + node.get("field1").getValueAsText());
                        System.out.println("field2: " + node.get("field2").getValueAsText());
                    }
                } else {
                    System.out.println("Error: records should be an array: skipping.");
                    jp.skipChildren();
                }
            } else {
                System.out.println("Unprocessed property: " + fieldName);
                jp.skipChildren();
            }
        }
    }
}

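The code reads the file with a combination of streaming and tree-model parsing. Each individual record is read into a tree structure, but the file is never read into memory in its entirety, so the JVM does not run out of memory. This finally solved the problem of reading the very large file.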
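To make the "one record at a time" idea concrete, here is a minimal sketch that uses only the Java standard library. It is not Jackson and not a replacement for it: it is a hand-rolled scanner that assumes well-formed JSON whose root is an object, and the names `ArrayElementStreamer` and `stream` are hypothetical, introduced only for this illustration. It shows the same principle: the input is consumed character by character, and only the single array element currently being read is held in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.BiConsumer;

// Hypothetical helper, for illustration only; assumes well-formed JSON input.
final class ArrayElementStreamer {

    // Calls onElement(arrayFieldName, elementJson) for every object that sits
    // directly inside an array belonging to the root object.
    static void stream(Reader in, BiConsumer<String, String> onElement) {
        StringBuilder element = new StringBuilder(); // the one element being captured
        StringBuilder literal = new StringBuilder(); // last string seen at root level
        String field = null;       // name of the array currently being read
        int depth = 0;             // nesting level across both {} and []
        boolean inString = false;
        boolean escaped = false;
        boolean inArray = false;   // inside an array that is a direct child of the root
        boolean capturing = false;
        try {
            int r;
            while ((r = in.read()) != -1) {
                char c = (char) r;
                if (capturing) {
                    element.append(c);
                }
                if (inString) {
                    if (escaped) {
                        escaped = false;
                    } else if (c == '\\') {
                        escaped = true;
                    } else if (c == '"') {
                        inString = false;
                        continue; // closing quote is not part of the literal
                    }
                    if (depth == 1) {
                        literal.append(c); // raw (still escaped) key text
                    }
                    continue;
                }
                switch (c) {
                    case '"':
                        inString = true;
                        if (depth == 1) {
                            literal.setLength(0);
                        }
                        break;
                    case '[':
                        if (depth == 1) {
                            inArray = true;
                            field = literal.toString(); // the key just before this array
                        }
                        depth++;
                        break;
                    case '{':
                        if (inArray && depth == 2) {
                            capturing = true;
                            element.setLength(0);
                            element.append(c);
                        }
                        depth++;
                        break;
                    case ']':
                        depth--;
                        if (depth == 1) {
                            inArray = false;
                        }
                        break;
                    case '}':
                        depth--;
                        if (capturing && depth == 2) {
                            capturing = false;
                            onElement.accept(field, element.toString());
                            element.setLength(0); // drop the element before reading the next
                        }
                        break;
                    default:
                        break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "{\"nameList\":[{\"name\":\"zhangsan\"},{\"name\":\"lisi\"}],"
                + "\"ageList\":[{\"age1\":\"18\"},{\"age2\":\"12\"}],"
                + "\"list\":[{\"a\":\"xxx\",\"b\":\"zzz\"}]}";
        stream(new StringReader(sample),
                (name, json) -> System.out.println(name + " -> " + json));
    }
}
```

Run against the sample structure shown earlier, the callback fires five times, once per object in `nameList`, `ageList`, and `list`. For a real file one would presumably pass a buffered reader (for example from `Files.newBufferedReader`) instead of a `StringReader`, and in production code a real parser such as Jackson remains the safer choice, since it validates the input and decodes escapes.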