How to deserialise big JSON file (~300Mb)

Posted: 2020-03-02 17:52:37

Question:

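I would like to parse a JSON file (size ~300Mb). I use the Jackson library with ObjectMapper. Is it normal to run into memory problems?

At first I used a BufferedReader, and it crashed the application. Next, I used this library. How long should it take to parse the file and save it into an SQLite database? Is it expected to be slow?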

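Comments:

You need to use the Streaming API to load your JSON partially, only the rows you need. Take a look at: Parsing JSON array to java object; Json processing with Jackson: Method #1/3: Reading and Writing Event Streams; Fastest way to parse JSON from String when format is known.

A JSON DOM is not needed, so @MichałZiober is right. If the data is very regular and the application does not need to be production quality, hand-written parsing may be enough.

Answer 1: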

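Jackson

You can mix the Streaming API with a regular ObjectMapper. With these two we can implement a neat Iterator class. Given a URL, we can build a stream and pass it to our implementation. Sample code could look like this: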

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.math.BigDecimal;
import java.net.URL;
import java.util.Iterator;

public class JsonPathApp {

    public static void main(String[] args) throws Exception {
        // Just to make it work. Probably you should not do that!
        // SSLUtilities is a helper taken from the linked SSL question; it is not part of the JDK.
        SSLUtilities.trustAllHostnames();
        SSLUtilities.trustAllHttpsCertificates();

        URL url = new URL("https://data.opendatasoft.com/explore/dataset/vehicules-commercialises@public/download/?format=json&timezone=Europe/Berlin");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openConnection().getInputStream()))) {
            FieldsJsonIterator fieldsJsonIterator = new FieldsJsonIterator(reader);
            // Only one Fields object is materialised at a time.
            while (fieldsJsonIterator.hasNext()) {
                Fields fields = fieldsJsonIterator.next();
                System.out.println(fields);
                // Save object to DB
            }
        }
    }
}

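class FieldsJsonIterator implements Iterator<Fields> {

    private final ObjectMapper mapper;
    private final JsonParser parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new ObjectMapper();
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

        parser = mapper.getFactory().createParser(reader);
        skipStart();
    }

    // Advances the parser to the first START_OBJECT, i.e. the first array element.
    private void skipStart() throws IOException {
        while (parser.currentToken() != JsonToken.START_OBJECT) {
            if (parser.nextToken() == null) {
                break; // end of input reached without finding an object
            }
        }
    }

    @Override
    public boolean hasNext() {
        try {
            // After readValue the current token is cleared, so move to the next one.
            // The isClosed() check avoids spinning forever once the input is exhausted.
            while (parser.currentToken() == null && !parser.isClosed()) {
                parser.nextToken();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }

        return parser.currentToken() == JsonToken.START_OBJECT;
    }

    @Override
    public Fields next() {
        try {
            // Binds exactly one array element and leaves the rest of the stream unread.
            return mapper.readValue(parser, FieldsWrapper.class).fields;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each array element wraps the interesting data in a "fields" property.
    private static final class FieldsWrapper {
        public Fields fields;
    }
}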

class Fields {

    private String cnit;

    @JsonProperty("puissance_maximale")
    private BigDecimal maximumPower;

    @JsonProperty("champ_v9")
    private String fieldV9;

    @JsonProperty("boite_de_vitesse")
    private String gearbox;

    // add other required properties

    // getters, setters, toString
}

The above code prints:

Fields{cnit='MMB76K3BQJ41', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='A 5'}
Fields{cnit='M10MCDVPF15Z219', maximumPower=95.0, fieldV9='"715/2007*566/2011EURO5', gearbox='A 7'}
Fields{cnit='M10MCDVP027V654', maximumPower=150.0, fieldV9='715/2007*692/2008EURO5', gearbox='A 7'}
Fields{cnit='M10MCDVPG137264', maximumPower=120.0, fieldV9='715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='MVV4912QN718', maximumPower=210.0, fieldV9='null', gearbox='A 6'}
Fields{cnit='MMB76K3B2K88', maximumPower=110.0, fieldV9='null', gearbox='A 5'}
Fields{cnit='M10MCDVP012N140', maximumPower=80.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='MJN5423PU123', maximumPower=88.0, fieldV9='null', gearbox='M 6'}
Fields{cnit='M10MCDVP376T303', maximumPower=120.0, fieldV9='"715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='MMB53H3B5Z93', maximumPower=80.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='MPE1403E4834', maximumPower=81.0, fieldV9='null', gearbox='M 5'}
Fields{cnit='M10MCDVP018J905', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='M10MCDVPG112904', maximumPower=100.0, fieldV9='"715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='M10MCDVP015R723', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='A 5'}
...
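
The `// Save object to DB` step in the loop above is left open, and the question asks whether saving to SQLite has to be slow. One common approach, sketched here as an assumption rather than as part of the original answer, is to drain the iterator in fixed-size batches and write each batch in a single transaction instead of committing row by row. The class name `BatchingExample`, the method `forEachBatch`, and the batch size are hypothetical; the sink is a placeholder where the actual database code would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class BatchingExample {

    /**
     * Drains the iterator in chunks of at most batchSize elements and hands
     * each chunk to the sink. Returns the total number of elements processed.
     */
    static <T> int forEachBatch(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                total += batch.size();
                // Start a fresh list so the sink may safely keep the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Flush the final, partially filled batch.
            sink.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in data; in the answer's program this would be a FieldsJsonIterator.
        Iterator<String> rows = Arrays.asList("a", "b", "c", "d", "e").iterator();
        int total = forEachBatch(rows, 2, batch -> {
            // Hypothetical place for one database transaction per batch.
            System.out.println("saving " + batch);
        });
        System.out.println("total = " + total);
    }
}
```

Only one batch is held in memory at a time, so the memory footprint stays bounded regardless of the file size. A suitable batch size depends on the device and the row width and would need to be measured.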

Gson

We can do the same thing with Gson. A sample implementation could look like this:

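import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.SerializedName;
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.math.BigDecimal;
import java.util.Iterator;

class FieldsJsonIterator implements Iterator<Fields> {

    private final Gson mapper;
    private final JsonReader parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new GsonBuilder().create();

        parser = mapper.newJsonReader(reader);
        skipStart();
    }

    // Consumes the opening bracket of the root array.
    private void skipStart() throws IOException {
        parser.beginArray();
    }

    @Override
    public boolean hasNext() {
        try {
            return parser.hasNext();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Fields next() {
        // Binds exactly one array element and leaves the rest of the stream unread.
        return ((FieldsWrapper) mapper.fromJson(parser, FieldsWrapper.class)).fields;
    }

    // Each array element wraps the interesting data in a "fields" property.
    private static final class FieldsWrapper {
        public Fields fields;
    }
}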

class Fields {

    private String cnit;

    @SerializedName("puissance_maximale")
    private BigDecimal maximumPower;

    @SerializedName("champ_v9")
    private String fieldV9;

    @SerializedName("boite_de_vitesse")
    private String gearbox;

    // getters, setters, toString
}

Usage and output should be the same as for Jackson.
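
Because both variants expose a plain `Iterator<Fields>`, the records can also be consumed through the Java Stream API without loading them all first. The following is a minimal sketch under that assumption; the helper class `IteratorStreams` is hypothetical, and a list of strings stands in for a real `FieldsJsonIterator` so that the example runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IteratorStreams {

    /** Wraps an iterator in a sequential, lazily evaluated Stream. */
    static <T> Stream<T> stream(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        // false = sequential; the underlying parser is not safe to read in parallel.
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        // Stand-in for the gearbox values of parsed Fields objects.
        Iterator<String> gearboxes = Arrays.asList("A 5", "A 7", "M 6", "M 6").iterator();
        List<String> manual = stream(gearboxes)
                .filter(g -> g.startsWith("M"))
                .collect(Collectors.toList());
        System.out.println(manual);
    }
}
```

The stream pulls elements from the iterator only as downstream operations request them, so the memory profile stays the same as with the explicit `while` loop.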

See also:

Best way to access nested JSON objects with Java
Whats an easy way to totally ignore ssl with java url connections?

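Comments:

Thanks for your code, it works well and fast; I use the Jackson library. I saw this class in your code and I am curious where you found this Android code library: // Just to make it work. Probably you should not do that! SSLUtilities.trustAllHostnames(); SSLUtilities.trustAllHttpsCertificates(); Also, I would like to know whether it is normal that the JSON parser does not parse the objects in the order they appear in the JSON file (here "designation_commerciale"="LaFerrari" year="2014" is the first element)? Thanks for your help.

@user1400390, take a look at the added link: "Whats an easy way to totally ignore ssl with java url connections?". You probably should not do that. I added it because my test needed it. The parser should parse all the data in order.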
