Unable to generate CSV from AVRO file in Java

Posted: 2021-10-27 03:48:31

Question:

I am unable to convert an AVRO file to CSV and cannot find the root cause of the error.

pom.xml:

<jackson.version>2.12.4</jackson.version>
<dependency>
    <groupId>com.fasterxml.jackson.datatype</groupId>
    <artifactId>jackson-datatype-hppc</artifactId>
    <version>${jackson.version}</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.datatype</groupId>
    <artifactId>jackson-datatype-json-org</artifactId>
    <version>${jackson.version}</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.datatype</groupId>
    <artifactId>jackson-datatype-jsr310</artifactId>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-csv</artifactId>
    <version>${jackson.version}</version>
    <type>jar</type>
</dependency>

Model:

package com.test.employee.model;

import java.time.ZonedDateTime;
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;

// Column order used when a CSV schema is derived from this class
@JsonPropertyOrder({"id", "name", "modifiedTimestamp", "score"})
public class EmployeeModel {
    @JsonProperty("id")
    private String id;

    @JsonProperty("name")
    private String name;

    @JsonProperty("modifiedTimestamp")
    private ZonedDateTime modifiedTimestamp;

    @JsonProperty("score")
    private String score;

    // Excluded from serialization, so it is not a CSV column
    @JsonIgnore
    private String employeeId;

    public EmployeeModel(String id, String name, ZonedDateTime modifiedTimestamp,
            @JsonProperty("score") String score) {
        this.id = id;
        this.name = name;
        this.modifiedTimestamp = modifiedTimestamp;
        this.score = score;
    }

    public String getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public ZonedDateTime getModifiedTimestamp() {
        return modifiedTimestamp;
    }

    public String getScore() {
        return score;
    }

    public String getEmployeeId() {
        return employeeId;
    }
}

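import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import com.test.employee.model.EmployeeModel;

public class Test {
    private File avroToCsv(File avroFile, String path) {
        File result = new File(path);

        if (result.exists()) result.delete();

        try {
            GenericDatumReader<GenericData.Record> datum = new GenericDatumReader<>();
            DataFileReader<GenericData.Record> reader = new DataFileReader<>(avroFile, datum);
            GenericData.Record record = new GenericData.Record(reader.getSchema());
            CsvMapper csvMapper = new CsvMapper();
            CsvSchema schema = csvMapper.schemaFor(EmployeeModel.class).withHeader();
            OutputStream outStream = new FileOutputStream(result, true);
            CsvGenerator csvGenerator = csvMapper.getFactory().createGenerator(outStream);
            ObjectWriter csvWriter = csvMapper.writer(schema);

            while (reader.hasNext()) {
                reader.next(record);
                // modifDate was not defined in the original post; it is assumed
                // here to be read from the record's "modifiedTimestamp" field
                String modifDate = record.get("modifiedTimestamp").toString();
                LocalDateTime dateTime = LocalDateTime.parse(modifDate);
                ZonedDateTime modifiedDate = ZonedDateTime.of(dateTime, ZoneId.systemDefault());
                EmployeeModel tempModel = new EmployeeModel(
                        record.get("id").toString(),
                        record.get("name").toString(),
                        modifiedDate,
                        record.get("score").toString()
                );
                // This is the call that fails with the exceptions shown below
                csvWriter.writeValue(csvGenerator, tempModel);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        new Test().avroToCsv(new File("abc.avro"), "c:/test/ModifiedUsers.csv");
    }
}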

The Schema object contains the following headers: "id", "name", "modifiedTimestamp", "score". The Employee model contains the following values: "123456", "demoapp", 2021-04-16T19:00:54, "2.4".


Exception received:

com.fasterxml.jackson.databind.JsonMappingException: [no message for java.lang.NullPointerException]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._wrapAsIOE(DefaultSerializerProvider.java:509)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:482)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
    at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1396)
    at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1120)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:933)
    at com.test.employee.Test.avroToCsv(AvroToCsv.java:164)
    at com.test.employee.Test.main(Test.java:41)

Caused by: java.lang.NullPointerException
    at com.fasterxml.jackson.core.base.GeneratorBase.setCurrentValue(GeneratorBase.java:138)
    at com.fasterxml.jackson.core.base.GeneratorBase.writeStartObject(GeneratorBase.java:290)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:151)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    ... 7 more

com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name
    at com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:1961)
    at com.fasterxml.jackson.dataformat.csv.CsvGenerator._verifyValueWrite(CsvGenerator.java:957)
    at com.fasterxml.jackson.dataformat.csv.CsvGenerator.writeStartObject(CsvGenerator.java:584)
    at com.fasterxml.jackson.core.base.GeneratorBase.writeStartObject(GeneratorBase.java:286)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:151)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
    at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1396)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:913)


Issue Identified with library:

https://github.com/FasterXML/jackson-dataformats-text/issues/114

Answer 1:

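The problem here is not the schema, or the lack of one (sort of), but the fact that you are constructing a new ObjectWriter instance for every value: this will not work.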

Instead, if you want to write a sequence of rows one at a time, you need to construct a SequenceWriter. Something like:

ObjectWriter ow = mapper.writerFor(EmployeeModel.class).with(schema);
SequenceWriter sw = ow.writeValues(outStream);

and then use

sw.write(tempModel);
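
The SequenceWriter approach above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual code: the class name SequenceWriterSketch, the Row class, and the toCsv helper are hypothetical names introduced here, and Row uses only String fields so that no date/time module is needed. It writes to a StringWriter instead of a file to keep the sketch self-contained.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.SequenceWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.StringWriter;
import java.util.List;

public class SequenceWriterSketch {

    // Hypothetical stand-in for EmployeeModel (assumption: String-only fields)
    @JsonPropertyOrder({"id", "name", "modifiedTimestamp", "score"})
    public static class Row {
        public String id;
        public String name;
        public String modifiedTimestamp;
        public String score;

        public Row(String id, String name, String modifiedTimestamp, String score) {
            this.id = id;
            this.name = name;
            this.modifiedTimestamp = modifiedTimestamp;
            this.score = score;
        }
    }

    // Writes all rows through a single SequenceWriter and returns the CSV text
    public static String toCsv(List<Row> rows) throws IOException {
        CsvMapper mapper = new CsvMapper();
        // Derive the columns from Row and ask for a header line
        CsvSchema schema = mapper.schemaFor(Row.class).withHeader();
        // One ObjectWriter, configured once with the value type and the schema
        ObjectWriter ow = mapper.writerFor(Row.class).with(schema);

        StringWriter out = new StringWriter();
        // The SequenceWriter owns the underlying generator; closing it flushes the output
        try (SequenceWriter sw = ow.writeValues(out)) {
            for (Row row : rows) {
                sw.write(row);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        List<Row> rows = List.of(
                new Row("123456", "demoapp", "2021-04-16T19:00:54", "2.4"));
        System.out.print(toCsv(rows));
    }
}
```

In the asker's method, the same pattern would presumably mean creating the SequenceWriter once before the while loop, passing outStream to writeValues, and calling sw.write(tempModel) inside the loop. Because EmployeeModel has a ZonedDateTime field, a date/time module such as JavaTimeModule from jackson-datatype-jsr310 would likely also have to be registered on the CsvMapper.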
