Unable to generate CSV from AVRO file in Java
【Title】: Unable to generate CSV from AVRO file in java 【Posted】: 2021-10-27 03:48:31 【Question】: I cannot convert an AVRO file to CSV, and I cannot work out the root cause of the error.
pom.xml-
<jackson.version>2.12.4</jackson.version>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-hppc</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-json-org</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jsr310</artifactId>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>${jackson.version}</version>
<type>jar</type>
</dependency>
Model-
package com.test.employee.model;

import java.time.ZonedDateTime;
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;

// Column order used when the CSV schema is derived from this class.
@JsonPropertyOrder({"id", "name", "modifiedTimestamp", "score"})
public class EmployeeModel {
    @JsonProperty("id")
    private String id;
    @JsonProperty("name")
    private String name;
    @JsonProperty("modifiedTimestamp")
    private ZonedDateTime modifiedTimestamp;
    @JsonProperty("score")
    private String score;
    // Excluded from the serialized output.
    @JsonIgnore
    private String employeeId;

    public EmployeeModel(String id, String name, ZonedDateTime modifiedTimestamp,
            @JsonProperty("score") String score) {
        this.id = id;
        this.name = name;
        this.modifiedTimestamp = modifiedTimestamp;
        this.score = score;
    }

    public String getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public ZonedDateTime getModifiedTimestamp() {
        return modifiedTimestamp;
    }

    public String getScore() {
        return score;
    }

    public String getEmployeeId() {
        return employeeId;
    }
}
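package com.test.employee;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import com.test.employee.model.EmployeeModel;

public class Test {
    private static File avroToCsv(File avroFile, String path) {
        File result = new File(path);
        if (result.exists()) result.delete();
        try {
            GenericDatumReader<GenericData.Record> datum = new GenericDatumReader<>();
            DataFileReader<GenericData.Record> reader = new DataFileReader<>(avroFile, datum);
            GenericData.Record record = new GenericData.Record(reader.getSchema());
            CsvMapper csvMapper = new CsvMapper();
            CsvSchema schema = csvMapper.schemaFor(EmployeeModel.class).withHeader();
            OutputStream outStream = new FileOutputStream(result, true);
            CsvGenerator csvGenerator = csvMapper.getFactory().createGenerator(outStream);
            ObjectWriter csvWriter = csvMapper.writer(schema);
            while (reader.hasNext()) {
                reader.next(record);
                // Assumption: the post never defines modifDate; reading it from "modifiedTimestamp" is a guess.
                String modifDate = record.get("modifiedTimestamp").toString();
                LocalDateTime dateTime = LocalDateTime.parse(modifDate);
                ZonedDateTime modifiedDate = ZonedDateTime.of(dateTime, ZoneId.systemDefault());
                EmployeeModel tempModel = new EmployeeModel(
                    record.get("id").toString(),
                    record.get("name").toString(),
                    modifiedDate,
                    record.get("score").toString()
                );
                // The exception reported in this question is thrown from this call.
                csvWriter.writeValue(csvGenerator, tempModel);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }
    public static void main(String[] args) {
        avroToCsv(new File("abc.avro"), "c:/test/ModifiedUsers.csv");
    }
}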
The Schema object contains the following headers: "id", "name", "modifiedTimestamp", "score". The Employee model contains the following values: "123456", "demoapp", 2021-04-16T19:00:54, "2.4".
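For reference, the timestamp value listed above has no offset or zone, which is why the question's code parses it as a LocalDateTime and then attaches a zone. A minimal sketch of that step follows; the class name TimestampSketch and the helper toZoned are made up for illustration, and the fixed "UTC" zone is an assumption (the question uses ZoneId.systemDefault(), which makes the result depend on the machine running the code).

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimestampSketch {

    // Hypothetical helper: parses an ISO-8601 local date-time (no offset)
    // and attaches the given zone without shifting the clock time.
    public static ZonedDateTime toZoned(String text, ZoneId zone) {
        LocalDateTime local = LocalDateTime.parse(text);
        return ZonedDateTime.of(local, zone);
    }

    public static void main(String[] args) {
        // Assumption: UTC is used here only to keep the result reproducible.
        ZonedDateTime zoned = toZoned("2021-04-16T19:00:54", ZoneId.of("UTC"));
        System.out.println(zoned);
    }
}
```

If the Avro field ever carries an offset (for example a trailing "Z"), LocalDateTime.parse would reject it, and ZonedDateTime.parse or OffsetDateTime.parse would be the closer fit.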
Exception received:
com.fasterxml.jackson.databind.JsonMappingException: [no message for java.lang.NullPointerException]
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._wrapAsIOE(DefaultSerializerProvider.java:509)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:482)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1396)
at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1120)
at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:933)
at com.test.employee.Test.avroToCsv(AvroToCsv.java:164)
at com.test.employee.Test.main(Test.java:41)
Caused by: java.lang.NullPointerException
at com.fasterxml.jackson.core.base.GeneratorBase.setCurrentValue(GeneratorBase.java:138)
at com.fasterxml.jackson.core.base.GeneratorBase.writeStartObject(GeneratorBase.java:290)
at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:151)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
... 7 more
com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name
at com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:1961)
at com.fasterxml.jackson.dataformat.csv.CsvGenerator._verifyValueWrite(CsvGenerator.java:957)
at com.fasterxml.jackson.dataformat.csv.CsvGenerator.writeStartObject(CsvGenerator.java:584)
at com.fasterxml.jackson.core.base.GeneratorBase.writeStartObject(GeneratorBase.java:286)
at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:151)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1396)
at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:913)
Issue Identified with library:
https://github.com/FasterXML/jackson-dataformats-text/issues/114
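【Answer 1】: The problem here is not the schema, or the lack of one (sort of), but that you are constructing a new ObjectWriter instance for each value: this does not work.
Instead, if you want to write a sequence of rows separately, you need to construct a SequenceWriter. Something like:
ObjectWriter ow = mapper.writerFor(EmployeeModel.class).with(schema);
SequenceWriter sw = ow.writeValues(outStream);
and then use
sw.write(tempModel);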
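To make the suggestion above concrete, here is a minimal, self-contained sketch of writing several rows through a single SequenceWriter. It is an illustration under assumptions, not the asker's actual code: the class SequenceWriterSketch and the simplified Row type (timestamp kept as a plain String) are hypothetical stand-ins for EmployeeModel, the output goes to a StringWriter instead of a file, and the Avro reading loop is left out.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.SequenceWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.StringWriter;
import java.util.List;

public class SequenceWriterSketch {

    // Hypothetical, simplified stand-in for EmployeeModel:
    // the timestamp is a String so no date/time module is needed.
    @JsonPropertyOrder({"id", "name", "modifiedTimestamp", "score"})
    public static class Row {
        public final String id;
        public final String name;
        public final String modifiedTimestamp;
        public final String score;

        public Row(String id, String name, String modifiedTimestamp, String score) {
            this.id = id;
            this.name = name;
            this.modifiedTimestamp = modifiedTimestamp;
            this.score = score;
        }
    }

    // Writes all rows through ONE SequenceWriter, so the header is
    // emitted once and every later write appends a single CSV line.
    public static String toCsv(List<Row> rows) throws IOException {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(Row.class).withHeader();
        StringWriter out = new StringWriter();
        try (SequenceWriter sw = mapper.writer(schema).writeValues(out)) {
            for (Row row : rows) {
                sw.write(row);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        List<Row> rows = List.of(
                new Row("123456", "demoapp", "2021-04-16T19:00:54", "2.4"));
        System.out.print(toCsv(rows));
    }
}
```

The key point is that the schema-aware writer creates the generator itself via writeValues(...), rather than a generator obtained separately from the factory. With the real EmployeeModel, the ZonedDateTime field would most likely also need the JavaTimeModule from jackson-datatype-jsr310 (already listed in the pom.xml) registered on the mapper.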