Problem deserializing an Avro-serialized Kafka stream

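I get an exception when I try to materialize a state store. I am running Kafka 1.0, Confluent's Schema Registry 4.0, and Avro 1.8.2. I generated the Pojo with Avro's Maven plugin and deployed the schema to the Confluent server with the Confluent Maven plugin. I am able to produce a message to the STREAM1 topic. Here is the code that sets up the stream: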

Properties properties = new Properties();
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
properties.put(StreamsConfig.CLIENT_ID_CONFIG, "cleant-id");
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "http://localhost:9092");
properties.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
StreamsBuilder builder = new StreamsBuilder();

Serde<Pojo> pojoSerde = new SpecificAvroSerde<>();
final Map<String, String> serdeConfig = Collections.singletonMap(
        AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
pojoSerde.configure(serdeConfig, false);

Consumed<String, Pojo> consumed = Consumed.with(Serdes.String(), pojoSerde);
KStream<String, Pojo> source = builder.stream(TopicName.STREAM1.toString(), consumed);
KTable<String, Long> storePojoCount = source
        .groupBy((key, value) -> key)
        .count(Materialized.as(StoreName.STORE_WORD_COUNT.toString()));

Produced<String, Long> produced = Produced.with(Serdes.String(), Serdes.Long());
storePojoCount.toStream().to(TopicName.STREAM2.toString(), produced);
KafkaStreams streams = new KafkaStreams(builder.build(), properties);
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
streams.start();

This produces the following exception:

Exception in thread "cleant-id-StreamThread-2" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
    at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:74)
    at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:91)
    at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
    at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:546)
    at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:920)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:821)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

How do I configure this SpecificAvroSerde so that the stream deserializes successfully?

Answer

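The problem is that the Materialized object does not have the appropriate serdes. Avro is trying to deserialize the KTable values because Avro is the default value serde (DEFAULT_VALUE_SERDE_CLASS_CONFIG is set to SpecificAvroSerde). It cannot do so because the KTable values are actually Longs.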
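For context, the "Unknown magic byte!" line in the trace is what the Confluent Avro deserializer reports when a payload does not start with its framing header. Confluent's wire format puts a single magic byte (0) and a 4-byte big-endian schema ID in front of the Avro data. The following is a minimal sketch of that check using only the JDK. The class and method names are hypothetical, and this is not Confluent's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Kafka or Confluent: checks whether a raw
// record value starts with the Confluent wire-format header.
class WireFormatCheck {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema ID

    // True if the payload is long enough and begins with the magic byte.
    static boolean hasConfluentHeader(byte[] payload) {
        return payload != null
                && payload.length >= HEADER_SIZE
                && payload[0] == MAGIC_BYTE;
    }

    // Reads the big-endian schema ID that follows the magic byte.
    static int schemaId(byte[] payload) {
        if (!hasConfluentHeader(payload)) {
            throw new IllegalArgumentException("Unknown magic byte!");
        }
        return ByteBuffer.wrap(payload, 1, 4).getInt();
    }

    public static void main(String[] args) {
        // A framed value: magic byte, schema ID 42, then two payload bytes.
        byte[] framed = ByteBuffer.allocate(7)
                .put(MAGIC_BYTE).putInt(42).put((byte) 2).put((byte) 4).array();
        // A value written by a plain string serializer has no such header.
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println(hasConfluentHeader(framed)); // true
        System.out.println(schemaId(framed));           // 42
        System.out.println(hasConfluentHeader(plain));  // false
    }
}
```

Running a check like this against the raw bytes of a record can help confirm whether a given topic or store really holds Confluent-framed Avro, or values written by some other serde.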

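Creating the Materialized object with the correct serdes fixes the problem. The helper below builds one; pass its result to count() in place of Materialized.as(...), for example count(persistentStore(StoreName.STORE_WORD_COUNT, Serdes.String(), Serdes.Long())).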

protected <K, V> Materialized<K, V, KeyValueStore<Bytes, byte[]>> persistentStore(StoreName storeName, Serde<K> keyType, Serde<V> valueType) {
    KeyValueBytesStoreSupplier storeSupplier = Stores.persistentKeyValueStore(storeName.toString());
    return Materialized.<K, V>as(storeSupplier).withKeySerde(keyType).withValueSerde(valueType);
}

Any store supplier can be used here; this is just the one that fit my needs.
