Transferring Files with Kafka (as Byte Arrays)
Posted by Firm陈
Using Kafka to transfer files as byte arrays
I recently had to parse a large number of small files. Previously I would put the files on HDFS and then read them back for parsing.
Because the files are small and numerous, I didn't want to use HDFS this time, so I used Kafka as the middleware instead. It worked well, so I'm sharing the approach here.
The idea is to read each file as a byte stream into a byte array, send the byte array to Kafka, and let downstream consumers process it.
This is well suited to handling huge numbers of small files.
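The read-then-forward idea above does not depend on Kafka itself: whatever bytes go in on the producer side come back out unchanged on the consumer side. A minimal sketch of that byte-array round trip (using a temporary file as a stand-in for one of the small input files; no Kafka involved) looks like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteArrayRoundTrip {
    public static void main(String[] args) throws IOException {
        // A temporary file stands in for one of the small input files
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, "hello kafka".getBytes("UTF-8"));

        // "Producer" side: read the whole file into a byte array
        byte[] payload = Files.readAllBytes(file);

        // "Consumer" side: the byte array reconstructs the original content
        String restored = new String(payload, "UTF-8");
        System.out.println(restored);

        Files.delete(file);
    }
}
```

`Files.readAllBytes` is also a safer way to load a file than sizing a buffer with `InputStream.available()`, which is not guaranteed to return the full file length.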
Implementation
Producer:
package com.upupfeng.kafka;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.File;
import java.io.FileInputStream;
import java.util.Properties;

/**
 * Serialize the file contents and send them to Kafka.
 *
 * @author mawf
 */
public class SendFileToKafka {
    public static void main(String[] args) {
        String filePath = "D:\\dev\\a.xml.gz";
        Properties kafkaProps = new Properties();
        kafkaProps.put("bootstrap.servers", "server1:9092");
        kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<String, byte[]> producer = new KafkaProducer<>(kafkaProps);
        try {
            File file = new File(filePath);
            FileInputStream fis = new FileInputStream(file);
            // Read the whole file into the buffer byte array
            // (size the buffer by file length; available() is not reliable for this)
            byte[] buffer = new byte[(int) file.length()];
            fis.read(buffer);
            fis.close();
            // Key = file name, value = file bytes
            ProducerRecord<String, byte[]> record = new ProducerRecord<>("dataTopic", file.getName(), buffer);
            producer.send(record);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
Consumer:
package com.upupfeng.kafka;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Properties;
import java.util.zip.GZIPInputStream;

/**
 * @author mawf
 */
public class ConsumerFileByteArrayFromKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "server1:9092");
        props.put("group.id", "group1");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("dataTopic"));
        try {
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(100);
                for (ConsumerRecord<String, byte[]> record : records) {
                    System.out.println("offset=" + record.offset() + ",key=" + record.key() + ",value=" + record.value());
                    String fileName = record.key();
                    byte[] message = record.value();
                    // Decompress the gzipped bytes and read the file line by line
                    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(message);
                    GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream);
                    BufferedReader br = new BufferedReader(new InputStreamReader(gzipInputStream));
                    String line;
                    while ((line = br.readLine()) != null) {
                        System.out.println(line);
                    }
                    br.close();
                    byteArrayInputStream.close();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }
}
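One caveat when shipping whole files through Kafka: by default, both the producer and the broker cap record size at roughly 1 MB, so files larger than that will be rejected unless the limits are raised. A sketch of the relevant settings (the 10 MB value is illustrative, not from the original setup) might look like:

```java
// Producer side: raise the maximum request size (default is 1048576 bytes)
kafkaProps.put("max.request.size", "10485760");

// Broker side (server.properties): allow larger messages to be accepted
// message.max.bytes=10485760

// Consumer side: make sure a single fetch can hold the largest message
props.put("max.partition.fetch.bytes", "10485760");
```

For genuinely small files, as in this article, the defaults are fine and none of this is needed.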