Kafka: ZK + Kafka + Spark Streaming cluster setup: define an Avro schema, send Avro-encoded records to a Kafka topic, and read the Kafka data with Spark Streaming
Define the Avro schema:
{
  "type": "record",
  "name": "userlog",
  "fields": [
    {"name": "ip", "type": "string"},
    {"name": "identity", "type": "string"},
    {"name": "userid", "type": "int"},
    {"name": "time", "type": "string"},
    {"name": "requestinfo", "type": "string"},
    {"name": "state", "type": "int"},
    {"name": "responce", "type": "string"},
    {"name": "referer", "type": "string"},
    {"name": "useragent", "type": "string"}
  ]
}
Create the producer object used for sending:
private static Producer<String, String> createProducer() {
    Properties props = new Properties();
    // Wait for all in-sync replicas to acknowledge each record
    props.put("acks", "all");
    // Do not retry failed sends
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    // Keys and values are both sent as strings
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Kafka broker list
    props.put("bootstrap.servers", "192.168.0.121:9092,192.168.0.122:9092,192.168.0.123:9092");

    Producer<String, String> producer = new KafkaProducer<>(props);
    return producer;
}
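The settings above are hard-coded in createProducer(). As a sketch only, the same keys could be kept as text and loaded into a Properties object at startup. The class name, the load helper, and the idea of a producer.properties file are assumptions for illustration and are not part of the original program; the inline text stands in for the file contents.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: loads producer settings from text instead of hard-coding them.
final class ProducerConfigSketch {

    // Stand-in for the contents of a hypothetical producer.properties file.
    static final String CONFIG_TEXT = String.join("\n",
            "acks=all",
            "retries=0",
            "batch.size=16384",
            "linger.ms=1",
            "buffer.memory=33554432",
            "key.serializer=org.apache.kafka.common.serialization.StringSerializer",
            "value.serializer=org.apache.kafka.common.serialization.StringSerializer",
            "bootstrap.servers=192.168.0.121:9092,192.168.0.122:9092,192.168.0.123:9092");

    // Parses key=value lines; every value is loaded as a String.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(CONFIG_TEXT);
        System.out.println(props.getProperty("acks"));
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

The resulting Properties object could be passed to the KafkaProducer constructor in place of the one built by hand; numeric settings arrive as strings here, which Kafka's configuration parsing is expected to convert.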
Read the schema file into a Schema object:
// Parse the schema file; returns null if the file does not exist or cannot be read.
private static Schema getSchema(final Configuration hadoopConf, final String avroFilePath) {
    Schema schema = null;
    try {
        Path pt = new Path(avroFilePath);
        FileSystem fs = FileSystem.get(hadoopConf);
        if (fs.exists(pt)) {
            // try-with-resources closes the stream once parsing is done
            try (FSDataInputStream inputStream = fs.open(pt)) {
                Schema.Parser parser = new Schema.Parser();
                schema = parser.parse(inputStream);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return schema;
}
Use the Schema object to build a record, then serialize the record to Avro binary:
protected static byte[] serializeEvent(GenericRecord record) throws Exception {
    // try-with-resources closes the buffer whether or not encoding succeeds
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
        writer.write(record, encoder);
        // Flush the encoder so all buffered bytes reach the stream
        encoder.flush();
        return bos.toByteArray();
    }
}
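To make the output of serializeEvent less opaque, here is a hand-rolled sketch of how Avro's binary encoding lays out int and string fields: integers are written as zig-zag variable-length values, strings as a length followed by UTF-8 bytes, and record fields are concatenated in schema order without names or separators. This is an illustration using only the JDK; the class, the method names, and the two-field record are hypothetical, and real code should keep using BinaryEncoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: mimics Avro binary encoding for int/long and string values.
final class AvroBinarySketch {

    // int and long values are zig-zag encoded, then written 7 bits at a time.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            // Low 7 bits with the continuation bit set
            out.write((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // A string is its UTF-8 byte length followed by the bytes themselves.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical two-field record {userid: int, time: string}.
    static byte[] encode(int userid, String time) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, userid);
        writeString(out, time);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(59, "2018-07-03 11:41:28");
        System.out.println(bytes.length + " bytes");
    }
}
```

Because field names are not part of the payload, a reader needs the same schema (or a compatible one) to decode these bytes.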
Send the data to the topic through the producer:
package com.dx.streaming.producer;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.List;
import java.util.Properties;
import java.util.Random;
import java.util.UUID;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class TestProducer {
    private static final String avroFilePath = "D:\\Java_Study\\workspace\\kafka-streaming-learn\\conf\\avro\\userlog.avsc";
    // "/user/dx/conf/avro/userlog.avsc";
    private static final String topic = "t-my";

    public static void main(String[] args) throws InterruptedException {
        String appName = "Test Avro";
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName(appName);
        SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
        Configuration hadoopConf = sparkSession.sparkContext().hadoopConfiguration();

        Producer<String, String> producer = createProducer();

        // Load the schema and create the random source once, outside the loop
        Schema schema = getSchema(hadoopConf, avroFilePath);
        if (schema == null) {
            throw new IllegalStateException("Cannot load Avro schema: " + avroFilePath);
        }
        Random random = new Random();

        int sent = 0;
        // Stop after 10000 messages
        while (sent < 10000) {
            // Octets are joined with ':' here, which is what the printed output shows
            String ip = random.nextInt(255) + ":" + random.nextInt(255) + ":" + random.nextInt(255) + ":" + random.nextInt(255);
            String identity = UUID.randomUUID().toString();
            int userid = random.nextInt(100);
            String time = "2018-07-03 " + random.nextInt(24) + ":" + random.nextInt(60) + ":" + random.nextInt(60);
            String requestInfo = "....";
            int state = random.nextInt(600);
            String responce = "...";
            String referer = "...";
            String useragent = "...";

            GenericRecord record = new GenericData.Record(schema);
            record.put("ip", ip);
            record.put("identity", identity);
            record.put("userid", userid);
            record.put("time", time);
            record.put("requestinfo", requestInfo);
            record.put("state", state);
            record.put("responce", responce);
            record.put("referer", referer);
            record.put("useragent", useragent);

            System.out.println(ip + "\r\n" + identity + "\r\n" + userid + "\r\n" + time);

            try {
                byte[] serializedValue = serializeEvent(record);
                // byte[].toString() returns only the array's identity string, not its contents.
                // The producer sends String values, so the Avro bytes are Base64-encoded here
                // and the consumer must Base64-decode them before Avro decoding.
                // Alternative: Producer<String, byte[]> with ByteArraySerializer.
                String value = Base64.getEncoder().encodeToString(serializedValue);
                ProducerRecord<String, String> msg = new ProducerRecord<>(topic, value);
                producer.send(msg);
            } catch (Exception e) {
                e.printStackTrace();
            }

            sent++;
            // Pause for 10 seconds after every 100 messages
            if (sent % 100 == 0) {
                Thread.sleep(10000);
            }
        }

        // List the partition information of the topic
        List<PartitionInfo> partitions = producer.partitionsFor(topic);
        for (PartitionInfo p : partitions) {
            System.out.println(p);
        }

        System.out.println("send message over.");
        producer.close(100, java.util.concurrent.TimeUnit.MILLISECONDS);
    }

    // .... (createProducer, getSchema and serializeEvent as shown above)
}
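Because the producer here is typed Producer<String, String>, the Avro bytes have to become a String before they are sent. Calling toString() on a Java byte[] yields only a type tag and an identity hash, so the payload needs a real text encoding. A minimal JDK-only sketch, assuming Base64 as that encoding (an assumption for illustration; a byte[]-typed producer with ByteArraySerializer would avoid the text step entirely). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Illustration only: carrying a binary payload through a String-valued channel.
final class PayloadAsTextSketch {

    // Text-safe form of a binary payload
    static String toText(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    // Inverse of toText; the consumer side would call this before Avro decoding
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] payload = {0x76, 0x04, 'a', 'b'}; // stand-in for Avro bytes

        // Prints a string starting with "[B@": type tag plus hash, not the contents
        System.out.println(payload.toString());

        String text = toText(payload);
        System.out.println(text);

        // The round trip restores the original bytes
        System.out.println(Arrays.equals(payload, fromText(text)));
    }
}
```

Base64 makes each message roughly a third larger, which is the price of keeping a String-valued producer.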
Printed output:
192:49:185:13
1b87f3ee-cdad-46c6-91e5-64e4f2711faa
59
2018-07-03 11:41:28
25:128:123:27
115235b7-771f-42b0-94e8-2d8fba60d1d3
21
2018-07-03 7:56:53
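The output shows octets joined with ":" and an unpadded hour ("7:56:53"), which is exactly what the generator code emits. If dotted IPv4 addresses and fixed-width times were wanted in the test data instead, a sketch along these lines could be used; the helper names are hypothetical and not part of the original program.

```java
import java.util.Locale;
import java.util.Random;

// Illustration only: alternative formatting for the random test fields.
final class SampleFieldsSketch {

    // Joins four octets in dotted-decimal notation
    static String ipv4(int a, int b, int c, int d) {
        return a + "." + b + "." + c + "." + d;
    }

    // Zero-pads each component to two digits, e.g. 7:56:53 becomes 07:56:53
    static String timeOfDay(int hour, int minute, int second) {
        return String.format(Locale.ROOT, "%02d:%02d:%02d", hour, minute, second);
    }

    public static void main(String[] args) {
        Random random = new Random();
        // nextInt(256) covers the full octet range 0..255
        String ip = ipv4(random.nextInt(256), random.nextInt(256),
                random.nextInt(256), random.nextInt(256));
        String time = "2018-07-03 "
                + timeOfDay(random.nextInt(24), random.nextInt(60), random.nextInt(60));
        System.out.println(ip);
        System.out.println(time);
    }
}
```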