Flink + Kafka real-time statistics: local environment setup and hands-on code
Posted by bitcarmanlee
1. Setting up a local ZooKeeper and Kafka environment
Flink is often used to consume data from an upstream Kafka cluster, and Kafka in turn depends on ZooKeeper. So before testing, a local ZooKeeper and Kafka environment needs to be prepared.
For the details of setting up the ZK and Kafka environment, refer to the earlier post "SparkStreaming kafka zookeeper本地环境调试安装".
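For reference, a minimal sketch of the startup commands, assuming a local Kafka 0.9.x distribution is unpacked and the commands are run from its root directory (paths and config file names may differ in your setup):
# start the ZooKeeper instance bundled with the Kafka distribution
bin/zookeeper-server-start.sh config/zookeeper.properties
# start a single Kafka broker listening on localhost:9092
bin/kafka-server-start.sh config/server.properties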
2. Adding the required dependencies
First add the required dependencies to pom.xml, mainly the Flink-related dependencies as well as the Kafka-related ones: Flink version 1.7.2 and the Kafka 0.9 connector.
<properties>
<flink.version>1.7.2</flink.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.9_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
</dependencies>
These are the main dependencies we need.
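The KafkaProducer/KafkaConsumer classes used below come from kafka-clients, which flink-connector-kafka-0.9 should pull in transitively; if you prefer to pin it explicitly, a dependency along these lines can be added (the 0.9.0.1 version here is an assumption, match it to your broker):
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.9.0.1</version>
</dependency>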
3. Kafka producer
A topic named test was created in advance from the command line (see the command sketched below); the producer then writes data into it.
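A sketch of the topic-creation command, assuming the Kafka 0.9 command-line tools and the local ZooKeeper on port 2181 (partition and replication settings are just an example):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test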
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
/**
* author: wanglei
* create: 2022-09-21
*/
public class Producer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        String topic = "test";
        ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, "v1");
        producer.send(record);
        ProducerRecord<String, String> record2 = new ProducerRecord<String, String>(topic, "v2");
        producer.send(record2);
        producer.close();
    }
}
One thing worth noting: the key and value serializers are both StringSerializer; on the consumer side the corresponding StringDeserializer is used.
4. Kafka consumer
First do a simple test with the Kafka client's own consumer.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.Arrays;
import java.util.Properties;
/**
* author: wanglei
* create: 2022-09-26
*/
public class Consumer {
    public static void main(String[] args) {
        String topic = "test";
        String groupId = "group_leilei";
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.setProperty("enable.auto.commit", "true");  // commit offsets automatically
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList(topic));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1L);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition = %d, offset = %d, key = %s, value = %s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}
To consume the topic from the very beginning, two conditions must be met (see the sketch after this list):
1. A new group.id for which no offsets have been committed before.
2. The auto.offset.reset parameter is set to earliest.
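A minimal sketch of those two settings on the props object above (the new group id is a made-up example):
// a group id that has not committed any offsets yet (hypothetical name)
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group_leilei_fresh");
// with no committed offset for the group, start from the earliest available offset
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");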
5. Consuming with the Flink API
Next we run the same consumption test with Flink code.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.util.Collector;
import java.util.Properties;
/**
* author: wanglei
* create: 2022-09-21
*/
public class KafkaCount {

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }

    public static void run() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        System.out.println("set kafka parameters!");
        Properties props = new Properties();
        String topic = "test";
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", topic);

        FlinkKafkaConsumer09<String> myConsumer = new FlinkKafkaConsumer09<String>(topic, new SimpleStringSchema(), props);
        // consume from the earliest offset of the topic
        myConsumer.setStartFromEarliest();

        DataStream<String> stream = env.addSource(myConsumer);
        DataStream<Tuple2<String, Integer>> counts = stream.flatMap(new LineSplitter())
                .keyBy(0)
                .sum(1);
        counts.print();
        env.execute("word count from kafka");
    }

    public static void main(String[] args) throws Exception {
        run();
    }
}
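Besides setStartFromEarliest(), the Flink Kafka consumer also offers other start positions; a quick sketch using the same myConsumer as above:
// start from the latest records instead of the beginning
myConsumer.setStartFromLatest();
// or start from the offsets committed for the group.id in Kafka (the default behaviour)
myConsumer.setStartFromGroupOffsets();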