Flink + Kafka real-time statistics: local environment setup and hands-on code

Posted by bitcarmanlee


1. Set up a local ZooKeeper and Kafka environment

Flink is often used to consume data from an upstream Kafka, and Kafka in turn depends on ZooKeeper. So before running any tests, you first need a local ZooKeeper and Kafka environment.

For how to prepare the zk and kafka environment, see the earlier article "SparkStreaming kafka zookeeper local environment setup and debugging".

2. Add the required dependencies

In pom.xml, add the required dependencies, mainly the Flink-related dependencies and the Kafka-related dependencies. The Flink version is 1.7.2 and the Kafka connector targets Kafka 0.9.

    <properties>
        <flink.version>1.7.2</flink.version>
    </properties>


    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>

The key dependency here is flink-connector-kafka-0.9_2.11, which provides the FlinkKafkaConsumer09 source used below.

3. Kafka producer

A topic named test was created in advance from the command line; the producer below then writes a couple of records into it.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * author: wanglei
 * create: 2022-09-21
 */
public class Producer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);
        String topic = "test";
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, "v1");
        producer.send(record);
        ProducerRecord<String, String> record2 = new ProducerRecord<>(topic, "v2");
        producer.send(record2);
        producer.close();
    }
}

Note that both the key and value serializers here are StringSerializer.
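Also note that producer.send() is asynchronous, so the program gives no feedback about whether the writes actually succeeded. For local testing it can be handy to block on the returned Future and print where each record landed. A minimal sketch; the SyncProducer class name is mine, not part of the original code:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SyncProducer {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // send() returns a Future<RecordMetadata>; get() blocks until the broker acknowledges the write
        RecordMetadata meta = producer.send(new ProducerRecord<>("test", "v1")).get();
        System.out.printf("wrote to partition %d at offset %d%n", meta.partition(), meta.offset());
        producer.close();
    }
}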

4. Kafka consumer

First, do a quick test with the Kafka client's own consumer.

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Properties;

/**
 * author: wanglei
 * create: 2022-09-26
 */
public class Consumer {

    public static void main(String[] args) {
        String topic = "test";
        String groupId = "group_leilei";
        Properties props = new Properties();

        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.setProperty("enable.auto.commit", "true"); // commit offsets automatically
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList(topic));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100L);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition = %d, offset = %d, key = %s, value = %s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}

If you want to consume the topic from the very beginning, two conditions must both hold (see the snippet after this list):
1. The group id is new, i.e. it has never committed an offset.
2. auto.offset.reset is set to earliest.
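Concretely, with the Consumer above that means the following two settings in props; the new group id here is just an example:

props.put(ConsumerConfig.GROUP_ID_CONFIG, "group_leilei_new");    // a group id that has never committed an offset
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");   // with no committed offset, start from the beginning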

5. Consume the topic with the Flink API

Now let's consume the same topic with Flink and do a word count on it.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.util.Collector;

import java.util.Properties;

/**
 * author: wanglei
 * create: 2022-09-21
 */
public class KafkaCount {

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }

    public static void run() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        System.out.println("set kafka parameters!");
        Properties props = new Properties();
        String topic = "test";
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", topic);

        FlinkKafkaConsumer09<String> myConsumer = new FlinkKafkaConsumer09<String>(topic, new SimpleStringSchema(), props);
        // start consuming from the earliest offset in the topic
        myConsumer.setStartFromEarliest();
        DataStream<String> stream = env.addSource(myConsumer);
        DataStream<Tuple2<String, Integer>> counts = stream.flatMap(new LineSplitter())
                .keyBy(0)
                .sum(1);

        counts.print();
        env.execute("word count from kafka");
    }

    public static void main(String[] args) throws Exception {
        run();
    }
}

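The job above maintains a running total per word. If you want counts per fixed time window instead, which is closer to a typical real-time statistics use case, you can insert a window between keyBy and sum. A sketch against the same Flink 1.7 API, to replace the counts pipeline in run(); the 10-second window size is arbitrary:

import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Tuple2<String, Integer>> windowedCounts = stream
        .flatMap(new LineSplitter())
        .keyBy(0)
        .timeWindow(Time.seconds(10))   // tumbling 10-second processing-time window
        .sum(1);

windowedCounts.print();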