数据湖(十九):SQL API 读取Kafka数据实时写入Iceberg表

Posted Lansonli

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了数据湖(十九):SQL API 读取Kafka数据实时写入Iceberg表相关的知识,希望对你有一定的参考价值。

文章目录

SQL API 读取Kafka数据实时写入Iceberg表

一、首先需要创建对应的Iceberg表

二、编写代码读取Kafka数据实时写入Iceberg


SQL API 读取Kafka数据实时写入Iceberg表

从Kafka中实时读取数据写入到Iceberg表中,操作步骤如下:

一、首先需要创建对应的Iceberg表

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tblEnv = StreamTableEnvironment.create(env);
env.enableCheckpointing(1000);
//1.创建Catalog
tblEnv.executeSql("CREATE CATALOG hadoop_iceberg WITH (" +
        "'type'='iceberg'," +
        "'catalog-type'='hadoop'," +
        "'warehouse'='hdfs://mycluster/flink_iceberg')");
//2.创建iceberg表 flink_iceberg_tbl
tblEnv.executeSql("create table hadoop_iceberg.iceberg_db.flink_iceberg_tbl3(id int,name string,age int,loc string) partitioned by (loc)");

二、编写代码读取Kafka数据实时写入Iceberg

public class ReadKafkaToIceberg 
    public static void main(String[] args) throws Exception 
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tblEnv = StreamTableEnvironment.create(env);
        env.enableCheckpointing(1000);

        /**
         * 1.需要预先创建 Catalog 及Iceberg表
         */
        //1.创建Catalog
        tblEnv.executeSql("CREATE CATALOG hadoop_iceberg WITH (" +
                "'type'='iceberg'," +
                "'catalog-type'='hadoop'," +
                "'warehouse'='hdfs://mycluster/flink_iceberg')");

        //2.创建iceberg表 flink_iceberg_tbl
//        tblEnv.executeSql("create table hadoop_iceberg.iceberg_db.flink_iceberg_tbl3(id int,name string,age int,loc string) partitioned by (loc)");

        //3.创建 Kafka Connector,连接消费Kafka中数据
        tblEnv.executeSql("create table kafka_input_table(" +
                " id int," +
                " name varchar," +
                " age int," +
                " loc varchar" +
                ") with (" +
                " 'connector' = 'kafka'," +
                " 'topic' = 'flink-iceberg-topic'," +
                " 'properties.bootstrap.servers'='node1:9092,node2:9092,node3:9092'," +
                " 'scan.startup.mode'='latest-offset'," +
                " 'properties.group.id' = 'my-group-id'," +
                " 'format' = 'csv'" +
                ")");

        //4.配置 table.dynamic-table-options.enabled
        Configuration configuration = tblEnv.getConfig().getConfiguration();
        // 支持SQL语法中的 OPTIONS 选项
        configuration.setBoolean("table.dynamic-table-options.enabled", true);

        //5.写入数据到表 flink_iceberg_tbl3
        tblEnv.executeSql("insert into hadoop_iceberg.iceberg_db.flink_iceberg_tbl3 select id,name,age,loc from kafka_input_table");

        //6.查询表数据
        TableResult tableResult = tblEnv.executeSql("select * from hadoop_iceberg.iceberg_db.flink_iceberg_tbl3 /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/");
        tableResult.print();
    

启动以上代码,向Kafka topic中生产如下数据:

1,zs,18,beijing
2,ls,19,shanghai
3,ww,20,beijing
4,ml,21,shanghai

我们可以看到控制台上有对应实时数据输出,查看对应的Icberg HDFS目录,数据写入成功。


  • 📢博客主页:https://lansonli.blog.csdn.net
  • 📢欢迎点赞 👍 收藏 ⭐留言 📝 如有错误敬请指正!
  • 📢本文由 Lansonli 原创,首发于 CSDN博客🙉
  • 📢停下休息的时候不要忘了别人还在奔跑,希望大家抓紧时间学习,全力奔赴更美好的生活✨

以上是关于数据湖(十九):SQL API 读取Kafka数据实时写入Iceberg表的主要内容,如果未能解决你的问题,请参考以下文章

数据湖(十七):Flink与Iceberg整合DataStream API操作

数据湖(十六):Structured Streaming实时写入Iceberg

数据湖(十八):Flink与Iceberg整合SQL API操作

数据湖(十七):Flink与Iceberg整合DataStream API操作

云原生数据湖架构中的无服务器 Kafka

读取 kafka 主题并通过 Rest API 公开数据以供 prometheus 抓取(Nodejs)