慕课网实战Spark Streaming实时流处理项目实战笔记十七之铭文升级版

Posted 2020-10-23 集技术与颜值于一身

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了慕课网实战Spark Streaming实时流处理项目实战笔记十七之铭文升级版相关的知识，希望对你有一定的参考价值。

铭文一级：

功能1：今天到现在为止实战课程的访问量

yyyyMMdd courseid

使用数据库来进行存储我们的统计结果
Spark Streaming把统计结果写入到数据库里面
可视化前端根据：yyyyMMdd courseid 把数据库里面的统计结果展示出来

选择什么数据库作为统计结果的存储呢？
RDBMS: mysql、Oracle...
day course_id click_count
20171111 1 10
20171111 2 10

下一个批次数据进来以后：
20171111 + 1 ==> click_count + 下一个批次的统计结果 ==> 写入到数据库中

NoSQL: HBase、Redis....
HBase：一个API就能搞定，非常方便
20171111 + 1 ==> click_count + 下一个批次的统计结果
本次课程为什么要选择HBase的一个原因所在

前提：
HDFS
Zookeeper
HBase

HBase表设计
创建表
create ‘imooc_course_clickcount‘, ‘info‘
Rowkey设计
day_courseid

如何使用Scala来操作HBase

铭文二级：

启动Hbase要先启动HDFS、ZooKeeper

Hadoop的启动，sbin文件夹：

./start-dfs.sh

HBase的启动，bin文件夹:

./start-hbase.sh

1、建表：create ‘imooc_course_clickcount‘,‘info‘

查看表：list

查看表详情：desc imooc_course_clickcount　　//desc ‘imooc_course_clickcount‘

2、Rowkey的设计：day_courseid

3、建CourseClickCount类（day_course,click_count）

4、HBaseUtils工具类的实现

package com.imooc.spark.project.utils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
/**
 * HBase操作工具类：Java工具类建议采用单例模式封装
 */
public class HBaseUtils {
    HBaseAdmin admin = null;
    Configuration configuration = null;
    /**
     * 私有改造方法
     */
    private HBaseUtils(){
        configuration = new Configuration();
        configuration.set("hbase.zookeeper.quorum", "hadoop000:2181");
        configuration.set("hbase.rootdir", "hdfs://hadoop000:8020/hbase");
        try {
            admin = new HBaseAdmin(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    private static HBaseUtils instance = null;
    public  static synchronized HBaseUtils getInstance() {
        if(null == instance) {
            instance = new HBaseUtils();
        }
        return instance;
    }
    /**
     * 根据表名获取到HTable实例
     */
    public HTable getTable(String tableName) {
        HTable table = null;
        try {
            table = new HTable(configuration, tableName);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return table;
    }
    /**
     * 添加一条记录到HBase表
     * @param tableName HBase表名
     * @param rowkey  HBase表的rowkey
     * @param cf HBase表的columnfamily
     * @param column HBase表的列
     * @param value  写入HBase表的值
     */
    public void put(String tableName, String rowkey, String cf, String column, String value) {
        HTable table = getTable(tableName);
        Put put = new Put(Bytes.toBytes(rowkey));
        put.add(Bytes.toBytes(cf), Bytes.toBytes(column), Bytes.toBytes(value));
        try {
            table.put(put);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        //HTable table = HBaseUtils.getInstance().getTable("imooc_course_clickcount");
        //System.out.println(table.getName().getNameAsString());
        String tableName = "imooc_course_clickcount" ;
        String rowkey = "20171111_88";
        String cf = "info" ;
        String column = "click_count";
        String value = "2";
        HBaseUtils.getInstance().put(tableName, rowkey, cf, column, value);
    }
}

以上是关于慕课网实战Spark Streaming实时流处理项目实战笔记十七之铭文升级版的主要内容，如果未能解决你的问题，请参考以下文章