Hadoop3集群搭建之——hive添加自定义函数UDTF (一行输入,多行输出)

Posted Flink菜鸟

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Hadoop3集群搭建之——hive添加自定义函数UDTF (一行输入,多行输出)相关的知识,希望对你有一定的参考价值。

上篇:

Hadoop3集群搭建之——虚拟机安装

Hadoop3集群搭建之——安装hadoop,配置环境

Hadoop3集群搭建之——配置ntp服务

Hadoop3集群搭建之——hive安装

Hadoop3集群搭建之——hbase安装及简单操作

Hadoop3集群搭建之——hive添加自定义函数UDF

Hadoop3集群搭建之——hive添加自定义函数UDTF

 

上篇中,udtf函数,只有为一行输入,一行输出。udtf是可以一行输入,多行输出的。

简述下需求:  

输入开始时间,结束时间,返回每个小时的时长

直接上代码:

package com.venn.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.ArrayList;

/**
 * Created by venn on 5/20/2018.
 * SplitHour : split hour
 */
public class SplitHour extends GenericUDTF {

    /**
     * add the column name
     * @param args
     * @return
     * @throws UDFArgumentException
     */
    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("ExplodeMap takes string as a parameter");
        }

        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("begintime");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("endtime");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("hour");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("seconds");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);


        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    /**
     * process the column
     * @param objects
     * @throws HiveException
     */
    public void process(Object[] objects) throws HiveException {


        String [] input = objects[0].toString().split(",");
        // 2018-06-06 10:25:35
        String beginTime = input[0];
        String endTime = input[1];

        String[] result = new String[4];
        result[0] = beginTime;
        result[1] = endTime;

        // begintime
        int bhour = Integer.parseInt(beginTime.substring(11, 13));
        int bmin = Integer.parseInt(beginTime.substring(14, 16));
        int bsecond = Integer.parseInt(beginTime.substring(17, 19));
        // endtime
        int ehour = Integer.parseInt(endTime.substring(11, 13));
        int emin = Integer.parseInt(endTime.substring(14, 16));
        int esecond = Integer.parseInt(endTime.substring(17, 19));

        // 1.if begin hour equal end hour, second is : (emin - bmin) * 60 + (esecond - bsecond)
        if (bhour == ehour) {
            result[2] = String.valueOf(bhour);
            result[3] = String.valueOf((emin - bmin) * 60 + (esecond - bsecond));
            forward(result);
            return;
        }

        boolean flag = true;
        //TODO 待优化,先输出第一个循环的时长,再循环后面的就不用判断
        while (bhour != ehour) {
            result[2] = String.valueOf(bhour);

            if(flag){
                flag = false;
            // 2. if begintime hour != endtime, the first hour, second is : 3600 - bmin * 60 - bsecond
                result[3] = String.valueOf(3600 - bmin * 60 - bsecond);
            }else {
                // 3. next hour is 3600
                result[3] = String.valueOf(3600);
            }
            bhour += 1;
            // 输出到hive
            forward(result);
        }

        result[2] = String.valueOf(bhour);
        // 4. the end hour is : emin  * 60 + esecond
        result[3] = String.valueOf( emin  * 60 + esecond);
        forward(result);

    }

    public void close() throws HiveException {

    }

}

 

udtf 函数介绍参加上篇

使用方式见上篇

Hadoop3集群搭建之——hive添加自定义函数UDTF

 

样例:

hive> select split_hour( concat(begintime,\',\',endtime)) from viewlog where log_date=20180401 limit 10;
OK
begintime    endtime    hour    seconds
2018-04-01 10:26:14    2018-04-01 10:26:21    10    7
2018-04-01 07:21:47    2018-04-01 07:22:23    7    36
2018-04-01 15:18:08    2018-04-01 15:18:11    15    3
2018-04-01 18:05:13    2018-04-01 18:05:28    18    15
2018-04-01 07:18:34    2018-04-01 07:18:52    7    18
2018-04-01 23:28:32    2018-04-01 23:29:44    23    72
2018-04-01 06:34:11    2018-04-01 06:34:17    6    6
2018-04-01 14:02:40    2018-04-01 14:03:33    14    53
2018-04-01 17:30:23    2018-04-01 17:30:26    17    3
2018-04-01 12:15:07    2018-04-01 12:15:11    12    4
2018-04-01 06:53:40    2018-04-01 07:02:09    6    380
2018-04-01 06:53:40    2018-04-01 07:02:09    7    129
Time taken: 2.238 seconds, Fetched: 12 row(s)

搞定

 

以上是关于Hadoop3集群搭建之——hive添加自定义函数UDTF (一行输入,多行输出)的主要内容,如果未能解决你的问题,请参考以下文章

Hadoop3集群搭建之——hive添加自定义函数UDFUDTF

Hadoop3集群搭建之——hbase安装及简单操作

基于Hadoop3.1.2集群的Hive3.1.2安装(有不少坑)

hadoop集群搭建(Hadoop 3.1.3 /Hive 3.1.2/Spark 3.0.0)

纯手动搭建大数据集群架构_记录018_RuoYi-Cloud-Plus-master_Kafka集成的自己创建的微服务_实现多主题数据传输---大数据之Hadoop3.x工作笔记0179

Red Hat Enterprise 8.4 Install Hadoop3+Hive3集群部署