Hive：数据定义语言（DDL）案例

Posted 2023-02-16 不死鸟.亚历山大.狼崽子

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了Hive：数据定义语言（DDL）案例相关的知识，希望对你有一定的参考价值。

1 原生数据类型案例

文件archer.txt中记录了相关信息，内容如下所示，其中字段之间分隔符为制表符\\t,要求在Hive中建表映射成功该文件。

1	后羿	5986	1784	396	336	remotely	archer
2	马可波罗	5584	200	362	344	remotely	archer
3	鲁班七号	5989	1756	400	323	remotely	archer
4	李元芳	5725	1770	396	340	remotely	archer
5	孙尚香	6014	1756	411	346	remotely	archer
6	黄忠	5898	1784	403	319	remotely	archer
7	狄仁杰	5710	1770	376	338	remotely	archer
8	虞姬	5669	1770	407	329	remotely	archer
9	成吉思汗	5799	1742	394	329	remotely	archer
10	百里守约	5611	1784	410	329	remotely	archer	assassin

字段含义：id、name（英雄名称）、hp_max（最大生命）、mp_max（最大法力）、attack_max（最高物攻）、defense_max（最大物防）、attack_range（攻击范围）、role_main（主要定位）、role_assist（次要定位）。

分析一下：字段都是基本类型，字段的顺序需要注意一下。字段之间的分隔符是制表符，需要使用row format语法进行指定。

建表语句：

--创建数据库并切换使用
create database handsome;
use handsome;

--ddl create table
create table t_archer(
    id int comment "ID",
    name string comment "英雄名称",
    hp_max int comment "最大生命",
    mp_max int comment "最大法力",
    attack_max int comment "最高物攻",
    defense_max int comment "最大物防",
    attack_range string comment "攻击范围",
    role_main string comment "主要定位",
    role_assist string comment "次要定位"
) comment "王者荣耀射手信息"
row format delimited fields terminated by "\\t";

建表成功之后，在Hive的默认存储路径下就生成了表对应的文件夹，把archer.txt文件上传到对应的表文件夹下。

hadoop fs -put archer.txt  /user/hive/warehouse/handsome.db/t_archer

执行查询操作，可以看出数据已经映射成功。

想一想：Hive这种能力是不是比mysql一条一条insert插入数据方便多了？

2 复杂数据类型案例

文件hot_hero_skin_price.txt中记录了手游《王者荣耀》热门英雄的相关皮肤价格信息，内容如下,要求在Hive中建表映射成功该文件。

1,孙悟空,53,西部大镖客:288-大圣娶亲:888-全息碎片:0-至尊宝:888-地狱火:1688
2,鲁班七号,54,木偶奇遇记:288-福禄兄弟:288-黑桃队长:60-电玩小子:2288-星空梦想:0
3,后裔,53,精灵王:288-阿尔法小队:588-辉光之辰:888-黄金射手座:1688-如梦令:1314
4,铠,52,龙域领主:288-曙光守护者:1776
5,韩信,52,飞衡:1788-逐梦之影:888-白龙吟:1188-教廷特使:0-街头霸王:888

字段：id、name（英雄名称）、win_rate（胜率）、skin_price（皮肤及价格）

分析一下：前3个字段原生数据类型、最后一个字段复杂类型map。需要指定字段之间分隔符、集合元素之间分隔符、map kv之间分隔符。

建表语句：

create table t_hot_hero_skin_price(
    id int,
    name string,
    win_rate int,
    skin_price map<string,int>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':' ;

建表成功后，把hot_hero_skin_price.txt文件上传到对应的表文件夹下。

hadoop fs -put hot_hero_skin_price.txt  /user/hive/warehouse/handsome.db/t_hot_hero_skin_price

执行查询操作，可以看出数据已经映射成功。

select * from t_hot_hero_skin_price;

想一想：如果最后一个字段以String类型来定义，后续使用方便吗？

3 默认分隔符案例

文件team_ace_player.txt中记录了手游《王者荣耀》主要战队内最受欢迎的王牌选手信息，内容如下,要求在Hive中建表映射成功该文件。

1^A成都AG超玩会^A一诺
2^A重庆QGhappy^AHurt
3^ADYG^A久诚
4^A上海EDG.M^A浪浪
5^A武汉eStarPro^ACat
6^ARNG.M^A暴风锐
7^ARW侠^A渡劫
8^ATES滔搏^A迷神
9^A杭州LGD大鹅^A伪装
10^A南京Hero久竞^A清融

字段：id、team_name（战队名称）、ace_player_name（王牌选手名字）

分析一下：数据都是原生数据类型，且字段之间分隔符是\\001，因此在建表的时候可以省去row format语句，因为hive默认的分隔符就是\\001。

建表语句：

create table t_team_ace_player(
    id int,
    team_name string,
    ace_player_name string
);

建表成功后，把team_ace_player.txt文件上传到对应的表文件夹下。

hadoop fs -put team_ace_player.txt  /user/hive/warehouse/handsome.db/t_team_ace_player

执行查询操作，可以看出数据已经映射成功。

select * from t_team_ace_player;

想一想：字段以\\001分隔建表时很方便，那么采集、清洗数据时对数据格式追求有什么启发？你青睐于什么分隔符？

4 指定数据存储路径

文件team_ace_player.txt中记录了手游《王者荣耀》主要战队内最受欢迎的王牌选手信息，字段之间使用的是\\001作为分隔符。

要求把文件上传到HDFS任意路径下，不能移动复制，并在Hive中建表映射成功该文件。

create table t_team_ace_player_location(
id int,
team_name string,
ace_player_name string)
location ‘/tmp’;

以上是关于Hive：数据定义语言（DDL）案例的主要内容，如果未能解决你的问题，请参考以下文章