Hive Basics

Posted 2023-03-10 思达滴


Basic Hive syntax:

List databases: hive (default)> show databases; -- lists all databases

hive (default)> desc database test; -- shows the metadata of database test, such as its HDFS location

hive (default)> select current_database(); -- shows the database currently in use

Create a database: hive (default)> create database test;

Drop a database (fails if it still contains tables): hive (default)> drop database if exists test;

Force-drop a database together with its tables: hive (default)> drop database if exists test cascade;

Create a table: hive (default)> create table student (id int,name string);

Drop a table: hive (default)> drop table if exists student;

Insert data: hive (default)> insert into student values(1,'zs'),(2,'ls');

Show the table's structure: hive (default)> desc student;

Show the table's data: hive (default)> select * from student;
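
The individual commands can be strung into one short session. The sketch below assumes a scratch database called demo_db (a hypothetical name, not from the original) so that it leaves test untouched; use is the statement that switches the current database.

```sql
-- Assumption: demo_db is a hypothetical scratch database.
create database if not exists demo_db;
use demo_db;                           -- make demo_db the current database
select current_database();             -- should now report demo_db

create table if not exists student (id int, name string);
insert into student values (1, 'zs'), (2, 'ls');
select * from student where id = 1;    -- filter on a column

-- Clean up: cascade also drops the student table inside demo_db.
use default;
drop database if exists demo_db cascade;
```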

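Hive data types: collection types

ARRAY: an ordered list of elements that all have the same type

MAP: key-value pairs; all keys share one type and all values share one type

STRUCT: a group of named fields, each with its own type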
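
To see the three types without creating a table first, one value of each can be built with Hive's array(), map() and named_struct() constructors and then read back. This is only a sketch; the column aliases (cities, scores, person) are made up for illustration.

```sql
-- Build one value of each collection type in a subquery, then read from it.
select
    t.cities[0]     as first_city,   -- ARRAY: element by position, counting from 0
    t.scores['DB']  as db_score,     -- MAP: value looked up by key
    t.person.age    as age           -- STRUCT: field accessed by name
from (
    select
        array('Montreal', 'Toronto')             as cities,
        map('DB', 80, 'Perl', 85)                as scores,
        named_struct('gender', 'Male', 'age', 30) as person
) t;
```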

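Hive data structures

Structure	Description	Logical role	Physical storage (HDFS)
Database	Database	A collection of tables	Directory
Table	Table	A collection of rows	Directory
Partition	Partition	Splits a table's data by partition-column value	Directory
Buckets	Buckets	Distributes data by hashing a column	File
Row	Row	A record	A line in a file
Columns	Column	A field of a record	A specified position within each line
Views	View	A logical construct that can span several tables	Stores no data
Index	Index	Records statistical information about the data	Directory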
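
As a rough illustration of how the table, partition and bucket rows map onto HDFS, the sketch below declares a table that uses both partitioning and bucketing. The names page_views, dt and user_id are hypothetical, and the actual root directory depends on the warehouse location configured for the installation.

```sql
-- Assumptions: page_views, dt and user_id are hypothetical names.
create table if not exists page_views (
    user_id int,
    url     string
)
partitioned by (dt string)               -- one sub-directory per dt value
clustered by (user_id) into 4 buckets;   -- rows hashed on user_id into 4 files

-- Layout this is expected to produce, roughly:
--   <table dir>/             table      -> directory
--   <table dir>/dt=.../      partition  -> directory
--   files in each partition  buckets    -> files
```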

Example

Create a file named employee.txt in the /opt directory and add the following data to it.

Data:

Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer Lead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

Create the table:

create table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

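Notes:
row format delimited starts the delimiter specification
fields terminated by '|' means that fields are separated by "|"
collection items terminated by ',' means that the items within a complex-type field (the elements of an array, the fields of a struct, the entries of a map) are separated by ","
map keys terminated by ':' means that the key and value of each map entry are separated by ":"
lines terminated by '\n' means that rows are separated by a newline ("\n")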

Load the file:

Method 1, local file (essentially a hadoop fs -put upload, so the file is copied):

In Hive, run load data local inpath '/opt/employee.txt' into table employee;

Method 2, HDFS file (essentially a hadoop fs -mv, so the file is moved):

In Hive, run load data inpath '/employee.txt' into table employee;
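
Both forms append, so running the same load twice leaves two copies of every row in the table. As a sketch of the alternative, the overwrite keyword replaces the table's current contents instead; the path is the local one used in method 1.

```sql
-- Replace the table's current contents with the file instead of appending to them.
load data local inpath '/opt/employee.txt' overwrite into table employee;

-- Sanity check: the row count should match the number of lines in the file.
select count(*) from employee;
```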

Query the data: select * from employee;
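
select * prints each complex column as a whole. The queries below are a minimal sketch of how individual elements of the array, struct and map columns of employee can be addressed; the aliases (first_city, db_score, wp, city and so on) are chosen only for illustration.

```sql
select
    name,
    work_place[0]           as first_city,   -- array element, index starts at 0
    size(work_place)        as city_count,   -- number of elements in the array
    gender_age.gender       as gender,       -- struct field by name
    gender_age.age          as age,
    skills_score['DB']      as db_score,     -- map lookup; NULL when the key is absent
    map_keys(depart_title)  as departments   -- all keys of the map, as an array
from employee;

-- Flatten the array: one output row per (employee, city) pair.
select name, city
from employee
lateral view explode(work_place) wp as city;
```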

Creating a partitioned table

Partition by age

create table employee2(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

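Load data into the partitions age=20 and age=30. The partition value comes from the partition clause, not from the file, so every row of the file lands in the named partition:

load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);

Show the table's partitions: show partitions employee2;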
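
Once partitions exist, the partition column can be queried like any other column. The sketch below filters on it and, as an aside, adds and drops a partition explicitly; the value age=40 and the alias age_in_file are arbitrary choices for the example.

```sql
-- age is the partition column; gender_age.age is the value parsed from the file.
select name, age, gender_age.age as age_in_file
from employee2
where age = 30;    -- only the age=30 partition directory needs to be read

-- Partitions can also be added or dropped explicitly.
alter table employee2 add if not exists partition (age=40);
alter table employee2 drop if exists partition (age=40);
```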

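Internal and external tables

Internal tables (managed tables):

Hive fully manages the data; dropping the table removes the metadata and also deletes the data

External tables:

The data is stored at a specified HDFS path

Hive does not fully manage the data; dropping the table removes only the metadata and leaves the data in place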

Upload the data to HDFS (create the target directory first if it does not exist):

hdfs dfs -put ./employee.txt /tmp/hivedata/employee/

Create the external table

create external table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';

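Notes:

To create an external table, add the keyword external after create

location '/tmp/hivedata/employee'; specifies the HDFS path where the table's data is stored

Because of if not exists, this statement does nothing if a table named employee already exists, such as the managed table created earlier; drop that table first or pick a different name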
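
One way to check the external-table behaviour described above is to create the table, inspect its type, drop it, and then list the directory again. This is a sketch under two assumptions: the table is named employee_ext (a hypothetical name, used so it cannot collide with an existing employee table), and the statements are run from the Hive CLI, where dfs passes a command through to HDFS.

```sql
-- Assumption: employee_ext is a hypothetical table name.
create external table if not exists employee_ext (
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';

desc formatted employee_ext;        -- the Table Type field should read EXTERNAL_TABLE
select count(*) from employee_ext;  -- reads the files already present at the location

-- Dropping removes only the metadata; the files under the location should remain.
drop table employee_ext;
dfs -ls /tmp/hivedata/employee;
```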
