Big Data: Hive DML Operations

Posted by jeff190812


1. Managing Partitioned Tables

1.1 Creating a partitioned table

hive (db_test)> create table dept_partition(deptno int, dname string, loc string)
> partitioned by(month string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.266 seconds

-------------------------------------------------------------------------------------------------------

1.2 Loading data into the partitioned table

[root@bigdata113 hivetest]# cat dept
10 ACCOUNTING 1700
20 RESEARCH 1800
30 SALES 1900
40 OPERATIONS 1700

 

hive (db_test)> load data local inpath '/root/hivetest/dept' into table dept_partition partition(month='201909');
Loading data to table db_test.dept_partition partition (month=201909)
Partition db_test.dept_partition{month=201909} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]
OK
Time taken: 1.057 seconds
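
Each partition value is stored as its own subdirectory under the table's HDFS location, so a quick way to confirm the load is to list the table directory from the Hive CLI (a sketch, using the table Location reported by desc formatted in section 1.7; it should show a month=201909 subdirectory):

-- list the partition directories of dept_partition
dfs -ls /db_test.db/dept_partition;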

----------------------------------------------------------------------------------------------------------

1.3 Querying the partitioned table by month

hive (db_test)> select * from dept_partition where month='201910';
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 ACCOUNTING 1700 201910
20 RESEARCH 1800 201910
30 SALES 1900 201910
40 OPERATIONS 1700 201910
Time taken: 0.436 seconds, Fetched: 4 row(s)
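
To read from more than one partition in a single query, you can either filter on several partition values at once or union per-partition queries (a sketch, not run in the session above):

-- filter on multiple partition values
select * from dept_partition where month='201909' or month='201910';

-- equivalent union of single-partition queries
select * from dept_partition where month='201909'
union
select * from dept_partition where month='201910';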

----------------------------------------------------------------------------------------------------------------

1.4 Adding a single partition

hive (db_test)> alter table dept_partition add partition(month='201911');
OK
Time taken: 0.107 seconds

----------------------------------------------------------------------------------------------------------------------------------------

1.5 Adding multiple partitions (just separate the partition specs with spaces)

hive (db_test)> alter table dept_partition add partition(month='201912') partition(month='202001');
OK
Time taken: 0.111 seconds

--------------------------------------------------------------------------------------------------------------------------------------------------

1.6 Listing the partitions of a table

hive (db_test)> show partitions dept_partition;
OK
partition
month=201909
month=201910
month=201911
month=201912
month=202001
Time taken: 0.048 seconds, Fetched: 5 row(s)

---------------------------------------------------------------------------------------------------------------------------------------------------

1.7 Viewing the table structure

hive (db_test)> desc formatted dept_partition;
OK
col_name data_type comment
# col_name data_type comment

deptno int
dname string
loc string

# Partition Information
# col_name data_type comment

month string

# Detailed Table Information
Database: db_test
Owner: root
CreateTime: Sat Sep 14 15:26:47 CST 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://mycluster/db_test.db/dept_partition
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1568446007

# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.078 seconds, Fetched: 34 row(s)

---------------------------------------------------------------------------------------------------------------------------

1.8 Dropping a single partition

hive (db_test)> alter table dept_partition drop partition(month='202001');
Dropped the partition month=202001
OK
Time taken: 0.102 seconds

-----------------------------------------------------------------------------------------------------------------------------

1.9 Dropping multiple partitions

hive (db_test)> alter table dept_partition drop partition(month='201911'),partition(month='201912');
Dropped the partition month=201911
Dropped the partition month=201912
OK
Time taken: 0.14 seconds

---------------------------------------------------------------------------------------------------------------------------------------

1.10 Creating a two-level partitioned table

hive (db_test)> create table dept_deep_partition(deptno int, dname string, loc string)
> partitioned by(month string,day string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.081 seconds

------------------------------------------------------------------------------------------------------------------------------------------

1.11 Loading data into the two-level partitioned table

hive (db_test)> load data local inpath '/root/hivetest/dept' into table dept_deep_partition partition(month='201909',day='14');
Loading data to table db_test.dept_deep_partition partition (month=201909, day=14)
Partition db_test.dept_deep_partition{month=201909, day=14} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]
OK
Time taken: 0.361 seconds

------------------------------------------------------------------------------------------------------------------------------------------------

1.12 Querying the two-level partitioned table

hive (db_test)> select * from dept_deep_partition where month='201909' and day='14';
OK
dept_deep_partition.deptno dept_deep_partition.dname dept_deep_partition.loc dept_deep_partition.month dept_deep_partition.day
10 ACCOUNTING 1700 201909 14
20 RESEARCH 1800 201909 14
30 SALES 1900 201909 14
40 OPERATIONS 1700 201909 14
Time taken: 0.085 seconds, Fetched: 4 row(s)
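
If data files are copied directly into a partition directory on HDFS (for example with dfs -put) instead of going through load data, Hive will not see the partition until its metadata is registered. Two common ways to do that, sketched here for a hypothetical month=201909/day=15 directory:

-- option 1: register the partition explicitly
alter table dept_deep_partition add partition(month='201909', day='15');

-- option 2: have Hive scan the table directory and repair the partition metadata
msck repair table dept_deep_partition;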

===========================================================================================================

2. Managing Tables

2.1 Renaming a table

hive (db_test)> alter table test rename to test1;
OK
Time taken: 0.1 seconds
hive (db_test)> desc test1;
OK
col_name data_type comment
id int
name string
Time taken: 0.053 seconds, Fetched: 2 row(s)

-----------------------------------------------------------------------------------------------------------------------------------------------------

2.2 Adding a column to a table

hive (db_test)> alter table test1 add columns(sex int);
OK
Time taken: 0.118 seconds
hive (db_test)> desc test1;
OK
col_name data_type comment
id int
name string
sex int
Time taken: 0.037 seconds, Fetched: 3 row(s)

------------------------------------------------------------------------------------------------------------------------------------------------------

2.3 Changing a column

hive (db_test)> alter table test1 change column sex sexs string;
OK
Time taken: 0.096 seconds
hive (db_test)> desc test1;
OK
col_name data_type comment
id int
name string
sexs string
Time taken: 0.043 seconds, Fetched: 3 row(s)
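
For reference, the full form of the statement used above also allows adding a comment and repositioning the column (syntax sketch, not run in this session):

alter table table_name
change [column] old_col_name new_col_name column_type
[comment 'col_comment'] [first | after some_other_column];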

-------------------------------------------------------------------------------------------------------------------------------------------------------

2.4 Replacing all columns of the table

hive (db_test)> alter table test1 replace columns(id string,name string,sexs int);
OK
Time taken: 0.072 seconds
hive (db_test)> desc test1;
OK
col_name data_type comment
id string
name string
sexs int
Time taken: 0.028 seconds, Fetched: 3 row(s)

=========================================================================================

3. Importing and Exporting Data

3.1 Loading a file from HDFS, overwriting the table data

hive (db_test)> load data inpath '/data' overwrite into table test;
Loading data to table db_test.test
Table db_test.test stats: [numFiles=1, numRows=0, totalSize=39, rawDataSize=0]
OK
Time taken: 0.553 seconds
hive (db_test)> select * from test;
OK
test.id test.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.258 seconds, Fetched: 3 row(s)
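
The general form of load data ties together the variants used in this post: local reads the source path from the local filesystem instead of HDFS, overwrite replaces existing data rather than appending, and partition targets a specific partition (syntax sketch):

load data [local] inpath '/path/to/data'
[overwrite] into table table_name
[partition (part_col1=val1, part_col2=val2, ...)];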

--------------------------------------------------------------------------------------------------------------------------------------------

3.2 Inserting records into a partitioned table (this still runs a MapReduce job under the hood)

hive (db_test)> insert into table student partition(month='201909') values(1000,'wulei');
Query ID = root_20190914162525_05b98ef9-c040-44e3-a5d4-5cbd5a169995
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0001, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0001/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-14 16:25:39,795 Stage-1 map = 0%, reduce = 0%
2019-09-14 16:25:50,234 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.99 sec
MapReduce Total cumulative CPU time: 990 msec
Ended Job = job_1568445208318_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/db_test.db/student/month=201909/.hive-staging_hive_2019-09-14_16-25-25_082_2695322323999370786-1/-ext-10000
Loading data to table db_test.student partition (month=201909)
Partition db_test.student{month=201909} stats: [numFiles=1, numRows=1, totalSize=11, rawDataSize=10]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.99 sec HDFS Read: 3592 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 990 msec
OK
_col0 _col1
Time taken: 26.6 seconds

----------------------------------------------------------------------------------------------------------------------------------

3.3 Basic insert mode (inserting the query result of a single table)

hive (db_test)> insert overwrite table student partition(month='201908')
> select id,name from student where month='201909';
Query ID = root_20190914162926_81f43769-e359-491a-8786-2a2181c0d578
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0002, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0002/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-14 16:29:34,763 Stage-1 map = 0%, reduce = 0%
2019-09-14 16:29:43,317 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.82 sec
MapReduce Total cumulative CPU time: 820 msec
Ended Job = job_1568445208318_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/db_test.db/student/month=201908/.hive-staging_hive_2019-09-14_16-29-26_525_836771285321484473-1/-ext-10000
Loading data to table db_test.student partition (month=201908)
Partition db_test.student{month=201908} stats: [numFiles=1, numRows=1, totalSize=11, rawDataSize=10]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.82 sec HDFS Read: 3494 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 820 msec
OK
id name
Time taken: 18.126 seconds

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

3.4 Multi-insert mode

hive (db_test)> from student
> insert overwrite table student partition(month='201901')
> select id,name where month = '201908'
> insert overwrite table student partition(month='201902')
> select id,name where month = '201909';
Query ID = root_20190914163425_7314be5a-651b-4400-9ed5-1af8ea4671d6
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0003, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0003/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2019-09-14 16:34:32,599 Stage-2 map = 0%, reduce = 0%
2019-09-14 16:34:41,057 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.49 sec
MapReduce Total cumulative CPU time: 1 seconds 490 msec
Ended Job = job_1568445208318_0003
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/db_test.db/student/month=201901/.hive-staging_hive_2019-09-14_16-34-25_290_2442176926196003235-1/-ext-10000
Moving data to: hdfs://mycluster/db_test.db/student/month=201902/.hive-staging_hive_2019-09-14_16-34-25_290_2442176926196003235-1/-ext-10002
Loading data to table db_test.student partition (month=201901)
Loading data to table db_test.student partition (month=201902)
Partition db_test.student{month=201901} stats: [numFiles=1, numRows=0, totalSize=11, rawDataSize=0]
Partition db_test.student{month=201902} stats: [numFiles=1, numRows=0, totalSize=11, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Cumulative CPU: 1.49 sec HDFS Read: 5264 HDFS Write: 190 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 490 msec
OK
id name
Time taken: 17.446 seconds

------------------------------------------------------------------------------------------------------------------------------------------------

3.5 Creating a table from a query and loading the result (CTAS)

hive (db_test)> create table if not exists student1
> as select id,name from student;
Query ID = root_20190914163710_cbb5e6f6-c87d-41f7-8a25-d0e541258c59
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0004, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0004/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-14 16:37:22,863 Stage-1 map = 0%, reduce = 0%
2019-09-14 16:37:31,390 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
MapReduce Total cumulative CPU time: 960 msec
Ended Job = job_1568445208318_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/db_test.db/.hive-staging_hive_2019-09-14_16-37-10_353_7162633994902445002-1/-ext-10001
Moving data to: hdfs://mycluster/db_test.db/student1
Table db_test.student1 stats: [numFiles=1, numRows=4, totalSize=44, rawDataSize=40]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.96 sec HDFS Read: 4317 HDFS Write: 116 SUCCESS
Total MapReduce CPU Time Spent: 960 msec
OK
id name
Time taken: 22.237 seconds
hive (db_test)> select * from student1
> ;
OK
student1.id student1.name
1000 wulei
1000 wulei
1000 wulei
1000 wulei
Time taken: 0.038 seconds, Fetched: 4 row(s)
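
A related shortcut is create table ... like, which copies only the table structure without copying any data (a sketch; student2 is just an example name, not a table used elsewhere in this post):

create table if not exists student2 like student;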

--------------------------------------------------------------------------------------------------------------------------------------------

3.6 Importing data into a Hive table with import (the table's data must first have been exported to an HDFS directory with export). Note that import recreates the target table from the exported metadata, so the student3 table created below is not actually used; the data lands in the new student5 table named in the import statement.

hive (db_test)> export table student to '/zyj/student';
Copying data from file:/tmp/root/8a4cf4d9-0268-49e0-aaf9-68d703fc5f51/hive_2019-09-14_16-42-04_554_538197640239560802-1/-local-10000/_metadata
Copying file: file:/tmp/root/8a4cf4d9-0268-49e0-aaf9-68d703fc5f51/hive_2019-09-14_16-42-04_554_538197640239560802-1/-local-10000/_metadata
Copying data from hdfs://mycluster/db_test.db/student/month=201901
Copying file: hdfs://mycluster/db_test.db/student/month=201901/000000_0
Copying data from hdfs://mycluster/db_test.db/student/month=201902
Copying file: hdfs://mycluster/db_test.db/student/month=201902/000000_0
Copying data from hdfs://mycluster/db_test.db/student/month=201908
Copying file: hdfs://mycluster/db_test.db/student/month=201908/000000_0
Copying data from hdfs://mycluster/db_test.db/student/month=201909
Copying file: hdfs://mycluster/db_test.db/student/month=201909/000000_0
OK
Time taken: 0.555 seconds

 

hive (db_test)> create table student3(id int,name string)
> partitioned by(month string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.072 seconds
hive (db_test)> import table student5 from '/zyj/student';
Copying data from hdfs://mycluster/zyj/student/month=201901
Copying file: hdfs://mycluster/zyj/student/month=201901/000000_0
Copying data from hdfs://mycluster/zyj/student/month=201902
Copying file: hdfs://mycluster/zyj/student/month=201902/000000_0
Copying data from hdfs://mycluster/zyj/student/month=201908
Copying file: hdfs://mycluster/zyj/student/month=201908/000000_0
Copying data from hdfs://mycluster/zyj/student/month=201909
Copying file: hdfs://mycluster/zyj/student/month=201909/000000_0
Loading data to table db_test.student5 partition (month=201901)
Loading data to table db_test.student5 partition (month=201902)
Loading data to table db_test.student5 partition (month=201908)
Loading data to table db_test.student5 partition (month=201909)
OK
Time taken: 1.034 seconds

 

hive (db_test)> select * from student5;
OK
student5.id student5.name student5.month
1000 wulei 201901
1000 wulei 201902
1000 wulei 201908
1000 wulei 201909
Time taken: 0.04 seconds, Fetched: 4 row(s)
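
export and import also accept a partition spec, so a single partition can be moved instead of the whole table (a sketch, not run in this session; the target path and the student6 table name are just examples, and the path must not already exist on HDFS):

export table student partition(month='201909') to '/zyj/student_201909';
import table student6 partition(month='201909') from '/zyj/student_201909';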

---------------------------------------------------------------------------------------------------------------

3.7 Exporting data to the local filesystem with insert

hive (db_test)> insert overwrite local directory '/root/hivetest/student5'
> select * from student5;
Query ID = root_20190914164638_739b1eb7-3a45-46ad-a2c3-883e1aa72d97
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0005, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0005/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-14 16:46:44,599 Stage-1 map = 0%, reduce = 0%
2019-09-14 16:46:50,893 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.67 sec
MapReduce Total cumulative CPU time: 670 msec
Ended Job = job_1568445208318_0005
Copying data to local directory /root/hivetest/student5
Copying data to local directory /root/hivetest/student5
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.67 sec HDFS Read: 4418 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 670 msec
OK
student5.id student5.name student5.month
Time taken: 13.518 seconds

hive (db_test)> !cat /root/hivetest/student5/000000_0
> ;
1000wulei201901
1000wulei201902
1000wulei201908
1000wulei201909

----------------------------------------------------------------------------------------------------------------------

3.8 Exporting data to the local filesystem with insert, with formatted (delimited) output

hive (db_test)> insert overwrite local directory '/root/hivetest/student6'
> row format delimited fields terminated by '\t'
> select * from student5;
Query ID = root_20190914165232_86cb3925-445a-425b-845a-c8fb1a414503
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0006, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0006/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-14 16:52:38,234 Stage-1 map = 0%, reduce = 0%
2019-09-14 16:52:43,551 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.57 sec
MapReduce Total cumulative CPU time: 570 msec
Ended Job = job_1568445208318_0006
Copying data to local directory /root/hivetest/student6
Copying data to local directory /root/hivetest/student6
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.57 sec HDFS Read: 4433 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 570 msec
OK
student5.id student5.name student5.month
Time taken: 12.42 seconds
hive (db_test)> !cat /root/hivetest/student6/000000_0;
1000 wulei 201901
1000 wulei 201902
1000 wulei 201908
1000 wulei 201909

-------------------------------------------------------------------------------------------------------------------------

3.9 Exporting data to HDFS with insert, with formatted (delimited) output

hive (db_test)> insert overwrite directory '/zyj/student5'
> row format delimited fields terminated by '\t'
> select * from student5;
Query ID = root_20190914165706_1f20c44f-09e4-4883-ab01-fd6b996b349d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1568445208318_0007, Tracking URL = http://bigdata112:8088/proxy/application_1568445208318_0007/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1568445208318_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-14 16:57:12,663 Stage-1 map = 0%, reduce = 0%
2019-09-14 16:57:18,908 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.7 sec
MapReduce Total cumulative CPU time: 700 msec
Ended Job = job_1568445208318_0007
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/zyj/student5/.hive-staging_hive_2019-09-14_16-57-06_486_7937228233036635922-1/-ext-10000
Moving data to: /zyj/student5
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.7 sec HDFS Read: 4387 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 700 msec
OK
student5.id student5.name student5.month
Time taken: 13.532 seconds
hive (db_test)> dfs -cat /zyj/student5/000000_0;
1000 wulei 201901
1000 wulei 201902
1000 wulei 201908
1000 wulei 201909
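
Query results can also be dumped to a local file by running the hive CLI non-interactively with -e (a sketch, assuming hive is on the PATH; the output file name is just an example):

# run from the shell, not from the hive prompt
hive -e "select * from db_test.student5;" > /root/hivetest/student5_result.txt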

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3.10 Exporting to HDFS with export

hive (db_test)> export table student5 to '/zyjtest/student5'; -- this path must not already exist on HDFS, otherwise the command fails
Copying data from file:/tmp/root/8a4cf4d9-0268-49e0-aaf9-68d703fc5f51/hive_2019-09-14_17-01-48_551_984651952044192464-1/-local-10000/_metadata
Copying file: file:/tmp/root/8a4cf4d9-0268-49e0-aaf9-68d703fc5f51/hive_2019-09-14_17-01-48_551_984651952044192464-1/-local-10000/_metadata
Copying data from hdfs://mycluster/db_test.db/student5/month=201901
Copying file: hdfs://mycluster/db_test.db/student5/month=201901/000000_0
Copying data from hdfs://mycluster/db_test.db/student5/month=201902
Copying file: hdfs://mycluster/db_test.db/student5/month=201902/000000_0
Copying data from hdfs://mycluster/db_test.db/student5/month=201908
Copying file: hdfs://mycluster/db_test.db/student5/month=201908/000000_0
Copying data from hdfs://mycluster/db_test.db/student5/month=201909
Copying file: hdfs://mycluster/db_test.db/student5/month=201909/000000_0
OK
Time taken: 0.347 seconds

 

----------------------------------------------------------------------------------------------------------------

3.11 Clearing a table's data with truncate (for a partitioned table, the data in each partition is removed but the partitions themselves are kept). Note that truncate only works on managed (internal) tables, not external tables.

hive (db_test)> truncate table student5;
OK
Time taken: 0.578 seconds
