Big Data: Hive DDL Command Operations
Posted jeff190812
1. To show the current database name in the prompt and column headers in query output in the Hive CLI, add the following to the hive-site.xml file under $HIVE_HOME/conf:
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
Once configured, the effect looks like this:
hive (default)> select * from student;
OK
student.id student.name
1001 ss1
Time taken: 0.061 seconds, Fetched: 1 row(s)
======================================================================================================================================
2. To change the HDFS storage path of Hive's default database (default), add the following to hive-site.xml under $HIVE_HOME/conf (this snippet comes from the hive-default.xml.template file):
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
--Also grant group write permission on the warehouse directory (my Hadoop environment runs in HA mode)
hdfs dfs -chmod g+w /user/hive/warehouse hdfs://bigdata111:9000/
--A property added to hive-site.xml under $HIVE_HOME/conf takes effect for every session that connects to Hive
<property>
<name>mapred.reduce.tasks</name>
<value>3</value> <!-- the default is -1 -->
</property>
--A property passed on the Linux shell when launching Hive is effective only for that session; after exiting the hive shell it reverts to the default of -1
hive -hiveconf mapred.reduce.tasks=2;
--A property set inside the hive shell likewise affects only the current session and is lost on exit
hive (default)> set mapred.reduce.tasks=2;
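The three scopes above follow Hive's configuration layering: a `set` inside the session overrides `-hiveconf` on the command line, which overrides hive-site.xml, which overrides the built-in default of -1. A toy Python model of that layering (illustrative only, not Hive source code):

```python
# Toy model (not Hive internals) of how the three ways of setting
# mapred.reduce.tasks layer on top of each other.
DEFAULTS = {"mapred.reduce.tasks": "-1"}  # built-in default

def effective_config(site_xml=None, hiveconf_flag=None, session_set=None):
    """Later layers win: hive-site.xml < -hiveconf < `set` in the shell."""
    conf = dict(DEFAULTS)
    for layer in (site_xml, hiveconf_flag, session_set):
        if layer:
            conf.update(layer)
    return conf

# No overrides: the built-in default applies.
print(effective_config()["mapred.reduce.tasks"])  # -1
# hive-site.xml sets 3, but -hiveconf on the command line wins.
print(effective_config({"mapred.reduce.tasks": "3"},
                       {"mapred.reduce.tasks": "2"})["mapred.reduce.tasks"])  # 2
```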
=====================================================================================================================================
3. Common Hive commands:
3.1 Running Hive commands inside the hive shell:
--List all databases
hive (default)> show databases;
OK
database_name
db_test
default
Time taken: 0.096 seconds, Fetched: 2 row(s)
------------------------------------------------------------
--List all tables in the current database (default)
hive (default)> show tables;
OK
tab_name
data
student
sum
Time taken: 0.018 seconds, Fetched: 3 row(s)
-----------------------------------------------------------
--Show the schema of the data table
hive (default)> desc data;
OK
col_name data_type comment
id int
name string
Time taken: 0.06 seconds, Fetched: 2 row(s)
-----------------------------------------------------------
--Query the data table
hive (default)> select * from data;
OK
data.id data.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.066 seconds, Fetched: 3 row(s)
---------------------------------------------------------------
--Switch databases
hive (default)> use db_test;
OK
Time taken: 0.013 seconds
----------------------------------------------------------------
--Create a managed (internal) table. Hive moves loaded data into the table's HDFS directory, and dropping a managed table deletes both the table's metadata and the data stored on HDFS
hive (db_test)> create table if not exists student(id int,name string)
> row format delimited fields terminated by '\t'
> stored as textfile
> location '/db_test.db/student';
OK
Time taken: 0.058 seconds
-------------------------------------------------------------------
[root@bigdata113 hivetest]# cat data
1001 zhangshan
1002 lishi
1003 zhaoliu
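The columns in the data file must be separated by literal tab characters to match the `fields terminated by '\t'` clause in the table definition. A hypothetical way to recreate the file shown above (`printf` expands `\t` to a real tab; spaces would not work):

```shell
# Recreate the sample data file with real tab separators.
printf '1001\tzhangshan\n1002\tlishi\n1003\tzhaoliu\n' > data
# Show the raw bytes to confirm the separators are tabs (\t), not spaces.
od -c data | head -n 2
```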
--Load the file data from /root/hivetest into the newly created managed table student
hive (db_test)> load data local inpath '/root/hivetest/data' into table student;
Loading data to table db_test.student
Table db_test.student stats: [numFiles=1, totalSize=39]
OK
Time taken: 0.197 seconds
----------------------------------------------------------------------------------
--Query the student table we just loaded
hive (db_test)> select * from student;
OK
student.id student.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.073 seconds, Fetched: 3 row(s)
--Inspect the table's loaded file data on HDFS (both the relative path and the fully qualified path are listed)
hive (db_test)> dfs -lsr /db_test.db hdfs://bigdata111:9000/;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxr-x - root supergroup 0 2019-09-11 15:08 /db_test.db/student
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:08 /db_test.db/student/data
drwxrwxr-x - root supergroup 0 2019-09-11 15:12 hdfs://bigdata111:9000/db_test.db
drwxrwxr-x - root supergroup 0 2019-09-11 15:08 hdfs://bigdata111:9000/db_test.db/student
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:08 hdfs://bigdata111:9000/db_test.db/student/data
--View the contents of data
hive (db_test)> dfs -cat /db_test.db/student/data;
1001 zhangshan
1002 lishi
1003 zhaoliu
---------------------------------------------------------------------------------------
--Create an external table with the external keyword. Dropping an external table deletes only the metadata (the Hive table definition); the files stored on HDFS are kept
hive (db_test)> create external table if not exists studentout(id int,name string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.054 seconds
-----------------------------------------------------------------------------------
[root@bigdata113 hivetest]# cat dataout
1001 zhangshan
1002 lishi
1003 zhaoliu
--Load the file dataout from /root/hivetest into the newly created external table
hive (db_test)> load data local inpath '/root/hivetest/dataout' into table studentout;
Loading data to table db_test.studentout
Table db_test.studentout stats: [numFiles=1, totalSize=39]
OK
Time taken: 0.204 seconds
-----------------------------------------------------------------------------------
--Query the external table studentout we just created and loaded
hive (db_test)> select * from studentout;
OK
studentout.id studentout.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.044 seconds, Fetched: 3 row(s)
-----------------------------------------------------------------------------------
--View the external table's loaded data file dataout on HDFS
hive (db_test)> dfs -cat /db_test.db/studentout/dataout;
1001 zhangshan
1002 lishi
1003 zhaoliu
--------------------------------------------------------------------------------
--Drop the external and managed tables. Dropping an external table removes only the metadata, not the HDFS files; dropping a managed table removes both the metadata and the HDFS files
--Drop the managed (internal) table
hive (db_test)> drop table student;
OK
Time taken: 0.209 seconds
--Drop the external table
hive (db_test)> drop table studentout;
OK
Time taken: 0.065 seconds
--Check that the metadata for the two dropped tables is gone
hive (db_test)> show tables;
OK
tab_name
Time taken: 0.012 seconds
--The metadata for both tables has been deleted; now check the data on HDFS
hive (db_test)> dfs -lsr /db_test.db/ hdfs://bigdata111:9000/;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxr-x - root supergroup 0 2019-09-11 15:18 /db_test.db/studentout
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:18 /db_test.db/studentout/dataout
drwxrwxr-x - root supergroup 0 2019-09-11 15:22 hdfs://bigdata111:9000/db_test.db
drwxrwxr-x - root supergroup 0 2019-09-11 15:18 hdfs://bigdata111:9000/db_test.db/studentout
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:18 hdfs://bigdata111:9000/db_test.db/studentout/dataout
--The managed table's file data is gone, but the external table's file dataout is still on HDFS
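The behavior just demonstrated can be summed up in a toy model (illustrative Python, not Hive internals): dropping any table removes its metastore entry, but only dropping a managed table also removes the HDFS data:

```python
# Toy model of managed vs. external drop semantics, using the two
# tables and paths from the walkthrough above.
metastore = {"student": "managed", "studentout": "external"}
hdfs = {"/db_test.db/student/data", "/db_test.db/studentout/dataout"}

def drop_table(name, path):
    kind = metastore.pop(name)  # the metadata entry always goes
    if kind == "managed":
        hdfs.discard(path)      # the data goes only for managed tables

drop_table("student", "/db_test.db/student/data")
drop_table("studentout", "/db_test.db/studentout/dataout")
print(sorted(hdfs))  # ['/db_test.db/studentout/dataout']
```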
----------------------------------------------------------------------------------------------------------------------
--Drop a database
hive (default)> drop database db_test cascade;
OK
Time taken: 0.035 seconds
hive (default)> show databases;
OK
database_name
default
Time taken: 0.006 seconds, Fetched: 1 row(s)
-----------------------------------------------------------------------------------------------------------------
--Create a database
hive (default)> create database if not exists db_test location '/db_test.db';
OK
Time taken: 0.03 seconds
-------------------------------------------------------------------------------------------------
--Modify database metadata. The database name and its storage location cannot be changed, but dbproperties (here, a creation-time entry) can be set
hive (default)> alter database db_test set dbproperties('createtime'='20190911');
OK
Time taken: 0.024 seconds
-------------------------------------------------------------------------------------------
--Show a database's details
hive (default)> desc database extended db_test;
OK
db_name comment location owner_name owner_type parameters
db_test hdfs://mycluster/db_test.db root USER createtime=20190911
Time taken: 0.019 seconds, Fetched: 1 row(s)
--------------------------------------------------------------------------------------------
3.2 Running Hive commands from the Linux shell
--Show the hive command-line help
[root@bigdata113 hivetest]# hive -help
usage: hive
-d,--define <key=value> Variable subsitution to apply to hive
commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable subsitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the
console)
-------------------------------------------------------------------------
--Query the student table in database db_test from the Linux shell
[root@bigdata113 hivetest]# hive --database db_test -e 'select * from student;'
Logging initialized using configuration in file:/opt/module/apache-hive-1.2.1-bin/conf/hive-log4j.properties
OK
Time taken: 0.675 seconds
OK
student.id student.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 1.0 seconds, Fetched: 3 row(s)
---------------------------------------------------------------------------
--Run Hive commands from a SQL script
[root@bigdata113 hivetest]# cat student.sql
select * from student;
--Use the student.sql file to query the student table in database db_test
[root@bigdata113 hivetest]# hive --database db_test -f student.sql
Logging initialized using configuration in file:/opt/module/apache-hive-1.2.1-bin/conf/hive-log4j.properties
OK
Time taken: 0.714 seconds
OK
student.id student.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.679 seconds, Fetched: 3 row(s)
---------------------------------------------------------------------------------------
--Prefix a command with "!" to run Linux commands inside the hive shell
hive (db_test)> !cat /root/hivetest/data;
1001 zhangshan
1002 lishi
1003 zhaoliu
hive (db_test)> !ls /root;
anaconda-ks.cfg
hivetest
rpm
zookeeper.out
------------------------------------------------------------------------------------
--Use Hive's map, array, and struct types to create a table and load matching data
--Conceptually, each row should represent a record like this:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],      // array
    "children": {                         // map
        "xiao song": 18,
        "xiaoxiao song": 17
    },
    "address": {                          // struct
        "street": "hai dian qu",
        "city": "beijing"
    }
}
--First, create the table
hive (db_test)> create table if not exists structsum(name string,
> friends array<string>,
> children map<string,int>,
> address struct<street:string,city:string>)
> row format delimited fields terminated by ','
> collection items terminated by '_'
> map keys terminated by ':'
> lines terminated by '\n';
OK
Time taken: 0.109 seconds
--MAP, STRUCT, and ARRAY collection items all use "_" as their element separator
-------------------------------------------------------------------------------------------
--Load the file sum from /root/hivetest into the newly created table structsum
[root@bigdata113 hivetest]# cat sum
songsong,bingbing_lili,xiao song:18_xiaoxiao song:17,hai dian qu_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
hive (db_test)> load data local inpath '/root/hivetest/sum' into table structsum;
Loading data to table db_test.structsum
Table db_test.structsum stats: [numFiles=1, totalSize=142]
OK
Time taken: 0.151 seconds
--Query the structsum table
hive (db_test)> select * from structsum;
OK
structsum.name structsum.friends structsum.children structsum.address
songsong ["bingbing","lili"] "xiao song":18,"xiaoxiao song":17 "street":"hai dian qu","city":"beijing"
yangyang ["caicai","susu"] "xiao yang":18,"xiaoxiao yang":19 "street":"chao yang","city":"beijing"
Time taken: 0.088 seconds, Fetched: 2 row(s)
--Query songsong's first friend (bingbing), the age of "xiaoxiao song" from the children map, and the street and city from the address struct. To give a column a Chinese alias, wrap it in backticks (`, the key above Tab), not single quotes
hive (db_test)> select friends[0] as `朋友`, children['xiaoxiao song'] as `年龄`, address.street, address.city from structsum where name="songsong";
OK
朋友 年龄 street city
bingbing 17 hai dian qu beijing
Time taken: 0.097 seconds, Fetched: 1 row(s)
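To see how the three delimiters nest, here is an illustrative Python sketch (not Hive internals) that parses one line of the `sum` file exactly as the table definition describes: fields split on `,`, collection items on `_`, map keys on `:`:

```python
# Parse one line of the delimited file into the same shape Hive builds:
# a string, an array<string>, a map<string,int>, and a struct.
def parse_line(line):
    name, friends, children, address = line.rstrip("\n").split(",")
    return {
        "name": name,
        "friends": friends.split("_"),                      # array<string>
        "children": {k: int(v) for k, v in                  # map<string,int>
                     (kv.split(":") for kv in children.split("_"))},
        "address": dict(zip(("street", "city"),             # struct
                            address.split("_"))),
    }

row = parse_line("songsong,bingbing_lili,xiao song:18_xiaoxiao song:17,hai dian qu_beijing")
print(row["friends"][0])                 # bingbing
print(row["children"]["xiaoxiao song"])  # 17
print(row["address"]["city"])            # beijing
```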
========================================================================================================================================