什么是Hive
Hive是一个基于HDFS的查询引擎。我们日常中的需求如果都自己去写MapReduce来实现的话会很费劲的,Hive把日常用到的MapReduce功能,比如排序、分组等功能进行了抽象,对外提供类似于普通数据库的查询服务。
它只是封装MapReduce计算,但它本质并不是数据库服务,不适合作为联机服务。通常用于数据仓库的离线计算中。
在Hive中已经明确说明,不建议使用MapReduce了,而推荐使用Spark。
安装
tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /usr/local/
hive-site.xml:
<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/hive_metastore?createDatabaseIfNotExist=true</value> <description>metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>MySQL JDBC driver class</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>root</value> <description>user name for connecting to mysql server</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>root</value> <description>password for connecting to mysql server</description> </property> </configuration>
拷贝MySQL驱动jar到hive的lib目录中。
先启动HDFS:
start-dfs.sh
再启动hive
如果报错:
Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at jline.console.ConsoleReader.<init>(ConsoleReader.java:229) at jline.console.ConsoleReader.<init>(ConsoleReader.java:221) at jline.console.ConsoleReader.<init>(ConsoleReader.java:209) at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.console.ConsoleReader.<init>(ConsoleReader.java:230) at jline.console.ConsoleReader.<init>(ConsoleReader.java:221) at jline.console.ConsoleReader.<init>(ConsoleReader.java:209) at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136)需要把hive下lib目录中的jline jar文件替换到hadoop中yarn目录中:
rm /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar cp /usr/local/apache-hive-1.2.2-bin/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/
启动之后会自动创建数据库,登录数据库中可以查看到一些元信息:
mysql> use hive_metastore; mysql> use hive_metastore ; mysql> select * from dbs; +-------+-----------------------+------------------------------------------+---------+------------+------------+ | DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE | +-------+-----------------------+------------------------------------------+---------+------------+------------+ | 1 | Default Hive database | hdfs://centos01:9000/user/hive/warehouse | default | public | ROLE | +-------+-----------------------+------------------------------------------+---------+------------+------------+ 1 row in set (0.00 sec)
Hive基本操作
DDL
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
参考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Thrift
启动hiveserver2服务
./hiveserver2
使用beeline连接到这个服务上:
[[email protected] bin]# ./beeline Beeline version 1.2.2 by Apache Hive beeline> !connect jdbc:hive2://localhost:10000 Connecting to jdbc:hive2://localhost:10000 Enter username for jdbc:hive2://localhost:10000: root Enter password for jdbc:hive2://localhost:10000: **** Connected to: Apache Hive (version 1.2.2) Driver: Hive JDBC (version 1.2.2) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:10000> show databases; +----------------+--+ | database_name | +----------------+--+ | default | +----------------+--+ 1 row selected (3.666 seconds) 0: jdbc:hive2://localhost:10000>