大数据系列 (Hadoop) Hive 安装配置

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了大数据系列 (Hadoop) Hive 安装配置相关的知识,希望对你有一定的参考价值。

 注:安装hive之前,请先搭建hadoop环境。http://www.cnblogs.com/YellowstonePark/p/7750213.html

 

 1、安装Hive

 下载hive 

这里使用的是apache-hive-2.1.1-bin.tar.gz,下载地址:链接:http://pan.baidu.com/s/1jIKNw82 密码:fd0w

将下载的hive 上传到node01服务器的/opt目录下

[[email protected] ~]# cd /opt/
[[email protected] opt]# ll
total 727676
-rw-r--r--. 1 root root 149756462 Oct 23 12:24 apache-hive-2.1.1-bin.tar.gz
-rw-r--r--. 1 root root 266688029 Oct 23 12:24 hadoop-2.7.4.tar.gz
-rw-r--r--. 1 root root 104659474 Oct 23 12:24 hbase-1.2.6-bin.tar.gz
-rw-r--r--. 1 root root 185515842 Oct 23 12:26 jdk-8u144-linux-x64.tar.gz
-rw-r--r--. 1 root root   3465760 Oct 23 12:24 mysql-connector-java-5.1.44.tar.gz
-rw-r--r--  1 root root  35042811 Oct 23 13:37 zookeeper-3.4.10.tar.gz

切换路径到/usr/app目录下

[[email protected] app]# cd /usr/app/

解压hive

[[email protected] app]# tar zxvf /opt/apache-hive-2.1.1-bin.tar.gz

创建hive软连接

[[email protected] app]# ln -s apache-hive-2.1.1-bin/ hive

添加hive环境变量

[[email protected] app]# vi /etc/profile
export JAVA_HOME=/usr/app/java
export CLASS_PATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export ZOOKEEPER_HOME=/usr/app/zookeeper
export HADOOP_HOME=/usr/app/hadoop
export HIVE_HOME=/usr/app/hive

export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin

使环境变量生效

[[email protected] app]# source /etc/profile

切换目录到conf下

[[email protected] app]# cd hive/conf/

创建hive-env.sh、hive-site.xml、hive-log4j2.properties、hive-exec-log4j2.properties

[[email protected] app]# cp hive-env.sh.template hive-env.sh
[[email protected] app]# cp hive-default.xml.template hive-site.xml
[[email protected] app]# cp hive-log4j2.properties.template hive-log4j2.properties
[[email protected] app]# cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties

编辑hive-env.sh,在文件开始添加以下变量

[[email protected] app]# vi hive-env.sh
export JAVA_HOME=/usr/app/java
export HADOOP_HOME=/usr/app/hadoop
export HIVE_HOME=/usr/app/hive
export HIVE_CONF_DIR=/usr/app/hive/conf

创建以下hdfs目录并更改权限

  [[email protected] conf]# hdfs dfs -mkdir -p /user/hive/warehouse
  [[email protected] conf]# hdfs dfs -mkdir -p /user/hive/tmp
  [[email protected] conf]# hdfs dfs -mkdir -p /user/hive/log
  [[email protected] conf]# hdfs dfs -chmod 777 /user/hive/warehouse
  [[email protected] conf]# hdfs dfs -chmod 777 /user/hive/tmp
  [[email protected] conf]# hdfs dfs -chmod 777 /user/hive/log

修改hive-site.xml 配置

  [[email protected] conf]# vi hive-site.xml
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>

 

注:Hive Metastore 默认情况下, Hive 的元数据保存在内嵌的 Derby 数据库里, 但一般情况下生产环境会使用 MySQL 来存放 Hive 元数据。

2、安装MySQL

[[email protected] opt]# yum install mysql
[[email protected] opt]# yum install mysql-server
[[email protected] opt]# yum install mysql-devel
[[email protected] opt]# service mysqld status
[[email protected] opt]# service mysqld start
[[email protected] opt]# service mysqld status
[[email protected] opt]# chkconfig mysqld on

创建数据库和用户

[[email protected] conf]# mysql -u root -p
mysql> CREATE DATABASE hive;
mysql> USE hive;
mysql> CREATE USER ‘hive‘@‘localhost‘ IDENTIFIED BY ‘hive‘;
mysql> GRANT ALL ON hive.* TO ‘hive‘@‘localhost‘ IDENTIFIED BY ‘hive‘;
mysql> GRANT ALL ON hive.* TO ‘hive‘@‘%‘ IDENTIFIED BY ‘hive‘;
mysql> FLUSH PRIVILEGES;
mysql> quit;

修改hive-site.xml 需要在 hive-site.xml 文件中配置 MySQL 数据库连接信息

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>

在运行hive命令前,确保以下两点

1、HDFS 已经启动。可以使用 start-dfs.sh 脚本来启动 HDFS。

2、mysql-connector-java-5.1.39.jar 已经加到 $HIVE_HOME/lib 目录下。

从 Hive 2.1 版本开始, 我们需要先运行 schematool 命令来执行初始化操作。

[[email protected] conf]# schematool schematool -dbType mysql -initSchema

技术分享

启动hive

[[email protected] conf]# hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/app/java/bin:/usr/app/zookeeper/bin:/usr/app/hadoop/bin:/usr/app/hadoop/sbin:/usr/app/hive/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/app/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/app/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/app/apache-hive-2.1.1-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

 问题汇总

 hive 启动时,提示以下信息:

Sat May 21 10:25:12 CST 2016 WARN: Establishing SSL connection without server‘s identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL
connection must be established by default if explicit option isn‘t set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to ‘false‘. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

 解决方法:在 hive-site.xml 配置文件中 javax.jdo.option.ConnectionURL 配置项的值上加上 useSSL=false。即最后的配置如下:

 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

 initSchema问题,提示以下信息:

Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don‘t forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)

解决方法:执行 schematool -initSchema -dbType mysql。即安装 Hive 后,在首次运行 hive 之前要执行以上命令。

 

相对路径问题,提示以下信息:

Exception in thread "main"java.lang.RuntimeException: java.lang.IllegalArgumentException:java.net.URISyntaxException: Relative path in absolute URI:${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        atorg.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
        atorg.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
        atorg.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        atsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        atsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        atsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        atjava.lang.reflect.Method.invoke(Method.java:606)
        atorg.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.IllegalArgumentException:java.net.URISyntaxException: Relative path in absolute URI:${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        atorg.apache.hadoop.fs.Path.initialize(Path.java:148)
        atorg.apache.hadoop.fs.Path.<init>(Path.java:126)
        atorg.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:487)
        atorg.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
        ... 7more

解决方法:将 hive-site.xml 中的 ${system:java.io.tmpdir} 和 ${system:user.name} 分别替换成 /tmp 和 ${user.name} 

 

执行 schematool -dbType mysql -initSchema 时,提示以下信息:

The reference to entity "characterEncoding" must end with the ‘;‘ delimiter

解决办法:"&"在xml中这个是读不出的,正确写法将"&"转义"&amp;"

有时候在使用xml作为配置文件的时候,应该要使用xml的编码规则来进行适当的设置。

下面给出xml中一些特殊符号的编码转换:
&lt; < 小于号
&gt; > 大于号
&amp; & and字符
&apos; ‘ 单引号
&quot; " 双引号

 

 Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
	at org.apache.hadoop.fs.Path.initialize(Path.java:205)
	at org.apache.hadoop.fs.Path.<init>(Path.java:171)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:644)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:563)
	at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
	at java.net.URI.checkPath(URI.java:1823)
	at java.net.URI.<init>(URI.java:745)
	at org.apache.hadoop.fs.Path.initialize(Path.java:202)
	... 12 more

问题解决:将hive-site.xml配置文件件中的"system:java.io.tmpdir"全部替换为"/tmp" 

 

Exception in thread "main" java.lang.RuntimeException: The root scratch dir: /user/hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
	at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:705)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:635)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:563)
	at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

问题解决: hdfs dfs -chmod 777 /user/hive/tmp 将指定路径权限更改为777

 













以上是关于大数据系列 (Hadoop) Hive 安装配置的主要内容,如果未能解决你的问题,请参考以下文章

大数据学习系列之四 ----- Hadoop+Hive环境搭建图文详解(单机)

HIVE学习系列——windows Hadoop安装(上)

转:Hadoop大数据开发基础系列:七Hive基础

Hadoop详解——Hive的原理和安装配置和UDF,flume的安装和配置以及简单使用,flume+hive+Hadoop进行日志处理

Hadoop详解——Hive的原理和安装配置和UDF,flume的安装和配置以及简单使用,flume+hive+Hadoop进行日志处理

大数据学习系列之七 ----- Hadoop+Spark+Zookeeper+HBase+Hive集