[Hive] The Hive Pitfalls We Have Hit Over the Years

Posted @SmartSi



1. Missing MySQL Driver JAR

1.1 Problem Description

Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
	at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)

1.2 Solution

The error above usually means the MySQL JDBC driver JAR is missing from the classpath. Download the connector package (e.g. mysql-connector-java-5.1.34.tar.gz), extract it, and copy the JAR into Hive's lib directory:

xiaosi@yoona:~$ cp mysql-connector-java-5.1.34-bin.jar opt/hive-2.1.0/lib/
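
To double-check that the driver JAR actually landed in Hive's lib directory (the path below follows the layout used above and may differ on your machine):

xiaosi@yoona:~$ ls opt/hive-2.1.0/lib/ | grep mysql-connector
# expected output: mysql-connector-java-5.1.34-bin.jar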

2. Initializing the MySQL Metastore Database

2.1 Problem Description

Running the ./hive script fails to enter the CLI and reports the following error:

Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)

2.2 Solution

From the scripts directory, run the schematool -initSchema -dbType mysql command to initialize the Hive metastore database:

xiaosi@yoona:~/opt/hive-2.1.0/scripts$  schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/xiaosi/opt/hive-2.1.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/xiaosi/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://localhost:3306/hive_meta?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
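
For schematool to reach the right database, the JDBC connection settings must already be present in hive-site.xml. A minimal sketch matching the connection URL, driver, and user reported above (the password value is an assumption; replace it with your own):

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_meta?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- assumed value; use your actual MySQL password -->
    <value>root</value>
</property>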

3. Relative path in absolute URI

3.1 Problem Description

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
...
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
	at java.net.URI.checkPath(URI.java:1823)
	at java.net.URI.<init>(URI.java:745)
	at org.apache.hadoop.fs.Path.initialize(Path.java:202)
	... 12 more

3.2 Solution

The error is caused by referencing variables that have never been defined. To fix it, define the two variables system:user.name and system:java.io.tmpdir in hive-site.xml; once defined, they can be referenced elsewhere in the configuration file:

<property>
    <name>system:user.name</name>
    <value>xiaosi</value>
</property>
<property>
    <name>system:java.io.tmpdir</name>
    <value>/home/${system:user.name}/tmp/hive/</value>
</property>
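
For reference, these two variables are what the scratch-directory entries copied from the default configuration template expand, for example hive.exec.local.scratchdir. Once the properties above are defined, a reference like the following resolves cleanly (shown only to illustrate where the substitution happens):

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
</property>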

4. Connection Refused

4.1 Problem Description

on exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
...
Caused by: java.net.ConnectException: Call From Qunar/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
...
Caused by: java.net.ConnectException: 拒绝连接
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
	at org.apache.hadoop.ipc.Client.call(Client.java:1451)
	... 29 more

4.2 Solution

Hadoop may not be running. Checking the current processes with jps:

xiaosi@yoona:~/opt/hive-2.1.0$ jps
7317 Jps

As we can see, Hadoop has indeed not been started. Start the Hadoop NameNode and DataNode daemons:

xiaosi@yoona:~/opt/hadoop-2.7.3$ ./sbin/start-dfs.sh 
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/xiaosi/opt/hadoop-2.7.3/logs/hadoop-xiaosi-namenode-yoona.out
localhost: starting datanode, logging to /home/xiaosi/opt/hadoop-2.7.3/logs/hadoop-xiaosi-datanode-yoona.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/xiaosi/opt/hadoop-2.7.3/logs/hadoop-xiaosi-secondarynamenode-yoona.out
xiaosi@yoona:~/opt/hadoop-2.7.3$ jps
8055 Jps
7561 NameNode
7929 SecondaryNameNode
7724 DataNode
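
If the daemons are running and the error persists, it is also worth confirming that the address Hive is calling (localhost:9000 in the stack trace above) matches fs.defaultFS in Hadoop's core-site.xml. A minimal sketch of the expected entry:

<!-- core-site.xml: the NameNode RPC address that clients, including Hive, connect to -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>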

5. Creating a Hive Table Fails

5.1 Problem Description

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)

5.2 Solution

Checking the Hive log reveals the following error:

NestedThrowablesStackTrace:
Could not create "increment"/"table" value-generation container `SEQUENCE_TABLE` since autoCreate flags do not allow it. 
org.datanucleus.exceptions.NucleusUserException: Could not create "increment"/"table" value-generation container `SEQUENCE_TABLE` since autoCreate flags do not allow it.

This problem occurs mainly because MySQL's binlog format defaults to STATEMENT. Check the current value in MySQL with the statement show variables like 'binlog_format';:

mysql> show variables like 'binlog_format';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
1 row in set (0.00 sec)

Change the default binlog format by adding binlog_format="MIXED" to the MySQL configuration file /etc/mysql/mysql.conf.d/mysqld.cnf, then restart MySQL and start Hive again. After the restart the value reads MIXED:

mysql> show variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | MIXED |
+---------------+-------+
1 row in set (0.00 sec)
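
If restarting MySQL is inconvenient, the format can also be switched at runtime. Note that a SET GLOBAL change only affects new sessions and does not survive a server restart, so the mysqld.cnf edit above is still needed for a permanent fix:

mysql> -- requires the SUPER privilege; does not persist across restarts
mysql> SET GLOBAL binlog_format = 'MIXED';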

Run the CREATE TABLE statement again:

hive> create table  if not exists employees(
    >    name string comment '姓名',
    >    salary float comment '工资',
    >    subordinates array<string> comment '下属',
    >    deductions map<string,float> comment '扣除金额',
    >    address struct<city:string,province:string> comment '家庭住址'
    > )
    > comment '员工信息表'
    > ROW FORMAT DELIMITED 
    > FIELDS TERMINATED BY '\t'
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;
OK
Time taken: 0.664 seconds

6. Loading Data Fails

6.1 Problem Description

hive> load data local inpath '/home/xiaosi/hive/input/result.txt' overwrite into table recent_attention;
Loading data to table test_db.recent_attention
Failed with exception Unable to move source file:/home/xiaosi/hive/input/result.txt to destination hdfs://localhost:9000/user/hive/warehouse/test_db.db/recent_attention/result.txt
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Checking the Hive log reveals the following error:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /home/xiaosi/hive/warehouse/recent_attention/result.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

Seeing "0 datanode(s) running", we suspected the DataNode had died; running jps confirms that the DataNode is indeed not running.
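
6.2 Solution

A minimal fix, assuming the same Hadoop 2.7.3 layout as in section 4: start the missing DataNode daemon (or simply rerun start-dfs.sh), confirm it with jps, and then rerun the LOAD DATA statement.

xiaosi@yoona:~/opt/hadoop-2.7.3$ ./sbin/hadoop-daemon.sh start datanode
xiaosi@yoona:~/opt/hadoop-2.7.3$ jps
# DataNode should now appear in the process list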
