大数据线上问题排查系列 - HIVE 踩坑记

Posted 明哥的IT随笔

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了大数据线上问题排查系列 - HIVE 踩坑记相关的知识,希望对你有一定的参考价值。

大数据线上问题排查系列 - HIVE 踩坑记

前言

大家好,我是明哥!

本片博文是“大数据线上问题排查系列”大类别之一,讲述前段时间我司某产品在某券商遇到的一个问题及解决方案,其背后涉及到 hive 的一个 BUG,在 hive 3.0 才修复。以下是正文。

问题现象

cdh6.2.1中,开启 kerberos 和 sentry 的hive 中,使用 dml 语句 insert overwrite 插入数据到分区表的新分区时,会报错。同时注意到:

  • 查看底层 hdfs 发现对应的目录和数据都有了;
  • 使用 show partitions xxx 查看不到对应的新分区;
  • 使用 select 语句也查不到该新分区下的数据;
  • 只要是插入数据到新的分区,不管使用静态分区还是动态分区模式,都会报错;

问题分析

为确定问题,首先尝试了不同场景,发现如下现象:

  • 当使用 insert overwrite 插入数据到分区表的已有分区时,不会报错;(不管该分区下有没有数据);
  • 使用 insert into 插入数据到分区表的新分区时,不会报错;
  • 对于非分区表,insert overwrite 和 insert into 都不会报错;
  • 当使用insert overwrite 插入数据到分区表的新分区报错后,可以使用 msck repair test0317 修复hive metastore中相关元数据,修复后 select, show partitions 等语句可以查询到分区和该分区的底层数据,没有问题;

为进一步分析问题,常看了相关日志,包括客户端日志,hiveserver2 日志和 hive metastore 日志分别如下所示。

客户端日志(beeline)

INFO  : Loading data to table apollo_ods_jzfix.test0317 partition (ptdate=1) from hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1/.hive-staging_hive_2021-03-17_15-09-13_232_1543365768355672834-7333/-ext-10000
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 2.56 sec   HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 2 seconds 560 msec
INFO  : Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event (state=08S01,code=1)

hive metastore 日志

2021-03-17 15:09:30,039 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM	ip=10.2.91.100	cmd=get_table : db=apollo_ods_jzfix tbl=test0317	
2021-03-17 15:09:30,044 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-9-thread-122]: 121: get_partition : db=apollo_ods_jzfix tbl=test0317[1]
2021-03-17 15:09:30,044 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM	ip=10.2.91.100	cmd=get_partition : db=apollo_ods_jzfix tbl=test0317[1]	
2021-03-17 15:09:30,053 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-122]: NoSuchObjectException(message:partition values=[1])
	at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003)
	at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
	at com.sun.proxy.$Proxy26.getPartition(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553)
	at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737)
	at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
	at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

2021-03-17 15:09:30,054 ERROR org.apache.thrift.ProcessFunction: [pool-9-thread-122]: Internal error processing fire_listener_event
org.apache.hadoop.hive.metastore.api.NoSuchObjectException: partition values=[1]
	at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy26.getPartition(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.r

以上是关于大数据线上问题排查系列 - HIVE 踩坑记的主要内容,如果未能解决你的问题,请参考以下文章

大数据问题排查系列-大数据集群开启 kerberos 认证后 HIVE 作业执行失败

大数据问题排查系列 - TDH大数据平台中 HIVE作业长时间无法执行结束

小程序踩坑记之——花屏/重复渲染数据

大数据问题排查系列 - TDH大数据平台中HIVE作业长时间无法执行结束

大数据问题排查系列 - 因HIVE 中元数据与HDFS中实际的数据不一致引起的问题的修复

RabbitMQ踩坑记