Hive多分隔符支持示例

Posted 2022-09-27

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了Hive多分隔符支持示例相关的知识，希望对你有一定的参考价值。

温馨提示：要看高清无码套图，请使用手机打开并单击图片放大查看。

1.问题描述

如何将多个字符作为字段分割符的数据文件加载到Hive表中，事例数据如下：

字段分隔符为“@#$”

test1@#$test1name@#$test2value test2@#$test2name@#$test2value test3@#$test3name@#$test4value

如何将上述事例数据加载到Hive表(multi_delimiter_test)中，表结构如下：

字段名	字段类型
s1	String
s2	String
s3	String

2.Hive多分隔符支持

Hive在0.14及以后版本支持字段的多分隔符，参考https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe

3.实现方式

测试环境说明

测试环境为CDH5.11.1 Hive版本为1.1.0 操作系统为RedHat6.5

操作步骤

1.准备多分隔符文件并装载到HDFS对应目录

[ec2-user@ip-172-31-8-141 ~]$ cat multi_delimiter_test.dat test1@#$test1name@#$test2value test2@#$test2name@#$test2value test3@#$test3name@#$test4value [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -ls /fayson/multi_delimiter_test DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. Found 1 items -rw-r--r-- 3 user_r supergroup 93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat [ec2-user@ip-172-31-8-141 ~]$

2.基于准备好的多分隔符文件建表

create external table multi_delimiter_test( s1 string, s2 string, s3 string) ROW FORMAT SERDE org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe WITH SERDEPROPERTIES ("field.delim"="@#$") stored as textfile location /fayson/multi_delimiter_test;

Hive多分隔符支持示例_java

3.测试

2: jdbc:hive2://localhost:10000/default> select * from multi_delimiter_test; +--------------------------+--------------------------+--------------------------+--+ \| multi_delimiter_test.s1 \| multi_delimiter_test.s2 \| multi_delimiter_test.s3 \| +--------------------------+--------------------------+--------------------------+--+ \| test1 \| test1name \| test2value \| \| test2 \| test2name \| test2value \| \| test3 \| test3name \| test4value \| +--------------------------+--------------------------+--------------------------+--+

Hive多分隔符支持示例_hive_02

2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test; INFO : Ended Job = job_1503469952834_0006 INFO : MapReduce Jobs Launched: INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.25 sec HDFS Read: 6755 HDFS Write: 2 SUCCESS INFO : Total MapReduce CPU Time Spent: 3 seconds 250 msec INFO : Completed executing command(queryId=hive_20170823041818_ce58aae2-e6db-4eed-b6af-652235a6e66a); Time taken: 33.286 seconds INFO : OK +------+--+ \| _c0 \| +------+--+ \| 3 \| +------+--+ 1 row selected (33.679 seconds) 2: jdbc:hive2://localhost:10000/def

Hive多分隔符支持示例_hive_03

4.常见问题

1.执行count查询时报错

异常日志

通过beeline执行count查询时报错

2: jdbc:hive2://localhost:10000/default> select count() from multi_delimiter_test; INFO : Compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count() from multi_delimiter_test INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 0.291 seconds INFO : Executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test INFO : Query ID = hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97 INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-1:MAPRED] in serial mode INFO : Number of reduce tasks determined at compile time: 1 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer=<number> INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max=<number> INFO : In order to set a constant number of reducers: INFO : set mapreduce.job.reduces=<number> INFO : number of splits:1 INFO : Submitting tokens for job: job_1503469952834_0002 INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/ip-172-31-8-141.ap-southeast-1.compute.internal@CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1503475160778, maxDate=1504079960778, sequenceNumber=27, masterKeyId=9) INFO : The url to track the job: http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/ INFO : Starting Job = job_1503469952834_0002, Tracking URL = http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/ INFO : Kill Command = /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/hadoop/bin/hadoop job -kill job_1503469952834_0002 INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 INFO : 2017-08-23 03:59:32,039 Stage-1 map = 0%, reduce = 0% INFO : 2017-08-23 04:00:08,106 Stage-1 map = 100%, reduce = 100% ERROR : Ended Job = job_1503469952834_0002 with errors ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask INFO : MapReduce Jobs Launched: INFO : Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL INFO : Total MapReduce CPU Time Spent: 0 msec INFO : Completed executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 48.737 seconds Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

使用Hive的shell操作报错如下

Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147) ... 22 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116) ... 22 more Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105) at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:140) at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297) ... 24 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 ms

问题原因分析

org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe类是hive-contrib.jar包里。

在执行非聚合类操作查询时，sql能正常执行，在进行聚合类函数操作时报错，说明在执行MapReduce任务时缺少jar依赖包；MapReduce属于yarn作业，所以yarn运行环境缺少hive-contrib.jar的依赖包。

解决方法

在CDH集群的所有节点一下操作，将hive-contrib-1.1.0-cdh5.10.2.jar包拷贝到yarn的lib目录下

sudo scp -r /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.10.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/

重新运行count语句，执行成功

Hive多分隔符支持示例_hive_04

醉酒鞭名马，少年多浮夸！岭南浣溪沙，呕吐酒肆下！挚友不肯放，数据玩的花！
温馨提示：要看高清无码套图，请使用手机打开并单击图片放大查看。

以上是关于Hive多分隔符支持示例的主要内容，如果未能解决你的问题，请参考以下文章