Hive: column type of Decimal(12,9) throws NullPointerException with JSONSerDe
Posted: 2018-07-11 11:03:58

Question: I have an external table pointing to JSON data. I am using the SerDe org.apache.hive.hcatalog.data.JsonSerDe.
I have created a view on top of this external table with the following DDL:
CREATE VIEW `my_table` AS SELECT
    a.col1,
    a.col2,
    ...
    a.longitude,
    a.latitude
FROM
    (SELECT
        mytable.body.col1,
        mytable.body.col2,
        ...
        mytable.body.longitude,
        mytable.body.latidute,
        ROW_NUMBER() OVER( PARTITION BY mytable.body.col1, mytable.body.col1 ORDER BY mytable.body.col3 DESC ) AS rownum
    FROM mydb.myExtTable) AS a
WHERE a.rownum = 1
When I run SELECT * FROM mytable, it gives me a NullPointerException:
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1529530522022_75616_22_01, diagnostics=[Task failed, taskId=task_1529530522022_75616_22_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.NullPointerException
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryHiveDecimal.init(LazyBinaryHiveDecimal.java:47)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:267)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:204)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:198)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:184)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:347)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have only 2 JSON records. The two JSONs look like this:

{"header": {"header1": "value1", "header2": "value2"}, "body": {"col1": "col1 value", "col2": "col2 value", .... "latitude": 39.921302, "longitude": -74.879928}}

{"header": {"header1": "value1", "header2": "value2"}, "body": {"col1": "col1 value", "col2": "col2 value", .... "latitude": 43658734.438, "longitude": 3453.3453}}
The strange part is that when I run a SELECT on my VIEW with only 1 record, it fetches the data correctly, but running it with both records at once gives me the exception.

When I remove the values "latitude": 43658734.438, "longitude": 3453.3453 from the JSON data (the second record), everything runs fine again. The longitude and latitude columns are of type decimal(12,9).

I suspect something is wrong with the column values. But if those values are the problem when both records run together, why does each record work fine on its own? (Note: when the second record is run alone, those two column values for that record come back as NULL.)

What could be the problem? Please help.
Answer 1:

See the definition at https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_decimal.html: decimal(12,9) means 12 digits in total with 9 after the decimal point, so only 3 digits before it. It looks like you need at least decimal(14,6) here.
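To make the precision rule concrete, here is a small sketch (not from the original thread; the helper name is made up for illustration) that checks whether a value fits a given DECIMAL(precision, scale):

```python
from decimal import Decimal

def fits_decimal(value, precision, scale):
    """Return True if value fits in DECIMAL(precision, scale):
    at most (precision - scale) digits before the decimal point
    and at most `scale` digits after it."""
    d = Decimal(str(value))
    sign, digits, exponent = d.as_tuple()
    integer_digits = max(len(digits) + exponent, 0)   # digits before the point
    fractional_digits = max(-exponent, 0)             # digits after the point
    return integer_digits <= precision - scale and fractional_digits <= scale

print(fits_decimal(39.921302, 12, 9))     # True:  2 integer digits, 3 allowed
print(fits_decimal(43658734.438, 12, 9))  # False: 8 integer digits, only 3 allowed
print(fits_decimal(43658734.438, 18, 9))  # True:  up to 9 integer digits allowed
```

This mirrors why the second record fails: 43658734 has 8 digits before the point, but decimal(12,9) only leaves room for 3.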
Comments:
Thanks Harold, that was helpful. For "latitude": 43658734.438 I would need to make it decimal(18,9).
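A hedged sketch of that fix in HiveQL (table and column names are taken from the question; the exact DDL depends on your table definition):

```sql
-- Widen the columns so 8 integer digits fit: DECIMAL(18,9) allows 18-9 = 9
-- digits before the decimal point while keeping 9 after it.
ALTER TABLE mydb.myExtTable CHANGE latitude  latitude  DECIMAL(18,9);
ALTER TABLE mydb.myExtTable CHANGE longitude longitude DECIMAL(18,9);
```

Note that for an external table this changes only the metadata; the underlying JSON files are untouched, so a fresh SELECT should then read the large values instead of returning NULL.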