通过 Kylin 构建多维数据集创建配置单元表时出错

Posted

技术标签:

【中文标题】通过 Kylin 构建多维数据集创建配置单元表时出错【英文标题】:Error while creating hive table through Kylin build cube 【发布时间】:2018-11-19 15:16:38 【问题描述】:

您好,我正在尝试使用 Kylin 构建一个多维数据集,数据从 sqoop 获得很好,但是创建配置单元表的下一步失败。看着被触发的命令看起来很奇怪,因为 create 语句对我来说看起来不错。

我认为问题在于 DOUBLE 类型,因为当我删除相同的 create 语句时,它可以正常工作。有人可以帮忙吗?

我在 AWS EMR 中使用堆栈,kylin 2.5 hive 2.3.0

错误日志如下所示

命令

    hive -e "USE default;
DROP TABLE IF EXISTS kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368
(
HOLDINGS_STOCK_INVESTOR_ID string
,STOCK_INVESTORS_CHANNEL string
,STOCK_STOCK_ID string
,STOCK_DOMICILE string
,STOCK_STOCK_NM string
,STOCK_APPROACH string
,STOCK_STOCK_TYP string
,INVESTOR_ID string
,INVESTOR_NM string
,INVESTOR_DOMICILE_CNTRY string
,CLIENT_NM string
,INVESTOR_HOLDINGS_GROSS_ASSETS_USD double(22)
,INVESTOR_HOLDINGS_NET_ASSETS_USD double(22)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://wfg1tst-models/kylin/kylin_metadata/kylin-4ae3b18b-831b-da66-eb8c-7318245c4448/kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368';
ALTER TABLE kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368 SET TBLPROPERTIES('auto.purge'='true');

" --hiveconf hive.merge.mapredfiles=false --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.exec.compress.output=true --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf hive.merge.mapfiles=false --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true

错误如下

OK
Time taken: 1.315 seconds
OK
Time taken: 0.09 seconds
MismatchedTokenException(334!=347)
    at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
    at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
    at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6179)
    at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3808)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2382)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 15:42 mismatched input '(' expecting ) near 'double' in create table statement

【问题讨论】:

【参考方案1】:

哦,我想我终于想通了,Hive 不支持精确的 DOUBLE 。但我认为 Kylin 在将 jdbc 元数据导入模型时应该注意这一点。

同样会在 Kylin 中引发增强或错误。

【讨论】:

【参考方案2】:

我在我的测试环境中安装了最新的稳定配置单元,版本是 2.3.4(配置单元 DDL 文档说 2.2.0 支持双精度)。我发现 hive 不支持具有特定/用户定义精度的双精度。

例如

[错误] 创建表哈哈(哈哈双10); [错误] 创建表哈哈(ha double Precision 17); [错误] 创建表 haha​​(ha double(22)); [CORRECT] 建表哈哈(哈哈双精度);

Hive CMD:

hive> use ss;
OK
Time taken: 0.015 seconds
hive> create table haha(ha double 10);
FAILED: ParseException line 1:28 extraneous input '10' expecting ) near '<EOF>'
hive> create table haha(ha double Precision 17);
FAILED: ParseException line 1:38 extraneous input '17' expecting ) near '<EOF>'
hive> create table haha(ha double(22));
MismatchedTokenException(334!=347)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6179)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3808)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2382)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:27 mismatched input '(' expecting ) near 'double' in create table statement
hive> create table haha(ha double Precision);
OK
Time taken: 0.625 seconds

所以,据我所知,Hive 不支持这样的数据类型定义。 kylin dev 无法解决。

如果您发现任何错误,请告诉我。我发现以下链接可能会有所帮助:

    https://cwiki.apache.org/confluence/display/hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable

    https://issues.apache.org/jira/browse/HIVE-13556

【讨论】:

以上是关于通过 Kylin 构建多维数据集创建配置单元表时出错的主要内容,如果未能解决你的问题,请参考以下文章

无法使用 apache kylin 构建多维数据集

Kylin - 查询 Hbase 上的 OLAP 多维数据集?

使用 REST API 创建 Apache kylin 多维数据集

大数据量多维分析项目Kylin调研二期

麒麟立方体构建成功,但其大小显示为 0 KB

Kylin系列-使用Saiku+Kylin构建多维分析OLAP平台