线程“主”org.apache.spark.sql.AnalysisException 中的异常:由于数据类型不匹配,无法解析“named_struct()”:
Posted
技术标签:
【中文标题】线程“主”org.apache.spark.sql.AnalysisException 中的异常:由于数据类型不匹配,无法解析“named_struct()”:【英文标题】:Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'named_struct()' due to data type mismatch: 【发布时间】:2018-07-30 11:20:02 【问题描述】:我运行了 spark 应用程序,我在其中加入了两个数据集并形成了一个数据集,并使用编码器将 Dataset<Row>
转换为 Dataset<T
> 格式。
编码器如下:
Encoder<RuleParamsBean> encoder = Encoders.bean(RuleParamsBean.class);
Dataset<RuleParamsBean> ds = new Dataset<RuleParamsBean>(sparkSession, finalJoined.logicalPlan(), encoder);
Dataset<RuleParamsBean> validateDataset = ds.map(rulesParamBean -> validateTransaction(rulesParamBean),encoder);
validateDataset.show();
在对数据集进行地图操作后,我收到如下错误:
Dataset<RuleParamsBean> ds = new Dataset<RuleParamsBean>(sparkSession, finalJoined.logicalPlan(), encoder);
错误日志
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'named_struct()' due to data type mismatch: input to function named_struct requires at least one argument;;
Relation[TXN_DETAIL_ID#0,TXN_HEADER_ID#1,TXN_SOURCE_CD#2,TXN_REC_TYPE_CD#3,TXN_DTTM#4,EXT_TXN_NBR#5,CUST_REF_NBR#6,CIS_DIVISION#7,ACCT_ID#8,TXN_VOL#9,TXN_AMT#10,CURRENCY_CD#11,MANUAL_SW#12,USER_ID#13,HOW_TO_USE_TXN_FLG#14,MESSAGE_CAT_NBR#15,MESSAGE_NBR#16,UDF_CHAR_1#17,UDF_CHAR_2#18,UDF_CHAR_3#19,UDF_CHAR_4#20,UDF_CHAR_5#21,UDF_CHAR_6#22,UDF_CHAR_7#23,... 102 more fields] JDBCRelation(CI_TXN_DETAIL_STG_DUMMY) [numPartitions=1]
Relation[ACCT_ID#377,ACCT_NBR_TYPE_CD#378,ACCT_NBR#379,VERSION#380,PRIM_SW#381] JDBCRelation(CI_ACCT_NBR_DUMMY) [numPartitions=1]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:116)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:120)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:120)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
at org.apache.spark.sql.Dataset.map(Dataset.scala:2569)
at com.sample.Transformation.main(Transformation.java:100)
【问题讨论】:
我也有同样的问题。当我从 Spark 2.2.1 切换到 Spark 2.3.0 时,问题开始出现,但我还没有找到任何解决方案。 就我而言,问题是我在数据集中使用的 JavaBean 有一个 Object 类型的字段,该字段有时为空。我已经删除了那个变量,现在它可以工作了。 【参考方案1】:对我来说,问题是由于类型不受支持。我猜我使用的是 Spark 2.X 不支持的 LocalDate。(我认为他们在版本 3 中包含了对它的支持)
我只是将它从 LocalDate 更改为 TimeStamp 并且它起作用了。看看你是否也是这种情况?您的 POJO 中不支持的任何类型?
【讨论】:
以上是关于线程“主”org.apache.spark.sql.AnalysisException 中的异常:由于数据类型不匹配,无法解析“named_struct()”:的主要内容,如果未能解决你的问题,请参考以下文章
类型不匹配;找到:org.apache.spark.sql.DataFrame 需要:org.apache.spark.rdd.RDD
Spark SQL 查询:org.apache.spark.sql.AnalysisException
Spark SQL - org.apache.spark.sql.AnalysisException
从 org.apache.spark.sql.Row 中提取信息
调用 saveAsTable 时出现 org.apache.spark.sql.AnalysisException
为啥 org.apache.spark.sql.types.DecimalType 在 Spark SQL 中的最大精度值为 38?