Spark saveAsTable append saves data to Hive but throws an error: org.apache.hadoop.hive.ql.metadata.Hive.alterTable

【Posted】: 2021-08-08 18:21:13

【Question】: I am trying to append data to an existing table in Hive, but when I call
sdf.write.format("parquet").mode("append").saveAsTable("db.tbl", path=hdfs_path)
the data is saved successfully, yet this error is thrown:
Py4JJavaError: An error occurred while calling o152.saveAsTable.
: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org.apache.hadoop.hive.ql.metadata.Table, org.apache.hadoop.hive.metastore.api.EnvironmentContext)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:177)
at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod$lzycompute(HiveShim.scala:1183)
at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod(HiveShim.scala:1177)
at org.apache.spark.sql.hive.client.Shim_v2_1.alterTable(HiveShim.scala:1230)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:572)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:562)
at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:107)
at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:106)
at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:90)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTableStats$1(HiveExternalCatalog.scala:719)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:705)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTableStats(ExternalCatalogWithListener.scala:133)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:420)
at org.apache.spark.sql.execution.command.CommandUtils$.updateTableStats(CommandUtils.scala:63)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:198)
at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:538)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:219)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:167)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:727)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:705)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:603)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
I have also tried some alternatives:
sdf.write.insertInto("db.tbl",overwrite=False)
sdf.write.mode("append").insertInto("db.tbl")
spark.sql("insert into table value(...)")
but I hit the same problem every time. It seems that any attempt to add data to an existing table both succeeds and raises this error. "Overwrite" mode works fine.

The Spark version I am using is 3.0.1 and the Hive version is 3.1.0.

Has anyone run into this problem before?
【Question comments】:
【Answer 1】: It looks like the Hive metastore artifact referenced by your Spark 3 installation is a Hive 2.x jar rather than the Hive 3.x you are actually running.
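To confirm which metastore version and jars your session actually picked up, you can print the relevant configuration entries. A minimal PySpark sketch, assuming an active session named spark:

# Print every Hive-related entry in the active Spark session's configuration;
# spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars are the
# two keys that control which metastore client Spark loads.
for key, value in spark.sparkContext.getConf().getAll():
    if "hive" in key:
        print(key, "=", value)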
【Comments】:
When I print the Spark config I get ('spark.sql.hive.metastore.version', '3.0'), with spark.sql.hive.metastore.jars = Standalone-metastore-1.21.2.3.1.4.41-5-hive3.jar. I also found this jar in use in the spark2-client/jars folder: hive-metastore-1.21.2.3.1.4.41-5.jar

【Answer 2】: You definitely have the wrong Hive jars in your environment:
Your Spark expects Hive 3.x, which contains the method alterTable(String, Table, EnvironmentContext). But according to your comment you have hive-metastore-1.21.2.3.1.4.41-5.jar, which comes from the Hortonworks distribution; you can download the source code and verify for yourself that this method does not exist there.
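One way out is to point Spark at a consistent set of Hive 3.x jars when the session is built. A hedged sketch, not a verified fix: the jar directory below is an assumption, so substitute wherever your cluster's Hive 3.x client jars actually live, and note that in Spark 3.0.x spark.sql.hive.metastore.jars takes a JVM-style classpath (or the special values "maven"/"builtin"):

from pyspark.sql import SparkSession

# Sketch only: /usr/hdp/current/hive-client/lib is a hypothetical HDP path --
# replace it with the directory that holds your actual Hive 3.x client jars.
# Both settings are static and must be set before the session touches Hive.
spark = (
    SparkSession.builder
    .appName("append-to-hive")
    .config("spark.sql.hive.metastore.version", "3.1")
    .config("spark.sql.hive.metastore.jars", "/usr/hdp/current/hive-client/lib/*")
    .enableHiveSupport()
    .getOrCreate()
)
# With a matching metastore client on the classpath, the original append
# (sdf.write.format("parquet").mode("append").saveAsTable("db.tbl")) should
# no longer hit the NoSuchMethodException.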
【Comments】: