py4JJava 错误 - 使用 select 语句时出错

Posted

技术标签:

【中文标题】py4JJava 错误 - 使用 select 语句时出错【英文标题】:py4JJava Error - error while using select statement 【发布时间】:2020-04-16 10:38:52 【问题描述】:

我在 Zeppelin 笔记本中使用 pspark 并尝试使用 SELECT 语句获取数据。我只是想查询一个表,但出现以下命令的奇怪错误:

%pyspark
spark.sql('select * from default.abc').show()

这是我得到的错误:

Py4JJavaError: An error occurred while calling o92.sql.
: java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Client.sendBaseOneway(Ljava/lang/String;Lorg/apache/thrift/TBase;)V
    at com.facebook.fb303.FacebookService$Client.send_shutdown(FacebookService.java:436)
    at com.facebook.fb303.FacebookService$Client.shutdown(FacebookService.java:430)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:606)
    at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
    at com.sun.proxy.$Proxy39.close(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2477)
    at com.sun.proxy.$Proxy39.close(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.close(Hive.java:414)
    at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:330)
    at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:317)
    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:293)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
    at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:356)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:217)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:217)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:217)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:216)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:71)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:238)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:750)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:683)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:715)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:708)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:708)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:654)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o92.sql.\n', JavaObject id=o905), <traceback object at 0x7f49fa356b48>)

我也验证了表格列表。

%spark.pyspark
df = sqlContext.sql('show tables')
df.show()

+--------+----------------+-----------+
|database|       tableName|isTemporary|
+--------+----------------+-----------+
| default|             abc|      false|
| default|          abc321|      false|
| default|          abtest|      false|

另外,这里是我在配置文件(zeppelin-env.sh)中设置的参数

export MASTER=yarn-client
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export SPARK_HOME=/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export PYSPARK_PYTHON=/opt/anaconda3/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda3/bin/python
export PYTHONPATH=/opt/anaconda3/bin/
export PYTHONPATH=/opt/anaconda3/bin

更新: 根据this 链接,我已将 libthrift-0.9.2.jar 替换为 libthrift-0.9.3.jar。然后重启zeppelin还是不行

我哪里错了。

【问题讨论】:

您的错误似乎与 zeppelin 本身无关。 o92.sql 是什么,facebook 库在你的堆栈跟踪中做什么?这是非常规的。 没错,我也很困惑。我还没有安装任何与 Facebook 相关的库。 好的,我知道这里可能会发生什么。实际上有很多事情发生,你可能会错过许多可能的层(我真的是说很多)。 com.facebook.fb303 看起来是一个核心节俭库。 Py4j 用于调用核心 spark scala 代码。如您所见,您的查询通过了执行过程和催化剂优化器,它正在调用 Hive Metastore 以进行逻辑规划解决方案,而您的 Hive Metastore 没有通过 Thirft 调用响应。这笔费用解释了为什么它在启动时会时不时地工作,一段时间后连接可能会由于某种原因丢失。 请记住,机器中有很多部件,您必须了解每一个部件,以了解真正发生了什么以及哪里可能出错。我会先直接尝试 spark shell 并远离 Zeppelin(因为它只会增加许多可能的集成问题),然后看看它是如何从那里开始的。 我明白了,重启后这个查询有效。另外,我注意到另一件事,我看到的表格列表是不同的,我可以通过色调看到。有什么建议吗? 【参考方案1】:

尝试在依赖部分添加 libthrift-0.9.3 jar 文件的路径

【讨论】:

我在我的系统中下载了 libthrift-0.9.3.jar 并在依赖工件(Apache Zeppelin - Spark Interpreter)中设置了本地路径。然后我重新启动了解释器,它就像一个魅力。好收获!

以上是关于py4JJava 错误 - 使用 select 语句时出错的主要内容,如果未能解决你的问题,请参考以下文章

使用 pyspark 同时编写 parquet 文件

Iphone 开发中的错误 - 未声明的问候语(在此函数中首次使用)

运行 PUT 时 SnowSQL 握手错误

json 获取语​​音激活物理详细信息 - 错误服务未定义

axios 使用方法 以及服务器端 设置拦截发送404状态的提示语,当网络错误时候返回前端的提示, 当网络正常的时候返回后端的提示

NamedQuery 返回具有空字段的实体