Spark 3.x on HDP 3.1 in headless mode with Hive - Hive tables not found

Posted 2023-02-16

[Original title]: spark 3.x on HDP 3.1 in headless mode with hive - hive tables not found [Asked]: 2020-08-31 09:37:22 [Question]:

How can I configure Spark 3.x on HDP 3.1 to interact with Hive when using the headless (https://spark.apache.org/docs/latest/hadoop-provided.html) build of Spark?

First, I downloaded and unpacked the headless Spark 3.x distribution:

cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
 
ls /usr/hdp # note the version and substitute it for 3.1.x.x-xxx below

./bin/spark-shell --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

spark.sql("show databases").show
// only showing default namespace, existing hive tables are missing
+---------+
|namespace|
+---------+
|  default|
+---------+

spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory // I want to see hive here - how? How do I add the Hive JARs to the classpath?

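Note

This is an updated version, for Spark 3.x on HDP 3.1, of "How can I run spark in headless mode in my custom version on HDP?" and "custom spark does not find hive databases when running on yarn".

Also: I am aware of the problems with ACID Hive tables in Spark. For now, I just want to be able to see the existing databases.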

Edit

We have to get the Hive JARs onto the classpath. I tried the following:

 export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:$SPARK_DIST_CLASSPATH"

Now using spark-sql:

./bin/spark-sql --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

This fails with:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

That is, the line export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:$SPARK_DIST_CLASSPATH" has no effect (the same problem occurs when it is not set).
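One detail that may be worth ruling out: the JVM only expands a classpath wildcard written as a directory followed by /*, so an entry ending in lib* is taken as a literal path and adds no JARs. The sketch below builds an entry in the directory-wildcard form. classpath_with_jars_dir is a hypothetical helper written for this illustration, not something shipped with Spark or HDP, and the Hive client path is the one from the question.

```shell
# classpath_with_jars_dir is a hypothetical helper (an assumption for illustration).
# It prepends "<dir>/*", the JVM's directory-wildcard form, to an existing classpath.
classpath_with_jars_dir() {
  dir="${1%/}"      # drop one trailing slash, if present
  existing="$2"
  if [ -n "$existing" ]; then
    printf '%s/*:%s\n' "$dir" "$existing"
  else
    printf '%s/*\n' "$dir"
  fi
}

# Print the value that could be exported as SPARK_DIST_CLASSPATH.
# The "*" is never globbed by the shell here; it is left for the JVM to expand.
classpath_with_jars_dir /usr/hdp/current/hive-client/lib "${SPARK_DIST_CLASSPATH:-}"
```

Even with a well-formed wildcard, this is unlikely to fix the spark-sql error by itself: org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver belongs to Spark's own Hive Thrift server module, not to the Hive client libraries.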

[Answer 1]:

As noted above and in "custom spark does not find hive databases when running on yarn", the Hive JARs are needed. They are not provided in the headless build.

I was unable to retrofit them.
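A quick way to confirm this for a given download is to look for Spark's own Hive integration JAR in the distribution's jars directory. The sketch below assumes the usual layout of a binary Spark distribution (a jars/ directory holding files named like spark-hive_<scala version>-<spark version>.jar); has_spark_hive_jars is a hypothetical helper, not part of Spark.

```shell
# has_spark_hive_jars is a hypothetical helper (an assumption for illustration).
# It prints "present" if <spark_home>/jars contains a spark-hive_*.jar, else "missing".
has_spark_hive_jars() {
  for jar in "$1"/jars/spark-hive_*.jar; do
    # An unmatched glob stays a literal string, so test that the file really exists.
    if [ -e "$jar" ]; then
      echo "present"
      return 0
    fi
  done
  echo "missing"
  return 1
}

# Check the distribution in SPARK_HOME, or the current directory if it is unset.
has_spark_hive_jars "${SPARK_HOME:-.}" || true
```

If this reports "missing", spark-shell has no Hive classes to load, which would be consistent with spark.sql.catalogImplementation coming back as in-memory.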

Solution: instead of working around this, simply use the Spark build pre-built with Hadoop 3.2 (on HDP 3.1).
