Spark REST API: NullPointerException when submitting an application on Windows

I am using Spark 2.3.1, with my PC acting as the Spark master and the same machine also running a Spark worker.

At first I used my Ubuntu 16.04 LTS machine. Everything worked: the SparkPi example ran without any problem, both with spark-submit and with spark-shell. I also tried running it through Spark's REST API, using this POST request:

curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
  "action": "CreateSubmissionRequest",
  "appResource": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
  "clientSparkVersion": "2.3.1",
  "appArgs": [ "10" ],
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass": "org.apache.spark.examples.SparkPi",
  "sparkProperties": {
    "spark.jars": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
    "spark.driver.supervise":"false",
    "spark.executor.memory": "512m",
    "spark.driver.memory": "512m",
    "spark.submit.deployMode":"cluster",
    "spark.app.name": "SparkPi",
    "spark.master": "spark://192.168.1.107:7077"
  }
}'
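
On Ubuntu, the REST endpoint acknowledged the submission with a success response along these lines (the submissionId below is illustrative; the actual value encodes the submission timestamp):

{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20180912123456-0000",
  "serverSparkVersion" : "2.3.1",
  "submissionId" : "driver-20180912123456-0000",
  "success" : true
}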

After testing this, I had to move to Windows, since that is where it will run anyway. I was able to start the master and the worker (manually), add winutils.exe, and run the SparkPi example with spark-shell and spark-submit; everything ran fine. The problem appears when I use the REST API, with this POST request:

curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
      "action": "CreateSubmissionRequest",
      "appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
      "clientSparkVersion": "2.3.1",
      "appArgs": [ "10" ],
      "environmentVariables" : {
        "SPARK_ENV_LOADED" : "1"
      },
      "mainClass": "org.apache.spark.examples.SparkPi",
      "sparkProperties": {
        "spark.jars": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
        "spark.driver.supervise":"false",
        "spark.executor.memory": "512m",
        "spark.driver.memory": "512m",
        "spark.submit.deployMode":"cluster",
        "spark.app.name": "SparkPi",
        "spark.master": "spark://192.168.1.107:7077"
      }
    }'

Only the path is slightly different, yet the worker always fails. The log says:

"Exception from the cluster: java.lang.NullPointerException                                                
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:151)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)"

I have searched, but no solution has come up so far...

Answer

So, I finally found the cause.

I read the source code here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala

From inspecting it, I concluded that the problem does not come from Spark itself, but from a parameter that was not being read correctly; in other words, I had used the wrong parameter format.
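
The failing line appears to be the one in downloadUserJar that derives the jar file name (quoted approximately from Spark 2.3.1, around DriverRunner.scala:151, the top frame of the stack trace above):

// DriverRunner.downloadUserJar, around line 151 in Spark 2.3.1
val jarFileName = new URI(driverDesc.jarUrl).getPath.split("/").last

If getPath returns null here, the call to split throws exactly the NullPointerException that shows up in the worker log.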

So, after trying several things, this is the one that worked:

appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"

had to become:

appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"

I did the same thing for the spark.jars property.
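
Why the extra slashes matter: java.net.URI treats "file:D:/..." as an opaque URI, because there is no "//" and the scheme-specific part ("D:/...") does not start with "/"; for an opaque URI, getPath() returns null. On Ubuntu, "file:/home/..." does start with "/" after the scheme, so it parses as a hierarchical URI, which is why the identical request worked there. A minimal standalone sketch (using a shortened, hypothetical jar path for brevity):

import java.net.URI

object JarUrlCheck {
  def main(args: Array[String]): Unit = {
    // Windows-style single-slash URL: no "//", and the scheme-specific
    // part ("D:/...") does not start with "/", so the URI is opaque.
    val singleSlash = new URI("file:D:/Workspace/Spark/app.jar")
    println(singleSlash.isOpaque) // true
    println(singleSlash.getPath)  // null -- calling .split("/") on this throws the NPE

    // Triple-slash URL: hierarchical URI with an empty authority,
    // so getPath returns a usable absolute path.
    val tripleSlash = new URI("file:///D:/Workspace/Spark/app.jar")
    println(tripleSlash.getPath)                 // /D:/Workspace/Spark/app.jar
    println(tripleSlash.getPath.split("/").last) // app.jar
  }
}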

This tiny difference cost me almost 24 hours of work...
