Why is a call to SparkSession.builder.getOrCreate() in the python console being treated like a command-line spark-submit?
Posted: 2019-07-06 16:59:20

Inside the python console I am trying to create a Spark Session (I am not using pyspark, in order to isolate the dependencies). Why does this produce spark-submit command-line usage output and errors?
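For reference, a minimal sketch of the kind of console session that triggers this. The question only shows the traceback, which refers to a user-defined getSpark helper, so the exact code below is an assumption:

from pyspark.sql import SparkSession

def getSpark():
    # Plain SparkSession bootstrap; nothing here mentions spark-submit,
    # yet creating the session spawns a spark-submit process under the hood.
    return (SparkSession.builder
            .master("local[*]")
            .appName("console-test")
            .getOrCreate())

spark = getSpark()   # hypothetical helper; fails with the output below

Running the last line prints: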
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Missing application resource.
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
..
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of jars to include on the driver
..
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in getSpark
  File "/shared/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/shared/spark/python/pyspark/context.py", line 367, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/shared/spark/python/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/shared/spark/python/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/shared/spark/python/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/shared/spark/python/pyspark/java_gateway.py", line 108, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
【参考方案1】:在尝试了超过 15 种资源之后 - 并仔细阅读了大约两倍的资源 - 唯一有效的是这个以前未获好评的答案 https://***.com/a/55326797/1056563:
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
It does not matter whether you use local[2], local, or local[*]: what is required is the format, including the crucial pyspark-shell piece.
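For example, set it in the shell before starting the interpreter (a sketch; the REPL session itself is illustrative):

$ export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
$ python
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()   # the Java gateway now starts normally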
Another way to fix this, and one more resistant to the vagaries of the environment, is to set the variable directly in your python code:
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"
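A fuller sketch of that approach (the surrounding code is illustrative): the assignment has to happen before the first SparkContext/SparkSession is created, because pyspark's launch_gateway() reads PYSPARK_SUBMIT_ARGS when it spawns the JVM:

import os

# Set this BEFORE any SparkContext/SparkSession exists; launch_gateway()
# reads PYSPARK_SUBMIT_ARGS when it starts the JVM, so setting it later
# has no effect.
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()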