pySpark with conda integration throwing error: pyspark not recognized
Posted: 2020-02-17 19:10:31
Question:
Installed Java, Python, Spark, and Anaconda, and set the path for each. But running pyspark at the
command prompt does not launch a Jupyter notebook.
Got the following error:
"'pyspark' is not recognized as an internal or external command, operable program or batch file."
Comments:
Setting up pySpark locally is tedious; tried it on Ubuntu, but `import pyspark` threw weird errors, so I moved to Windows.
Answer 1: Follow these steps:
Install Java
1.Download Python
Python 3.x
https://www.python.org/downloads/
2.Set Path
If you selected the "Add Python to PATH" option during installation, you don't have to set the path manually.
3.Verify Python is installed:
a)
Cmd>python -V
b)
Open a Python shell by typing the "python" command in the terminal (or use IDLE)
Install Spark
Verify PySpark Installed or not:-
===================================================
Cmd>pyspark
It will open the pyspark shell, an interactive Python shell (REPL)
The pyspark shell is an interactive environment for writing PySpark applications
First Pyspark Application:-
===================================================
We can write PySpark Application in 2 modes. They are:
1.Interactive -- PySpark shell
2.Batch application -- IDEs (Integrated Development Environments:
Jupyter Notebook, PyCharm, etc.)
How to develop the first PySpark application in interactive mode?
===================================================
e.g. Load a local file, count the number of rows, and print the data
Cmd>pyspark
--> it will open the pyspark shell
--> The shell creates a SparkContext bound to the variable name "sc"
--> SparkContext is a predefined class; it is required to write a Spark application
>>>sc
<SparkContext master=local[*] appName=PySparkShell>
ANACONDA Installation:
============================================
Jupyter Notebook installation
1.Download Anaconda
https://www.anaconda.com/distribution/
2.Install Anaconda
By double-clicking the .exe file and choosing all default options
3.Set the Path variable (this is optional when you select the "add to path environment" option at the time of
installation)
4.Start Anaconda and Open Jupyter
Configuring PySpark with Jupyter Notebook:-
============================================
1.Python or Anaconda software must be installed (with Jupyter Notebook)
2.PySpark must be installed.
How to open Pyspark:
==================
Cmd>pyspark
How to make PySpark start Jupyter Notebook:
==========================
We can start Jupyter notebook in two ways. They are:
1.Start Anaconda Navigater--->Launch Jupyter Notebook
2.Open command prompt and type
Cmd>jupyter notebook
Here we write Python Application
Set Environmental Variable:-
=========================
PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS=notebook
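On Windows, these two variables can be set persistently from the command prompt (a sketch; the same names can equally be added via System Properties → Environment Variables):

```shell
:: Make the "pyspark" command launch Jupyter Notebook instead of the plain shell.
setx PYSPARK_DRIVER_PYTHON "jupyter"
setx PYSPARK_DRIVER_PYTHON_OPTS "notebook"
:: Open a NEW command prompt and run "pyspark" -- it should now start Jupyter.
```

To go back to the plain pyspark shell later, remove both variables (they override the driver Python for every pyspark invocation).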