pySpark with conda integration throwing error: pyspark not recognized

Posted: 2020-02-17 19:10:31

Problem description:

Steps followed: installed Java, Python, Spark, and Anaconda, and set the path for each. But the pyspark command in the command prompt does not link Jupyter to the notebook.

Got the following error:

"'pyspark' 不是内部或外部命令、可运行程序或批处理文件。"

Comments:

Setting up pySpark locally is tedious; tried it on Ubuntu, but import pyspark threw weird errors, so moved to Windows.

Answer 1:
    Follow these steps:

    Install Java (Spark also expects the JAVA_HOME environment variable to point at the Java install).

    1. Download Python
    Python 3.x: https://www.python.org/downloads/


    2. Set Path
    If you selected the "Add Python to PATH" option during installation, you don't have to set the path manually.
    3. Verify Python is installed
    a)
    Cmd>python -V
    b)
    Open a Python terminal by typing the "python" command at the prompt
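
    As a quick check of step 2, you can also ask Windows which executable the PATH resolves ("where" is a built-in Windows command):

    Cmd>where python
    rem prints the full path of every python.exe found on PATH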

    Install Spark
    Download a pre-built Spark package from https://spark.apache.org/downloads.html and extract it.
    Set the SPARK_HOME environment variable to the extracted folder and add %SPARK_HOME%\bin to PATH;
    the "'pyspark' is not recognized" error in the question means that bin folder is missing from PATH.
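
    A minimal sketch of those two settings from the command prompt, assuming Spark was extracted to C:\spark\spark-2.4.4-bin-hadoop2.7 (a hypothetical path; adjust to your version):

    Cmd>setx SPARK_HOME "C:\spark\spark-2.4.4-bin-hadoop2.7"
    Cmd>setx PATH "%PATH%;C:\spark\spark-2.4.4-bin-hadoop2.7\bin"
    rem setx writes to the user environment; open a NEW command
    rem prompt afterwards so the updated variables take effect.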
   Verify PySpark is installed:
   ===================================================
   Cmd>pyspark

   It will open the PySpark shell, an interactive Python shell (REPL)
   in which you can write Python/Spark applications.

   First PySpark Application:
   ===================================================
   We can write PySpark applications in 2 modes. They are:
   1. Interactive -- PySpark shell
   2. Batch application -- IDEs (Integrated Development Environments:
                    Jupyter Notebook, PyCharm, etc.)

   How to develop a first PySpark application in interactive mode?
   ===================================================
   e.g. load a local file, count the number of rows, and print the data
   (a sketch follows the shell transcript below)

   Cmd>pyspark
   --> it will open the PySpark shell
   --> the shell creates a SparkContext bound to the variable name "sc"
   --> SparkContext is a predefined class; it is required to write Spark applications
   >>>sc
   <SparkContext master=local[*] appName=PySparkShell>
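
   A minimal sketch of that row-count example, assuming a plain-text file named data.txt in the directory where pyspark was started (a hypothetical file; any local text file works):

   >>> rdd = sc.textFile("data.txt")   # read the local file as an RDD of lines
   >>> rdd.count()                     # number of rows in the file
   >>> for line in rdd.take(5):        # print the first 5 rows
   ...     print(line)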

    ANACONDA Installation:
    ============================================
    Jupyter Notebook installation

    1. Download Anaconda
    https://www.anaconda.com/distribution/

    2. Install Anaconda
    Double-click the .exe file and choose all the default options.
    3. Set the Path variable (this is optional when you select the "add to
    path environment" option at the time of installation)
    4. Start Anaconda and open Jupyter
    Configuring PySpark with Jupyter Notebook:
    ============================================
    1. Python or Anaconda software must be installed (Jupyter Notebook)
    2. PySpark must be installed.
    How to open PySpark:
    ==================
    Cmd>pyspark
    How to make PySpark start Jupyter Notebook:
    ==========================
    We can start Jupyter Notebook in two ways. They are:

    1. Start Anaconda Navigator ---> Launch Jupyter Notebook
    2. Open a command prompt and type
   Cmd>jupyter notebook
Here we write the Python application (a sketch of a first cell follows).
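
A sketch of a first notebook cell, assuming the notebook was launched via the pyspark command with the driver variables set as below, so that sc is predefined (otherwise you would have to create a SparkContext yourself):

    # sum the numbers 0..99 on Spark to confirm the predefined SparkContext works
    sc.parallelize(range(100)).sum()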
Set Environment Variables:
=========================
PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS=notebook
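
On Windows these can be set permanently from the command prompt with setx (a sketch; open a new prompt afterwards). With both variables set, running pyspark launches Jupyter Notebook instead of the plain shell:

Cmd>setx PYSPARK_DRIVER_PYTHON jupyter
Cmd>setx PYSPARK_DRIVER_PYTHON_OPTS notebook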

Comments:
