PySpark couldn't initialize Spark context

【Title】: Pyspark couldn't initialize spark context 【Posted】: 2020-04-27 05:28:58 【Question】:

I am trying to run a Spark program with spark-submit server2.py --master local[2], and I get this error:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7f69bb067710>'
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2924)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:548)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)

This is the code I am running:

import networkx as nx
TCP_IP = "192.168.1.136"
TCP_PORT = 5000
from pyspark import SparkConf,SparkContext
from pyspark.streaming import StreamingContext

# Create a Grid graph using networkx library
G = nx.grid_2d_graph(5, 5)  # 5x5 grid

# Creating a Spark Configuration
conf=SparkConf()
conf.setAppName('ShortestPathApp')

sc= SparkContext(conf)
ssc= StreamingContext(sc,1)

def shortestPath(line):
    # get the values from rdd
    vehicleId = line[0]
    source = line[1]
    destination = line[2]
    deadline = line[3]

    # find shortest path
    shortest = nx.dijkstra_path(G, source, destination)
    print(shortest)


# receive from Socket
dataStream =ssc.socketTextStream(TCP_IP,TCP_PORT)
vehicle_data = dataStream.map(lambda line: line.split(" "))
vehicle_data.foreachRDD(lambda rdd: rdd.foreach(shortestPath))
ssc.start()
ssc.awaitTermination()

Can anyone help me figure out what I am doing wrong? I tried the solution from the post "Couldn't initialize spark context", but it did not work.

【Comments】:

Can you check what your $SPARK_HOME and pyspark.__version__ are? 【Answer 1】:

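Reorder the arguments. spark-submit options must come before the script; anything placed after the script name is passed to the script as its own arguments. Instead of

spark-submit server2.py --master local[2]

run

spark-submit --master local[2] server2.py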
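
Reordering alone may not clear this particular error. The message shows a `SparkConf` object where a master URL was expected, which is consistent with the `SparkContext(conf)` call in the question: the first positional parameter of `SparkContext` is `master`, so the configuration has to be passed by keyword, as `SparkContext(conf=conf)`. The sketch below reproduces that binding without Spark installed. `make_context` and `DummyConf` are hypothetical stand-ins, not PySpark APIs, and the parameter list is abridged to the names that matter here.

```python
def make_context(master=None, appName=None, conf=None):
    """Hypothetical stand-in: mirrors only the order of SparkContext's
    leading parameters (abridged), so the binding can be inspected."""
    return {"master": master, "appName": appName, "conf": conf}


class DummyConf:
    """Placeholder for pyspark.SparkConf (assumption for illustration)."""


conf = DummyConf()

# Positional call, as in the question: the conf object lands in `master`.
wrong = make_context(conf)

# Keyword call: the conf object lands in `conf`, and `master` stays unset.
right = make_context(conf=conf)

print("positional -> master is the conf object:", wrong["master"] is conf)
print("keyword    -> conf is the conf object:  ", right["conf"] is conf)
```

In the question's script the corresponding change would be `sc = SparkContext(conf=conf)`. With `--master local[2]` placed before `server2.py`, the master is then supplied by spark-submit rather than by the script.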

【Comments】:

Can you tell me how socketTextStream works? Do I need to add socket.connect to connect to the server and receive data? @ernest_k

Please answer the above comment @lone_ranger
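
On the socketTextStream question: as documented, `socketTextStream(hostname, port)` makes Spark the TCP client. The receiver connects to the given host and port and reads newline-delimited UTF-8 text, so the streaming script does not call `socket.connect` itself. What it needs is a server already listening at that address. A minimal sketch of such a server, using only the standard library, might look like this; the function names and the sample record layout are assumptions for illustration and do not come from the original post.

```python
import socket
import threading


def make_server(host="127.0.0.1", port=0):
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def serve_lines(server, lines):
    """Accept one client and send each record as one newline-terminated line."""
    conn, _addr = server.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))


def serve_in_background(server, lines):
    """Run serve_lines in a thread so the caller is not blocked by accept()."""
    worker = threading.Thread(
        target=serve_lines, args=(server, lines), daemon=True
    )
    worker.start()
    return worker
```

To feed the question's script, one could bind with `make_server("192.168.1.136", 5000)` (the `TCP_IP` and `TCP_PORT` values from the question) and call `serve_lines(server, ["vehicle1 A B 30"])` before submitting the Spark job. Each line is then split on spaces by the `map` step in the streaming code.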
