Pyspark couldn't initialize spark context
【Posted】: 2020-04-27 05:28:58
【Question】:
I am trying to run a Spark program with this command:
spark-submit server2.py --master local[2]
Then I got this error:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7f69bb067710>'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2924)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:548)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
Here is the code I am running:
import networkx as nx
TCP_IP = "192.168.1.136"
TCP_PORT = 5000
from pyspark import SparkConf,SparkContext
from pyspark.streaming import StreamingContext
# Create a Grid graph using networkx library
G = nx.grid_2d_graph(5, 5) # 5x5 grid
# Creating a Spark Configuration
conf=SparkConf()
conf.setAppName('ShortestPathApp')
sc= SparkContext(conf)
ssc= StreamingContext(sc,1)
def shortestPath(line):
    # get the values from rdd
    vehicleId = line[0]
    source = line[1]
    destination = line[2]
    deadline = line[3]
    # find shortest path
    shortest = nx.dijkstra_path(G, source, destination)
    print(shortest)
# receive from Socket
dataStream =ssc.socketTextStream(TCP_IP,TCP_PORT)
vehicle_data = dataStream.map(lambda line: line.split(" "))
vehicle_data.foreachRDD(lambda rdd: rdd.foreach(shortestPath))
ssc.start()
ssc.awaitTermination()
Can anyone help me figure out what I am doing wrong? I tried the solution from the post "Couldn't initialize spark context", but it did not work.
【Comments】:
Can you check what your $SPARK_HOME and pyspark.__version__ are?
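【Answer 1】:
Reorder
spark-submit server2.py --master local[2]
to
spark-submit --master local[2] server2.py
spark-submit options such as --master must come before the application file. Anything after the script name is passed to the script as its own arguments.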
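Separately from the argument order, the error message shows a SparkConf object where a master URL was expected. That is consistent with the call SparkContext(conf) in the question: the first positional parameter of SparkContext is master, so the configuration object lands in the master slot. The sketch below illustrates the binding with a hypothetical stand-in function (spark_context_stub and FakeConf are made up for this example, and the parameter list is abridged); it is not PySpark itself. In the real script, the likely fix would be to pass the configuration by keyword, sc = SparkContext(conf=conf).

```python
def spark_context_stub(master=None, appName=None, sparkHome=None, conf=None):
    """Hypothetical stand-in that mirrors the order of SparkContext's
    leading parameters (abridged). It only reports how arguments bind."""
    return {"master": master, "appName": appName, "conf": conf}


class FakeConf:
    """Placeholder for a SparkConf object (assumption for illustration)."""


conf = FakeConf()

# Same shape as SparkContext(conf): conf is bound to `master`.
positional = spark_context_stub(conf)

# Same shape as SparkContext(conf=conf): conf is bound to `conf`.
keyword = spark_context_stub(conf=conf)

print("positional call, master slot holds:", type(positional["master"]).__name__)
print("keyword call, master slot holds:", keyword["master"])
```

With the keyword form, the master is left unset in the code, so the value given on the command line with --master can take effect.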
【Comments】:
Can you tell me how socketTextStream works? Do I need to add socket.connect to connect to the server and receive data? @ernest_k Please answer the comment above. @lone_ranger
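On the socketTextStream question: socketTextStream(host, port) makes Spark connect as a TCP client to host:port and read newline-delimited UTF-8 text, so the data source has to listen and accept rather than call socket.connect. A minimal sketch of such a sender follows, using only the standard library. The names serve_lines and read_lines and the sample messages are assumptions for illustration; read_lines only stands in for the receiving side and is not how Spark is implemented.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Listen on host:port and send each line, newline-terminated,
    to the first client that connects. port=0 picks a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def handle():
        conn, _ = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=handle, daemon=True)
    thread.start()
    return bound_port, thread


def read_lines(host, port):
    """Stand-in for the receiver: connect as a client and read
    newline-delimited text until the sender closes the connection."""
    with socket.create_connection((host, port)) as client:
        data = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("utf-8").splitlines()
```

To feed the script from the question, one could call serve_lines with the port the script expects (5000 there) and messages in the "vehicleId source destination deadline" layout that shortestPath reads, then start the Spark job so that socketTextStream connects to it.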