Spark 1.2 cannot connect to HDFS on HDP 2.2

Posted 2015-02-04 18:58:38


I followed this tutorial, http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/, to install Spark on HDP 2.2.

However, HDFS refuses my connection. Here is my command:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m  --executor-memory 512m  --executor-cores 1  lib/spark-examples*.jar 10

Here is the log:

tput: No value for $TERM and no -T specified
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/04 13:52:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/04 13:52:52 INFO impl.TimelineClientImpl: Timeline service address: http://amb7.a.b.c:8188/ws/v1/timeline/
15/02/04 13:52:53 INFO client.RMProxy: Connecting to ResourceManager at amb7.a.b.c/172.0.22.8:8050
15/02/04 13:52:53 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/02/04 13:52:53 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
15/02/04 13:52:53 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/02/04 13:52:53 INFO yarn.Client: Setting up container launch context for our AM
15/02/04 13:52:53 INFO yarn.Client: Preparing resources for our AM container
15/02/04 13:52:54 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/02/04 13:52:54 INFO yarn.Client: Uploading resource file:/tmp/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/lib/spark-assembly-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar -> hdfs://amb1.a.b.c:8020/user/hdfs/.sparkStaging/application_1423073070725_0007/spark-assembly-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar
15/02/04 13:52:54 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1611)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1409)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1362)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)
15/02/04 13:52:54 INFO hdfs.DFSClient: Abandoning BP-470883394-172.0.91.7-1423072968591:blk_1073741885_1061
15/02/04 13:52:54 INFO hdfs.DFSClient: Excluding datanode 172.0.11.0:50010
15/02/04 13:52:55 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1611)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1409)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1362)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)
15/02/04 13:52:55 INFO hdfs.DFSClient: Abandoning BP-470883394-172.0.91.7-1423072968591:blk_1073741886_1062
15/02/04 13:52:55 INFO hdfs.DFSClient: Excluding datanode 172.0.81.0:50010
15/02/04 13:52:55 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1611)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1409)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1362)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)
15/02/04 13:52:55 INFO hdfs.DFSClient: Abandoning BP-470883394-172.0.91.7-1423072968591:blk_1073741887_1063
15/02/04 13:52:55 INFO hdfs.DFSClient: Excluding datanode 172.0.7.0:50010
15/02/04 13:52:55 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1611)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1409)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1362)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)
15/02/04 13:52:55 INFO hdfs.DFSClient: Abandoning BP-470883394-172.0.91.7-1423072968591:blk_1073741888_1064
15/02/04 13:52:55 INFO hdfs.DFSClient: Excluding datanode 172.0.65.0:50010
15/02/04 13:52:55 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1375)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)
15/02/04 13:52:55 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hdfs/.sparkStaging/application_1423073070725_0007/spark-assembly-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar" - Aborting...
Exception in thread "main" java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1611)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1409)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1362)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)

Someone suggested a solution here (http://tiku.io/questions/4795653/unable-to-run-spark-1-0-sparkpi-on-hdp-2-0), but I do not understand it.

My environment is:

HDP: 2.2
Ambari: 1.7
Spark: 1.2


Answer 1:

You need to change the parameter that controls the scheduler's maximum allocation; increase it according to the memory available in your cluster. In the YARN settings:

yarn.scheduler.maximum-allocation-mb = 3072 (or a larger value)
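
As a rough sketch of that change (assuming you edit yarn-site.xml directly rather than through the Ambari UI, and that your nodes actually have that much memory to give), the property would look like this; restart YARN afterwards so it takes effect:

<property>
  <!-- Largest container size (in MB) the scheduler will grant.
       It must be at least as large as what spark-submit requests per
       container (here 512 MB plus YARN overhead). -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>

If you manage the cluster with Ambari (as in this setup), the equivalent setting is exposed in the YARN service's configuration page, and Ambari will prompt you to restart the affected components after saving.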

