Namenode 似乎在启动时挂起

Posted

技术标签:

【中文标题】Namenode 似乎在启动时挂起【英文标题】:Namenode appears to hang on start 【发布时间】:2013-06-30 16:36:41 【问题描述】:

版本是 2.0.4

我正在使用来自http://bigtop01.cloudera.org:8080/job/Bigtop-0.6.0-rvs/ws/ 的 Cloudera bigtop debian 软件包,但实际上我遇到了同样的问题,从 Apache 镜像获取最新的 jar 并配置自己。

另外,值得一提的是,我从 Apache 找到了一个 1.1.2 的 debian 包,并且没有这个问题。

所以启动namenode,我得到这个然后它冻结了。

$ sudo -u hdfs hdfs namenode
13/06/26 03:47:44 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = beagle4/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.0.4-SNAPSHOT
STARTUP_MSG:   classpath = /etc/hadoop/conf:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/zookeeper-3.4.2.jar:/usr/lib/hadoop/lib/avro-1.5.3.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop/.//hadoop-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop/.//hadoop-common-2.0.4-SNAPSHOT-tests.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.4-SNAPSHOT.jar:/contrib/capacity-scheduler/*.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.4-SNAPSHOT-tests.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/avro-1.5.3.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/junit-4.8.2.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-client-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-unmanaged-am-launcher-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-mapreduce/lib/javax.inject-1.jar:/usr/lib/hadoop-mapreduce/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-mapreduce/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-mapreduce/lib/avro-1.5.3.jar:/usr/lib/hadoop-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-mapreduce/lib/guice-3.0.jar:/usr/lib/hadoop-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-mapreduce/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-mapreduce/lib/aopalliance-1.0.jar:/usr/lib/hadoop-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/.//hadoop-rumen-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-core-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-extras-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-distcp-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-datajoin-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.4-SNAPSHOT-tests.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-app-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-plugins-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-streaming-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-gridmix-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-examples-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-archives-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-shuffle-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-2.0.4-SNAPSHOT.jar
STARTUP_MSG:   build = git://github.com/rvs/bigtop.git -r 9be2ff0b8cc3f0ba0c45e5c917aa628aca5259be; compiled by 'jenkins' on 2013-03-26T01:23Z
STARTUP_MSG:   java = 1.7.0_21
************************************************************/
13/06/26 03:47:49 DEBUG impl.MetricsSystemImpl: from system property: null
13/06/26 03:47:49 DEBUG impl.MetricsSystemImpl: from environment variable: null

这里是传递给namenode的参数。

-Djava.net.preferIPv4Stack=true 
-Dhadoop.log.dir=/usr/lib/hadoop/logs 
-Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/usr/lib/hadoop 
-Dhadoop.id.str=hdfs 
-Dhadoop.root.logger=DEBUG,console 
-Djava.library.path=/usr/lib/hadoop/lib/native 
-Dhadoop.policy.file=hadoop-policy.xml 
-Djava.net.preferIPv4Stack=true 
-Dhadoop.security.logger=INFO,RFAS 
-Dhdfs.audit.logger=INFO,NullAppender 
-Dhadoop.security.logger=INFO,NullAppender

这很烦人,因为它没有失败,它只是挂起......关于我应该如何调试它的任何想法?

编辑:好的,我很难过。如果我添加 java 选项来侦听 Java 调试器,它可以工作吗?没看懂……

# works?
exec "$JAVA" -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1044 -Dproc_$COMMAND $JAVA_HEAP_MAX $HA$

# doesn't work
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"

编辑 2:jstack 输出 - 开始关注问题

Full thread dump OpenJDK Zero VM (22.0-b10 mixed mode):

"Service Thread" daemon prio=10 tid=0xb5fd7460 nid=0xc7c runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0xb5fd5978 nid=0xc7b waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0xb5f935b8 nid=0xc7a in Object.wait() [0x70d7e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x86413228> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x86413228> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0xb5f91c08 nid=0xc79 in Object.wait() [0x70efe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x864132b0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x864132b0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0xb5f073e0 nid=0xc77 runnable [0xb6210000]
   java.lang.Thread.State: RUNNABLE
    at com.google.common.base.CharMatcher$12.setBits(CharMatcher.java:507)
    at com.google.common.base.CharMatcher$Or.setBits(CharMatcher.java:642)
    at com.google.common.base.CharMatcher.precomputedInternal(CharMatcher.java:673)
    at com.google.common.base.Platform.precomputeCharMatcher(Platform.java:53)
    at com.google.common.base.CharMatcher.precomputed(CharMatcher.java:657)
    at com.google.common.base.CharMatcher.<clinit>(CharMatcher.java:179)
    at com.google.common.base.Splitter.on(Splitter.java:125)
    at org.apache.hadoop.metrics2.impl.MetricsConfig.<clinit>(MetricsConfig.java:84)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:446)
    - locked <0x718606c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:180)
    - locked <0x718606c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:156)
    - locked <0x718606c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1168)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)

"VM Thread" prio=10 tid=0xb5f8b6b0 nid=0xc78 runnable 

"VM Periodic Task Thread" prio=10 tid=0xb5fd90b0 nid=0xc7d waiting on condition 

JNI global references: 137

Heap
 def new generation   total 2304K, used 1789K [0x71660000, 0x718e0000, 0x863a0000)
  eden space 2048K,  74% used [0x71660000, 0x717df758, 0x71860000)
  from space 256K, 100% used [0x71860000, 0x718a0000, 0x718a0000)
  to   space 256K,   0% used [0x718a0000, 0x718a0000, 0x718e0000)
 tenured generation   total 5376K, used 2333K [0x863a0000, 0x868e0000, 0xafe60000)
   the space 5376K,  43% used [0x863a0000, 0x865e76f8, 0x865e7800, 0x868e0000)
 compacting perm gen  total 5632K, used 5487K [0xafe60000, 0xb03e0000, 0xb3e60000)
   the space 5632K,  97% used [0xafe60000, 0xb03bbf28, 0xb03bc000, 0xb03e0000)
No shared spaces configured.

【问题讨论】:

你在哪个 Unix / Linux 发行版上运行它? Ubuntu 13.04(arm-hf 架构)- 绝对不是标准的或 bigtop 支持的,但似乎或多或少可以通过一些强制安装来工作。 【参考方案1】:

问题出在这个罐子上。

guava-11.0.2.jar

使用jstack我能够将锁追溯到

com.google.common.base.CharMatcher$12.setBits(CharMatcher.java:507)

我不知道为什么它会导致我的 namenode/datanode 服务挂起,但解决方案是下载最新版本(当前为 14.0.1),我的所有问题都消失了。

【讨论】:

【参考方案2】:

由于您在 Ubuntu 上运行,可能您在“Hadoop FileSystem.getFS() pauses for about 2 minutes”中遇到了同样的问题,如果缺少/etc/krb5.conf,Hadoop 会挂起几分钟。在 Ubuntu 12 上,运行 sudo apt-get install krb5-config 创建了该文件,并且似乎为我解决了问题。

【讨论】:

不幸的是,这并没有解决它。它不仅会挂起几分钟,而且永远不会启动。我已经放了好几个小时了,没有任何变化。

以上是关于Namenode 似乎在启动时挂起的主要内容,如果未能解决你的问题,请参考以下文章

由于 MEMORY 问题,Hadoop HA 备用 Namenode 启动在安全模式下挂起

Xcode 8 在加载时挂起

XCode 8在加载时挂起

Spring Boot Reactive Mongo 在启动时挂起

如何防止 Eclipse 在启动时挂起?

Facebook iOS SDK 在应用启动时挂起