Hadoop: NameNode exiting unexpectedly, and MapReduce jobs stuck at "Running job....."

Posted by LIUXUN1993728


When I first installed Hadoop, everything started successfully the first time, and MapReduce programs ran fine too. Then, for reasons I never pinned down, HDFS went into Safe mode.

Even after leaving safe mode I still could not make any modifications to HDFS, so I simply reformatted with hdfs namenode -format — after which the NameNode would not start at all. Once I made the NameNode's and DataNode's clusterIDs match, the HDFS problem was solved, but jobs always hung at Running job......
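For reference, the clusterID lives in the VERSION file under each daemon's storage directory (typically .../dfs/name/current/VERSION for the NameNode and .../dfs/data/current/VERSION for a DataNode; exact paths depend on your configuration). A minimal Python sketch of the mismatch check, using abbreviated, made-up file contents:

```python
# Sketch: compare the clusterID fields of the NameNode and DataNode VERSION
# files. A VERSION file is a simple key=value properties file.
def read_cluster_id(version_text):
    """Extract the clusterID value from the text of a VERSION file."""
    for line in version_text.splitlines():
        if line.startswith("clusterID="):
            return line.split("=", 1)[1]
    return None

# Abbreviated, hypothetical contents of the two VERSION files:
namenode_version = "namespaceID=12345\nclusterID=CID-aaa\nstorageType=NAME_NODE"
datanode_version = "storageID=DS-1\nclusterID=CID-bbb\nstorageType=DATA_NODE"

nn = read_cluster_id(namenode_version)
dn = read_cluster_id(datanode_version)
print("mismatch" if nn != dn else "ok")  # mismatch
```

Reformatting the NameNode generates a fresh clusterID, which is why DataNodes that registered under the old ID stop matching after a format.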

I tried nearly every fix suggested online — deleting large files, increasing memory, disabling the firewall, and so on — and jobs still hung at Running job....

The errors printed in the logs turned up no good solutions anywhere online. Out of options, I reinstalled Linux and then Hadoop.

On the next run it hung again. It nearly drove me to tears — nothing for it but yet another reinstall.

I suspected the cause was starting Hadoop after changing the hostname without rebooting for the change to take effect. So this time I changed my approach: after finishing the configuration, I ran reboot.

Hadoop then started successfully — but when I checked the processes a short while later, the NameNode had exited.

The error log looked roughly like this:

org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from  hadoop1 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the...
I then found a suggested fix online: http://f.dataguru.cn/thread-530444-1-1.html
It adds the following to yarn-site.xml:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
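For context, the ResourceManager refuses to register a NodeManager whose advertised resources fall below the scheduler minimums (yarn.scheduler.minimum-allocation-mb defaults to 1024 MB, and the minimum vcores to 1). A simplified sketch of that check — an illustration, not the actual YARN code:

```python
# Simplified sketch of the ResourceManager's registration check: a NodeManager
# advertising less memory or fewer vcores than the scheduler minimums is sent
# a SHUTDOWN signal instead of being registered.
def satisfies_minimum(nm_memory_mb, nm_vcores, min_memory_mb=1024, min_vcores=1):
    return nm_memory_mb >= min_memory_mb and nm_vcores >= min_vcores

print(satisfies_minimum(512, 1))    # False -> "doesn't satisfy minimum allocations"
print(satisfies_minimum(1024, 1))   # True
```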

With this in place, Hadoop started cleanly: the NameNode no longer exited, and MapReduce jobs no longer hung.

But jobs now failed with an exception:

Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=1024
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:272)
In other words, the requested memory (1536 MB) exceeded the configured maximum (1024 MB).
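The check behind this exception is roughly the following — a simplification of SchedulerUtils.validateResourceRequest, not the real code. The 1536 MB comes from the default size of the MapReduce ApplicationMaster container (yarn.app.mapreduce.am.resource.mb):

```python
# Simplified sketch of the scheduler-side validation: a container request is
# rejected when the requested memory is negative or exceeds the configured maximum.
def validate_memory(requested_mb, max_mb):
    if requested_mb < 0 or requested_mb > max_mb:
        raise ValueError(
            "Invalid resource request, requested memory < 0, or requested memory "
            "> max configured, requestedMemory=%d, maxMemory=%d"
            % (requested_mb, max_mb))

try:
    validate_memory(1536, 1024)   # the situation from the log above
except ValueError as e:
    print(e)
validate_memory(1536, 4096)       # no exception once the maximum is raised
```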

So I edited yarn-site.xml again and raised the memory to 4 GB.

My yarn-site.xml now looks like this:

<configuration>
        <!-- Address of the YARN ResourceManager -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop1</value>
        </property>
        <!-- How reducers fetch map output -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>

        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>4096</value>
        </property>
         <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                <value>1</value>
         </property>
</configuration>
After restarting Hadoop and rerunning the job, it no longer hung.

As shown below:

[root@hadoop1 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.4.jar wordcount /words /t
17/08/18 00:04:25 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.0.10:8032
17/08/18 00:04:27 INFO input.FileInputFormat: Total input paths to process : 1
17/08/18 00:04:27 INFO mapreduce.JobSubmitter: number of splits:1
17/08/18 00:04:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1503029045924_0001
17/08/18 00:04:28 INFO impl.YarnClientImpl: Submitted application application_1503029045924_0001
17/08/18 00:04:28 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1503029045924_0001/
17/08/18 00:04:28 INFO mapreduce.Job: Running job: job_1503029045924_0001
17/08/18 00:04:39 INFO mapreduce.Job: Job job_1503029045924_0001 running in uber mode : false
17/08/18 00:04:39 INFO mapreduce.Job:  map 0% reduce 0%
17/08/18 00:04:44 INFO mapreduce.Job:  map 100% reduce 0%
17/08/18 00:04:51 INFO mapreduce.Job:  map 100% reduce 100%
17/08/18 00:04:51 INFO mapreduce.Job: Job job_1503029045924_0001 completed successfully
17/08/18 00:04:51 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=64
                FILE: Number of bytes written=283477
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=146
                HDFS: Number of bytes written=38
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3662
                Total time spent by all reduces in occupied slots (ms)=3908
                Total time spent by all map tasks (ms)=3662
                Total time spent by all reduce tasks (ms)=3908
                Total vcore-milliseconds taken by all map tasks=3662
                Total vcore-milliseconds taken by all reduce tasks=3908
                Total megabyte-milliseconds taken by all map tasks=3749888
                Total megabyte-milliseconds taken by all reduce tasks=4001792
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=96
                Map output materialized bytes=64
                Input split bytes=90
                Combine input records=10
                Combine output records=5
                Reduce input groups=5
                Reduce shuffle bytes=64
                Reduce input records=5
                Reduce output records=5
                Spilled Records=10
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=111
                CPU time spent (ms)=1020
                Physical memory (bytes) snapshot=336941056
                Virtual memory (bytes) snapshot=4129792000
                Total committed heap usage (bytes)=219676672
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=56
        File Output Format Counters 
                Bytes Written=38


My guess is that the NameNode exiting and jobs hanging at Running job share the same root cause:

even if the Linux host has plenty of memory, it is useless to Hadoop unless it is actually allocated to YARN.
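A common rule of thumb for choosing yarn.nodemanager.resource.memory-mb — my own assumption, not an official formula — is to give YARN the host's RAM minus a reservation for the OS and the Hadoop daemons:

```python
# Rough sizing sketch: reserve some RAM for the OS and the HDFS/YARN daemons,
# hand the rest to NodeManager containers, and never go below the 1024 MB
# scheduler minimum allocation.
def suggest_yarn_memory_mb(total_mb, reserved_mb=2048):
    return max(total_mb - reserved_mb, 1024)

print(suggest_yarn_memory_mb(8192))   # 6144
print(suggest_yarn_memory_mb(2048))   # 1024
```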

Summary:

(1) When setting up the host-to-IP mapping in /etc/hosts, the file must contain 127.0.0.1 localhost in addition to the IP + hostname entry for the current host, or errors will follow.
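For example, for the hadoop1 host used in this post (the IP matches the ResourceManager address in the log above; adjust to your own network), /etc/hosts would look like:

```
127.0.0.1    localhost
192.168.0.10 hadoop1
```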

(2) After changing the hostname and /etc/hosts, always reboot before installing and configuring Hadoop or formatting HDFS.

(3) In yarn-site.xml, be sure to add this configuration and allocate enough memory:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <!-- To keep jobs from hanging at Running job and the NodeManager from
             being shut down, allocate enough memory here — preferably 2048 MB or more -->
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
