Hadoop: NameNode keeps dying with "IPC's epoch 9 is less than the last promised epoch 10"
Posted by RZ_Lee
As the title says, the NameNode keeps dying with "IPC's epoch 9 is less than the last promised epoch 10". The relevant NameNode log:
2019-01-03 05:36:14,774 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073741949_1131{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-4d1a4a8a-a442-4db7-80d2-16699acc814c:NORMAL:192.168.41.88:50010|RBW], ReplicaUC[[DISK]DS-0e124439-ff5d-49d7-879f-e54f191c4578:NORMAL:192.168.41.89:50010|RBW], ReplicaUC[[DISK]DS-2bb9dc09-913e-4ce4-a274-9b0bedc6bfc1:NORMAL:192.168.41.90:50010|RBW]]} for /hbase/WALs/hdp,16020,1546464960046/hdp%2C16020%2C1546464960046..meta.1546464974748.meta
[... roughly 40 seconds of routine BLOCK* allocate/addStoredBlock/addToInvalidates and HBase WAL lease-recovery INFO entries omitted ...]
2019-01-03 05:36:55,450 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.41.89
2019-01-03 05:36:55,450 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2019-01-03 05:36:55,450 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 1896
2019-01-03 05:36:55,466 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 101 Total time for transactions(ms): 24 Number of transactions batched in Syncs: 21 Number of syncs: 79 SyncTimes(ms): 611 101
2019-01-03 05:36:55,475 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /root/hdpdata/name/current/edits_inprogress_0000000000000001896 -> /root/hdpdata/name/current/edits_0000000000000001896-0000000000000001996
2019-01-03 05:36:55,476 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1997
2019-01-03 05:38:54,483 WARN org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 64530ms No GCs detected
2019-01-03 05:38:54,591 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
[... further repeated "Failed to place enough replicas" WARN entries omitted ...]
2019-01-03 05:38:55,524 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.41.89:8485 failed to write txns 1998-2000. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 9 is less than the last promised epoch 10
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:418)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:446)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:341)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy11.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-01-03 05:38:55,526 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.41.90:8485 failed to write txns 1998-2000. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 9 is less than the last promised epoch 10
    [... identical stack trace omitted ...]
2019-01-03 05:38:55,527 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.41.88:8485, 192.168.41.89:8485, 192.168.41.90:8485], stream=QuorumOutputStream starting at txid 1997))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3.
1 successful responses:
192.168.41.88:8485: null [success]
2 exceptions thrown:
192.168.41.90:8485: IPC's epoch 9 is less than the last promised epoch 10
    [... stack trace omitted ...]
192.168.41.89:8485: IPC's epoch 9 is less than the last promised epoch 10
    [... stack trace omitted ...]
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
    at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.storeAllocatedBlock(FSNamesystem.java:3164)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3038)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
    [... remainder of stack trace omitted ...]
2019-01-03 05:38:55,528 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 1997
2019-01-03 05:38:55,530 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-01-03 05:38:55,531 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hdp/192.168.41.88
************************************************************/
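To make sense of the epoch error: each time a NameNode becomes the active writer, it claims a new, higher epoch number from the JournalNodes, and each JournalNode promises to reject writes from any lower epoch. The check in Journal.checkRequest is what produces the message above. The following is a toy Python sketch of that fencing rule, not the actual Hadoop implementation; the class and method names only loosely mirror the real ones:

```python
class JournalNode:
    """Toy model of a JournalNode's epoch fencing (not real Hadoop code)."""

    def __init__(self):
        self.last_promised_epoch = 0

    def new_epoch(self, epoch):
        # A NameNode that wants to write first claims a higher epoch.
        # The JournalNode promises to reject writers with any lower epoch.
        if epoch <= self.last_promised_epoch:
            raise IOError(
                f"Proposed epoch {epoch} <= promised epoch {self.last_promised_epoch}")
        self.last_promised_epoch = epoch

    def journal(self, writer_epoch, txns):
        # Mirrors the check in Journal.checkRequest: fence stale writers.
        if writer_epoch < self.last_promised_epoch:
            raise IOError(
                f"IPC's epoch {writer_epoch} is less than the last promised epoch "
                f"{self.last_promised_epoch}")
        return "ok"
```

In this model, the original active NameNode writes happily at epoch 9; when a failover promotes the standby, it claims epoch 10, and any later write attempt by the old active at epoch 9 is rejected with exactly the message seen in the log. This is why the error itself is a symptom of a failover having happened, not the root cause.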
Solution
The trigger is visible in the log: JvmPauseMonitor reports a pause of roughly 64.5 seconds ("No GCs detected", which suggests the host itself stalled rather than a GC pause). During that pause the ZKFC health check timed out, ZKFC failed over to the standby NameNode, and the standby claimed a new epoch (10) from the JournalNodes. When the old active NameNode resumed and tried to flush edits with its stale epoch 9, the JournalNodes fenced it and it exited with status 1. A workaround is to raise ha.health-monitor.rpc-timeout.ms in core-site.xml so the ZKFC health check tolerates longer pauses (note that this only masks the pauses; if the host is routinely stalling, e.g. from swapping or VM contention, that should be fixed as well):
<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>180000</value>
</property>
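To see why 180000 ms should be enough here, a back-of-the-envelope model helps (an assumption for illustration, not Hadoop code: this treats ZKFC as declaring the NameNode unhealthy whenever a pause exceeds the health-check RPC timeout):

```python
def failover_triggered(pause_ms: int, rpc_timeout_ms: int) -> bool:
    # Simplified model: ZKFC declares the NameNode dead when its
    # health-check RPC cannot complete within rpc_timeout_ms,
    # e.g. because the NameNode's JVM or host is paused.
    return pause_ms > rpc_timeout_ms

PAUSE_MS = 64530            # the pause reported by JvmPauseMonitor in the log
DEFAULT_TIMEOUT_MS = 45000  # Hadoop's default ha.health-monitor.rpc-timeout.ms
RAISED_TIMEOUT_MS = 180000  # the value set in core-site.xml above
```

With the default 45-second timeout the 64.5-second pause trips a failover; at 180 seconds it is tolerated, so the old active never gets fenced with a stale epoch in the first place.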