could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation

【Posted】2017-04-13 07:39:29 【Question】:

This error occurs when I run "sqoop import ..." into Hive.

NameNode log:
java.io.IOException: File /input/xxxx/_temporary/1/_temporary/attempt_1492073551248_0012_m_000002_1/part-m-00002 could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and no node(s) are excluded in this operation.
DataNode log (slave1):
slave1 :2017-04-13 19:58:59,444 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.149.141:50010, dest: /192.168.149.141:42764, bytes: 451, op: HDFS_READ, cliID: DFSClient_attempt_1492073551248_0012_m_000001_2_785964301_1, offset: 0, srvID: f274418e-04b6-4109-9521-e3c384c21ad0, blockid: BP-219683118-192.168.149.138-1491539013447:blk_1073742751_1927, duration: 160511 

DataNode log (slave2):
slave2: 2017-04-13 19:58:02,389 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.149.141:34576, dest: /192.168.149.142:50010, bytes: 127362723, op: HDFS_WRITE, cliID: DFSClient_attempt_1492073551248_0012_m_000000_0_-417808976_1, offset: 0, srvID: 7f9110ab-8a1d-4a32-8219-aff6e3cd29b2, blockid: BP-219683118-192.168.149.138-1491539013447:blk_1073742761_1937, duration: 64254909353
2017-04-13 19:58:02,389 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-219683118-192.168.149.138-1491539013447:blk_1073742761_1937, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2017-04-13 19:58:11,269 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.149.141:34588, dest: /192.168.149.142:50010, bytes: 134217728, op: HDFS_WRITE, cliID: DFSClient_attempt_1492073551248_0012_m_000002_1_-2031862368_1, offset: 0, srvID: 7f9110ab-8a1d-4a32-8219-aff6e3cd29b2, blockid: BP-219683118-192.168.149.138-1491539013447:blk_1073742762_1938, duration: 63824306914
2017-04-13 19:58:11,270 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-219683118-192.168.149.138-1491539013447:blk_1073742762_1938, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2017-04-13 19:58:15,441 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:349ms (threshold=300ms)
2017-04-13 19:58:15,769 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:328ms (threshold=300ms)
2017-04-13 19:58:28,675 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.149.142:51700, dest: /192.168.149.142:50010, bytes: 134217728, op: HDFS_WRITE, cliID: DFSClient_attempt_1492073551248_0012_m_000003_1_-395038848_1, offset: 0, srvID: 7f9110ab-8a1d-4a32-8219-aff6e3cd29b2, blockid: BP-219683118-192.168.149.138-1491539013447:blk_1073742763_1939, duration: 52247885321
2017-04-13 19:58:28,675 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-219683118-192.168.149.138-1491539013447:blk_1073742763_1939, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2017-04-13 19:58:28,689 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-219683118-192.168.149.138-1491539013447:blk_1073742764_1940 src: /192.168.149.142:51718 dest: /192.168.149.142:50010   

Any ideas on how to fix this error? Thanks!

【Comments】:

Is there enough space on the datanodes?
One datanode is at 10.45% DFS Used and the other at 4.12%. Also, both datanodes are working normally.
Can you share the datanode logs?
OK, I've pasted them above; take a look.
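For the capacity and liveness questions raised in the comments above, the standard HDFS CLI is enough to check; a minimal sketch using stock Hadoop commands (nothing cluster-specific assumed):

hdfs dfsadmin -report   # per-datanode capacity, DFS Used%, and live/dead status
hdfs dfs -df -h /       # overall filesystem capacity and remaining space
hdfs fsck / -blocks     # block-level health, including under-replicated block counts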

【Answer 1】:
Check https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo. I hit the same error when running a large number of queries in a short time, and fixed it by increasing the handler threads on the datanodes via the "dfs.datanode.handler.count" setting. This exception can be thrown for several different reasons; see the link to check whether your case is covered there.
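If the handler-thread fix applies to your cluster, the property goes in hdfs-site.xml on each datanode, followed by a datanode restart. A sketch, assuming the Hadoop 2.x default of 10 is too low under your load; the value 30 below is only illustrative, not a recommendation:

<property>
  <name>dfs.datanode.handler.count</name>
  <!-- default is 10; raise it when many concurrent writes overwhelm the datanodes -->
  <value>30</value>
</property>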
