Set replication in Hadoop

Posted 大数据从业者

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Set replication in Hadoop相关的知识,希望对你有一定的参考价值。

I was trying loading file using hadoop API as an experiment.

I want to set replication to minimum as this one is for experiment. I first tried this with FileSystem.setReplication():

Configuration config = new Configuration();
config.set("fs.defaultFS","hdfs://192.168.248.166:8020");
FileSystem dfs2 = FileSystem.get(config);
Path src2 = new Path("C:\\Users\\abc\\Desktop\\testfile.txt");
Path dst2 = new Path(dfs2.getWorkingDirectory()+"/tempdir");
dfs2.copyFromLocalFile(src2, dst2);
dfs2.setReplication(dst2, (short)1);  /**setting replication**/

The replica was shown as 1, but it was available on 3 datanodes.

When I tried it with Configuration.set():

Configuration config = new Configuration();
config.set("fs.defaultFS","hdfs://192.168.248.166:8020");
config.set("dfs.replication", "1");  /**setting replication**/
FileSystem dfs2 = FileSystem.get(config);
Path src2 = new Path("C:\\Users\\abc\\Desktop\\testfile.txt");
Path dst2 = new Path(dfs2.getWorkingDirectory()+"/tempdir");

This gave the desired outcome (1 replica available on 1 datanode)

Why there are two APIs for the same thing? What is the difference between these two?

The difference is that Filesystem‘s setReplication() sets the replication of an existing file on HDFS. In your case, you first copy the local file testFile.txt to HDFS, using the default replication factor (3) and then change the replication factor of this file to 1. After this command, it takes a while until the over-replicated blocks get deleted. (source)

On the other hand, when you use the config.set("dfs.replication", "1"); command to set the replication, you can copy the local file after that, so its blocks get copied just once, from the first time.

In other words, I believe (but I might be wrong) that both commands have the same final result, but you have to wait a little bit until the first one is carried out.

 

以上是关于Set replication in Hadoop的主要内容,如果未能解决你的问题,请参考以下文章

Hadoop实战-Flume之Source replicating(十四)

大数据系列hadoop上传文件报错_COPYING_ could only be replicated to 0 nodes

Hadoop上传文件时报错: could only be replicated to 0 nodes instead of minReplication (=1)....

hadoop fs -put localfile . 时出现如下错误: could only be replicated to 0 nodes, instead of 1

kafkaThe replication factor in manual replication assignment is not equal to the existing replica(代码

Hadoop dfs 复制