Big Data with Hadoop (MapReduce): Compressing the Reduce Output

Posted by 浊酒南街


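This example builds on the WordCount case: only the driver changes, so that the final reduce output is written in compressed form.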

1. Modify the driver

package com.jinghang.mapreduce.compress;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		
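		// Create the job from a default configuration
		Configuration configuration = new Configuration();
		Job job = Job.getInstance(configuration);
		
		// Locate the job jar through the driver class
		job.setJarByClass(WordCountDriver.class);
		
		// Register the Mapper and Reducer
		job.setMapperClass(WordCountMapper.class);
		job.setReducerClass(WordCountReducer.class);
		
		// Key/value types emitted by the map phase
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		
		// Key/value types of the final output
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		
		// Input and output paths come from the command line
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));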
		
		// Enable compression of the reduce-side output
		FileOutputFormat.setCompressOutput(job, true);
		
		// Choose the compression codec (enable exactly one of these lines)
		FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
//		FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
//		FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
	    
		boolean result = job.waitForCompletion(true);
		
		// Exit with 0 on success and 1 on failure
		System.exit(result ? 0 : 1);
	}
}

2. The Mapper and Reducer remain unchanged
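Once the job finishes, the files in the output directory carry the extension of the selected codec: .bz2 for BZip2Codec, .gz for GzipCodec, and .deflate for DefaultCodec. As a rough way to check the result, the sketch below reads a gzip-compressed WordCount output file using only the Java standard library. It assumes the GzipCodec line in the driver is the one enabled (the standard library has no bzip2 reader) and that each output line has the default "word<TAB>count" layout. The class name GzipOutputReader, the helper writeSample, and the sample data are hypothetical and exist only to make the example runnable without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipOutputReader {

    // Reads "word<TAB>count" lines from a gzip-compressed text file.
    public static Map<String, Integer> readCounts(Path gzFile) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", 2);
                if (fields.length < 2) {
                    continue; // skip blank or malformed lines
                }
                counts.put(fields[0], Integer.parseInt(fields[1].trim()));
            }
        }
        return counts;
    }

    // Writes a small gzip file standing in for a real part-r-00000.gz.
    // The contents are made-up sample data, not real job output.
    public static Path writeSample(Path dir) throws IOException {
        Path file = dir.resolve("part-r-00000.gz");
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8)) {
            writer.write("hadoop\t2\nmapreduce\t1\n");
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Use the file given on the command line, or fall back to the sample file.
        Path file = args.length > 0
                ? Paths.get(args[0])
                : writeSample(Files.createTempDirectory("wordcount-out"));
        readCounts(file).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

To inspect real job output, copy a part file from the output directory to the local file system and pass its path as the first argument.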
