(005) Hadoop Basics: Compression

Posted by sirlijun


Preface: this article was compiled by the editors of 小常识网 (cha138.com) and introduces (005) Hadoop Basics: Compression. We hope you find it useful.

1: Benefits of compression

  • Compression reduces the space needed to store files and speeds up data transfer over the network and to and from disk.
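The space saving is easy to see on a small scale. The sketch below is not Hadoop code. It gzip-compresses an in-memory byte array with the JDK's `java.util.zip.GZIPOutputStream` and prints the sizes before and after. The class name and the repetitive sample text are made up for illustration, and real ratios depend heavily on the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressionBenefitDemo {

    // Gzip-compress a byte array in memory (JDK java.util.zip, not Hadoop's GzipCodec)
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer (CRC32 and length)
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: repetitive, log-like text
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("INFO task attempt ").append(i % 10).append(" finished\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);
        // Fewer bytes means less disk space and less data to move over the network
        System.out.printf("original: %d bytes, gzip: %d bytes%n", original.length, compressed.length);
    }
}
```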

2: Summary of compression formats

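  • Hadoop supports the DEFLATE, gzip and bzip2 formats out of the box. LZO, LZ4 and Snappy can also be used, but they may require extra packages or native libraries that you have to obtain yourself. LZO, for example, is not shipped with Hadoop.
  • Performance: the author's tests on different Windows and Linux servers gave different results, so it is worth measuring on your own hardware. Broadly, there are two cases. If the compressed file must be splittable, so that its parts can be processed in parallel, bzip2 is usually the choice: among Hadoop's built-in codecs it is the only splittable one, and its compression ratio is also good. The price is comparatively slow compression and decompression. If splitting is not needed, a faster codec is usually the better trade-off.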
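The ratio-versus-speed trade-off, and the point that results vary by machine, can be reproduced with the JDK alone. This sketch is not Hadoop code and does not cover bzip2, which is not part of the JDK. It uses `java.util.zip.Deflater` (DEFLATE) at its fastest level (`BEST_SPEED`) and its strongest level (`BEST_COMPRESSION`), then reports output size and elapsed time. The class name and sample input are hypothetical, and timings from a single run like this are only indicative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

public class LevelTradeoffDemo {

    // Compress with the JDK's DEFLATE implementation (zlib format) at the given level
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        } finally {
            deflater.end(); // release the native zlib resources
        }
    }

    // Decompress zlib-format data produced by deflate() (Java 9+ for readAllBytes)
    static byte[] inflate(byte[] compressed) {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input; real results depend on the data and the hardware
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("line ").append(i % 97).append(" of a sample log file\n");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] out = deflate(data, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n", level, data.length, out.length, micros);
        }
    }
}
```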

3: Core code demo

  • Code demo: compress test01.txt with each of four codecs, then decompress each result
package com.lj.CompressFileDemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.*;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.log4j.BasicConfigurator;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Arrays;
import java.util.List;

public class CompressDemo {

    // Working directory used by the demo (Windows path)
    private static final String DIR = "D://Tools//TestDemo//";

    public static void main(String[] args) throws Exception {
        // Send log4j output to the console so codec and native-library messages are visible
        BasicConfigurator.configure();
        // Depending on the Hadoop version, Lz4Codec may need the native Hadoop library
        List<Class<? extends CompressionCodec>> codecs = Arrays.asList(
                DeflateCodec.class, GzipCodec.class, BZip2Codec.class, Lz4Codec.class);
        for (Class<? extends CompressionCodec> codecClass : codecs) {
            // Compress
            zip(codecClass);
        }

        for (Class<? extends CompressionCodec> codecClass : codecs) {
            // Decompress
            unzip(codecClass);
        }
    }

    // Compress test01.txt into Compress<ext>, e.g. Compress.gz
    public static void zip(Class<? extends CompressionCodec> clazz) throws Exception {
        Configuration conf = new Configuration();
        // newInstance also passes conf to codecs that implement Configurable
        CompressionCodec codec = ReflectionUtils.newInstance(clazz, conf);
        String target = DIR + "Compress" + codec.getDefaultExtension();
        // try-with-resources closes every stream; closing the compression stream finishes the output
        try (FileInputStream in = new FileInputStream(DIR + "test01.txt");
             FileOutputStream fos = new FileOutputStream(target);
             CompressionOutputStream out = codec.createOutputStream(fos)) {
            // false: leave closing to try-with-resources
            IOUtils.copyBytes(in, out, 1024, false);
        }
    }

    // Decompress Compress<ext> into Compress<ext>.txt, e.g. Compress.gz.txt
    public static void unzip(Class<? extends CompressionCodec> clazz) throws Exception {
        Configuration conf = new Configuration();
        CompressionCodec codec = ReflectionUtils.newInstance(clazz, conf);
        String source = DIR + "Compress" + codec.getDefaultExtension();
        try (FileInputStream fis = new FileInputStream(source);
             CompressionInputStream in = codec.createInputStream(fis);
             FileOutputStream out = new FileOutputStream(source + ".txt")) {
            IOUtils.copyBytes(in, out, 1024, false);
        }
    }
}

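  • Output. The WARN line shows that the native bzip2 library could not be loaded, so Hadoop fell back to its pure-Java bzip2 implementation. The jump from 359 ms to 4954 ms between the .bz2 and .lz4 compressor entries is mostly time spent compressing with bzip2.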
0 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader  - Trying to load the custom-built native-hadoop library...
0 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader  - Loaded the native-hadoop library
0 [main] INFO org.apache.hadoop.io.compress.zlib.ZlibFactory  - Successfully loaded & initialized native-zlib library
219 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.deflate]
297 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.gz]
359 [main] WARN org.apache.hadoop.io.compress.bzip2.Bzip2Factory  - Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
359 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.bz2]
4954 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.lz4]
4985 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.deflate]
5016 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
5047 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.bz2]
5297 [main] INFO org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.lz4]
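After the demo has run, you can check that every codec round-tripped correctly and compare compressed sizes as in section 2. One way is to compare test01.txt with each Compress<ext>.txt file. The sketch below uses only the JDK. The class `RoundTripCheck` and its helper `sameContent` are hypothetical names. The directory and file names are the ones hard-coded in the demo, and the extension list comes from the log output above. Because it reads whole files into memory, it is only suitable for small test files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class RoundTripCheck {

    // True only if both files exist and contain exactly the same bytes
    static boolean sameContent(Path a, Path b) throws IOException {
        if (!Files.exists(a) || !Files.exists(b)) {
            return false;
        }
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        // Directory and file names hard-coded in CompressDemo
        Path dir = Paths.get("D:/Tools/TestDemo");
        Path original = dir.resolve("test01.txt");
        // -1 marks a missing file
        System.out.printf("original  %d bytes%n", Files.exists(original) ? Files.size(original) : -1);
        // Default extensions as printed in the CodecPool log lines
        String[] exts = {".deflate", ".gz", ".bz2", ".lz4"};
        for (String ext : exts) {
            Path compressed = dir.resolve("Compress" + ext);
            Path restored = dir.resolve("Compress" + ext + ".txt");
            long size = Files.exists(compressed) ? Files.size(compressed) : -1;
            System.out.printf("%-9s compressed=%d bytes, round-trip ok=%b%n",
                    ext, size, sameContent(original, restored));
        }
    }
}
```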
