(005) Hadoop Basics: Compression
Posted by sirlijun
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers (005) Hadoop Basics: Compression, and we hope you find it useful.
1: Benefits of Compression
- Compression reduces the space needed to store files and speeds up data transfer over the network and to and from disk.
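The space saving is easy to observe even without a Hadoop cluster. The minimal sketch below uses the JDK's java.util.zip gzip streams, not Hadoop's codecs, to compress some repetitive text and then decompress it again. The class name GzipRoundTrip and the sample text are made up for illustration, and the compression ratio you get depends heavily on the input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a byte array into the gzip format (DEFLATE data plus a gzip header and trailer)
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the gzip stream flushes the remaining data and writes the trailer
        return bos.toByteArray();
    }

    // Decompress gzip data back into the original bytes
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive text, similar to many log files, compresses well
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("hello hadoop compression\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);
        System.out.println("original:   " + original.length + " bytes");
        System.out.println("compressed: " + compressed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(original, gunzip(compressed)));
    }
}
```

Repetitive data such as logs typically shrinks a lot. Data that is already compressed, such as images or archives, usually barely shrinks at all.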
2: Summary of Compression Formats
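- Hadoop supports the DEFLATE, gzip, and bzip2 compression formats. LZO, LZ4, and Snappy compression can also be used, but they may require additional packages or native libraries. LZO in particular is not bundled with Hadoop and has to be downloaded separately.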
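- Performance comparison: the author's tests on Windows and Linux servers gave different results, so no single format wins in every case. Overall, bzip2 is usually the choice if the compressed files need to be splittable. Its compression ratio is also good, but both compression and decompression are comparatively slow.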
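Splittability matters because MapReduce wants each map task to start reading a large file at the offset of its own split. The minimal JDK-only sketch below shows that a gzip stream can only be decoded from its beginning: reading from offset 0 succeeds, while starting partway through fails. It uses java.util.zip rather than Hadoop's GzipCodec, and the class GzipSplitDemo and its helper canReadFrom are hypothetical names. Hadoop's BZip2Codec, by contrast, implements the SplittableCompressionCodec interface. The JDK has no bzip2 implementation, so that side is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Compress the whole input as a single gzip stream
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Returns true if the bytes from 'offset' to the end can be decompressed,
    // i.e. whether a reader that only sees this part of the file could decode it
    static boolean canReadFrom(byte[] compressed, int offset) {
        byte[] tail = Arrays.copyOfRange(compressed, offset, compressed.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(tail))) {
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // discard the output; only whether decoding succeeds matters here
            }
            return true;
        } catch (IOException e) {
            // e.g. ZipException "Not in GZIP format": the header only exists at the start
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        byte[] compressed = gzip(sb.toString().getBytes(StandardCharsets.UTF_8));

        // Start near the middle, skipping any byte equal to the first gzip magic byte (0x1f)
        // so the chosen offset cannot be mistaken for the start of a gzip header
        int mid = compressed.length / 2;
        while (compressed[mid] == (byte) 0x1f) {
            mid++;
        }
        System.out.println("read from offset 0: " + canReadFrom(compressed, 0));     // true
        System.out.println("read from offset " + mid + ": " + canReadFrom(compressed, mid)); // false
    }
}
```

Because DEFLATE data has no sync markers that a reader could search for, a split that starts mid-file has no safe place to begin decoding. This is why a single gzip file is processed by one map task.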
3: Core Code Demo
- Demo code
package com.lj.CompressFileDemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.*;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.log4j.BasicConfigurator;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Arrays;
import java.util.List;

public class CompressDemo {
    public static void main(String[] args) throws Exception {
        // Print log4j output to the console (shows which codecs and native libraries are loaded)
        BasicConfigurator.configure();
        List<Class<? extends CompressionCodec>> codecs = Arrays.asList(
                DeflateCodec.class, GzipCodec.class, BZip2Codec.class, Lz4Codec.class);
        for (Class<? extends CompressionCodec> cc : codecs) {
            // Compress
            zip(cc);
        }
        for (Class<? extends CompressionCodec> cc : codecs) {
            // Decompress
            unzip(cc);
        }
    }

    public static void zip(Class<? extends CompressionCodec> clazz) throws Exception {
        Configuration conf = new Configuration();
        // ReflectionUtils creates the codec and passes it the Configuration
        CompressionCodec codec = ReflectionUtils.newInstance(clazz, conf);
        // Wrap the output file in a compression stream, e.g. Compress.gz for GzipCodec
        FileOutputStream fos = new FileOutputStream("D://Tools//TestDemo//Compress" + codec.getDefaultExtension());
        CompressionOutputStream zipCos = codec.createOutputStream(fos);
        // close = true closes both streams; closing zipCos finishes the compressed data and closes fos
        IOUtils.copyBytes(new FileInputStream("D://Tools//TestDemo//test01.txt"), zipCos, 1024, true);
    }

    public static void unzip(Class<? extends CompressionCodec> clazz) throws Exception {
        Configuration conf = new Configuration();
        CompressionCodec codec = ReflectionUtils.newInstance(clazz, conf);
        // Wrap the compressed file in a decompression stream
        FileInputStream fis = new FileInputStream("D://Tools//TestDemo//Compress" + codec.getDefaultExtension());
        CompressionInputStream zipCis = codec.createInputStream(fis);
        // Write the decompressed data to e.g. Compress.gz.txt; close = true closes both streams
        IOUtils.copyBytes(zipCis, new FileOutputStream("D://Tools//TestDemo//Compress" + codec.getDefaultExtension() + ".txt"), 1024, true);
    }
}
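- Output (the first column is the number of milliseconds since the program started; bzip2, running on the pure-Java fallback, takes roughly 4.6 s between the .bz2 and .lz4 compressors)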
0 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
0 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
0 [main] INFO org.apache.hadoop.io.compress.zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
219 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
297 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.gz]
359 [main] WARN org.apache.hadoop.io.compress.bzip2.Bzip2Factory - Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
359 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.bz2]
4954 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]
4985 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.deflate]
5016 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.gz]
5047 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.bz2]
5297 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.lz4]