Flink in Practice Series: Writing to HDFS with StreamingFileSink in Flink (Parquet Format, Snappy Compression)

Posted 2023-01-11 JasonLee实时计算


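This article explains how to apply snappy compression when Flink writes to HDFS through StreamingFileSink. An earlier article covered writing data in parquet format, and at the time some members of my community group asked how to compress output written that way. I only replied briefly that AvroParquetWriter could be used, so today I will walk through the implementation in detail.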

Let's first look at the source code of AvroParquetWriter:

/** Create a new {@link AvroParquetWriter}.
 *
 * @param file a file path
 * @param avroSchema a schema for the write
 * @param compressionCodecName compression codec
 * @param blockSize target block size
 * @param pageSize target page size
 * @throws IOException if there is an error while writing
 */
@Deprecated
public AvroParquetWriter(Path file, Schema avroSchema,
    CompressionCodecName compressionCodecName, int blockSize,
    int pageSize) throws IOException {
  super(file, AvroParquetWriter.<T>writeSupport(avroSchema, SpecificData.get()),
      compressionCodecName, blockSize, pageSize);
}

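As you can see, AvroParquetWriter supports compression: the third parameter, compressionCodecName, selects the compression codec. Note that this particular constructor is marked @Deprecated.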
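One way this could be wired into StreamingFileSink is sketched below. It is a minimal sketch, not necessarily the exact code used in this series, and it assumes the flink-parquet, parquet-avro and Hadoop dependencies are on the classpath. The class name SnappyParquetAvroWriters and the methods forGenericRecord and snappySink are hypothetical helpers introduced for illustration; they are not part of Flink. Because the constructor shown above is deprecated, the sketch uses AvroParquetWriter.builder(...) and passes the codec through withCompressionCodec.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetBuilder;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.io.OutputFile;

/** Hypothetical helper (not part of Flink): Avro-based Parquet writer factories with a codec. */
public final class SnappyParquetAvroWriters {

    private SnappyParquetAvroWriters() {}

    /** Builds a writer factory for Avro GenericRecord using the given compression codec. */
    public static ParquetWriterFactory<GenericRecord> forGenericRecord(
            Schema schema, CompressionCodecName codec) {
        // Capture the schema as a String so the lambda below stays serializable
        // regardless of the Avro version in use.
        final String schemaString = schema.toString();
        final ParquetBuilder<GenericRecord> builder =
                out -> createWriter(schemaString, out, codec);
        return new ParquetWriterFactory<>(builder);
    }

    private static ParquetWriter<GenericRecord> createWriter(
            String schemaString, OutputFile out, CompressionCodecName codec) throws IOException {
        final Schema schema = new Schema.Parser().parse(schemaString);
        return AvroParquetWriter.<GenericRecord>builder(out)
                .withSchema(schema)
                .withDataModel(GenericData.get())
                .withCompressionCodec(codec) // e.g. CompressionCodecName.SNAPPY
                .build();
    }

    /** Builds a bulk-format StreamingFileSink that writes snappy-compressed parquet files. */
    public static StreamingFileSink<GenericRecord> snappySink(String basePath, Schema schema) {
        return StreamingFileSink
                .forBulkFormat(new Path(basePath),
                        forGenericRecord(schema, CompressionCodecName.SNAPPY))
                .build();
    }
}
```

A stream of GenericRecord could then be attached with something like stream.addSink(SnappyParquetAvroWriters.snappySink(basePath, schema)), where basePath is your own HDFS output directory. Bulk formats such as parquet roll files on checkpoint, so checkpointing has to be enabled in the job for part files to be finalized.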
