Kinesis Firehose: What is the extended S3 destination configuration?
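Question
What is the extended S3 destination configuration, and where does the AWS documentation clearly explain what it is for?
As the name suggests, it must be about the S3 destination. However, the S3 destination section of the AWS documentation does not mention it.
If there is an article or blog post that explains it clearly, please point me to it.
I have been looking for clues in the documents below but, as is usual with AWS documentation, it remains unclear. It appears to be related to the input record transformation or record processing sections.
Amazon Kinesis Data Firehose API Reference - ExtendedS3DestinationConfiguration
Describes the configuration of a destination in Amazon S3.
Amazon Kinesis Data Firehose Developer Guide PDF - Converting Input Record Format (API)
If you want Kinesis Data Firehose to convert the format of your input data from JSON to Parquet or ORC, specify the optional DataFormatConversionConfiguration element in ExtendedS3DestinationConfiguration ...
AWS CloudFormation - AWS::KinesisFirehose::DeliveryStream ExtendedS3DestinationConfiguration
The ExtendedS3DestinationConfiguration property type configures an Amazon S3 destination for an Amazon Kinesis Data Firehose delivery stream.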
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
  name        = "terraform-kinesis-firehose-extended-s3-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = "${aws_iam_role.firehose_role.arn}"
    bucket_arn = "${aws_s3_bucket.bucket.arn}"

    processing_configuration {
      enabled = "true"

      processors {
        type = "Lambda"

        parameters {
          parameter_name  = "LambdaArn"
          parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
        }
      }
    }
  }
}
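For comparison, the Terraform block above can be restated in the field names the Firehose API uses. The following is a minimal sketch, not an official example: the ARNs and the helper name `extended_s3_with_lambda` are hypothetical placeholders, and the dictionary is only assumed to be the shape an SDK call such as boto3's `create_delivery_stream` would take under its `ExtendedS3DestinationConfiguration` argument.

```python
import json

# Hypothetical ARNs standing in for the Terraform resource references.
ROLE_ARN = "arn:aws:iam::123456789012:role/firehose_role"
BUCKET_ARN = "arn:aws:s3:::example-bucket"
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:lambda_processor"


def extended_s3_with_lambda(role_arn, bucket_arn, lambda_arn):
    """Build an ExtendedS3DestinationConfiguration dict with one Lambda processor."""
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "Lambda",
                    "Parameters": [
                        {
                            "ParameterName": "LambdaArn",
                            # Same ":$LATEST" qualifier as the Terraform example.
                            "ParameterValue": f"{lambda_arn}:$LATEST",
                        }
                    ],
                }
            ],
        },
    }


config = extended_s3_with_lambda(ROLE_ARN, BUCKET_ARN, LAMBDA_ARN)
print(json.dumps(config, indent=2))
```

The only part that a plain S3 destination lacks is the `ProcessingConfiguration` key; everything else is the role and bucket that any S3 destination needs.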
What exactly the extended S3 configuration is, and how it is defined, is shown in the API documentation:
{
  "RoleARN": "string",
  "BucketARN": "string",
  "Prefix": "string",
  "ErrorOutputPrefix": "string",
  "BufferingHints": {
    "SizeInMBs": integer,
    "IntervalInSeconds": integer
  },
  "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption",
    "KMSEncryptionConfig": {
      "AWSKMSKeyARN": "string"
    }
  },
  "CloudWatchLoggingOptions": {
    "Enabled": true|false,
    "LogGroupName": "string",
    "LogStreamName": "string"
  },
  "ProcessingConfiguration": {
    "Enabled": true|false,
    "Processors": [
      {
        "Type": "Lambda",
        "Parameters": [
          {
            "ParameterName": "LambdaArn"|"NumberOfRetries"|"RoleArn"|"BufferSizeInMBs"|"BufferIntervalInSeconds",
            "ParameterValue": "string"
          }
          ...
        ]
      }
      ...
    ]
  },
  "S3BackupMode": "Disabled"|"Enabled",
  "S3BackupUpdate": {
    "RoleARN": "string",
    "BucketARN": "string",
    "Prefix": "string",
    "ErrorOutputPrefix": "string",
    "BufferingHints": {
      "SizeInMBs": integer,
      "IntervalInSeconds": integer
    },
    "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
    "EncryptionConfiguration": {
      "NoEncryptionConfig": "NoEncryption",
      "KMSEncryptionConfig": {
        "AWSKMSKeyARN": "string"
      }
    },
    "CloudWatchLoggingOptions": {
      "Enabled": true|false,
      "LogGroupName": "string",
      "LogStreamName": "string"
    }
  },
  "DataFormatConversionConfiguration": {
    "SchemaConfiguration": {
      "RoleARN": "string",
      "CatalogId": "string",
      "DatabaseName": "string",
      "TableName": "string",
      "Region": "string",
      "VersionId": "string"
    },
    "InputFormatConfiguration": {
      "Deserializer": {
        "OpenXJsonSerDe": {
          "ConvertDotsInJsonKeysToUnderscores": true|false,
          "CaseInsensitive": true|false,
          "ColumnToJsonKeyMappings": {
            "string": "string"
            ...
          }
        },
        "HiveJsonSerDe": {
          "TimestampFormats": ["string", ...]
        }
      }
    },
    "OutputFormatConfiguration": {
      "Serializer": {
        "ParquetSerDe": {
          "BlockSizeBytes": integer,
          "PageSizeBytes": integer,
          "Compression": "UNCOMPRESSED"|"GZIP"|"SNAPPY",
          "EnableDictionaryCompression": true|false,
          "MaxPaddingBytes": integer,
          "WriterVersion": "V1"|"V2"
        },
        "OrcSerDe": {
          "StripeSizeBytes": integer,
          "BlockSizeBytes": integer,
          "RowIndexStride": integer,
          "EnablePadding": true|false,
          "PaddingTolerance": double,
          "Compression": "NONE"|"ZLIB"|"SNAPPY",
          "BloomFilterColumns": ["string", ...],
          "BloomFilterFalsePositiveProbability": double,
          "DictionaryKeyThreshold": double,
          "FormatVersion": "V0_11"|"V0_12"
        }
      }
    },
    "Enabled": true|false
  }
}
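To make the DataFormatConversionConfiguration part of that syntax more concrete, here is a rough sketch that fills it in for JSON to Parquet conversion. All ARNs, the Glue database and table names, and both helper names are hypothetical. The two checks in `conversion_problems` reflect my reading of the Developer Guide's record format conversion section (CompressionFormat left as UNCOMPRESSED, and a buffer size of at least 64 MB when conversion is enabled); treat them as assumptions to verify against the current documentation.

```python
def parquet_conversion(role_arn, database, table, region):
    """Return a DataFormatConversionConfiguration dict for JSON -> Parquet."""
    return {
        "Enabled": True,
        # The schema is read from an AWS Glue table (names are placeholders).
        "SchemaConfiguration": {
            "RoleARN": role_arn,
            "DatabaseName": database,
            "TableName": table,
            "Region": region,
            "VersionId": "LATEST",
        },
        # Parse incoming JSON with the OpenX SerDe using its defaults.
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        # Write Parquet, compressed by the serializer itself.
        "OutputFormatConfiguration": {
            "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
        },
    }


def conversion_problems(destination):
    """List settings that are assumed to conflict with format conversion."""
    problems = []
    conversion = destination.get("DataFormatConversionConfiguration", {})
    if not conversion.get("Enabled"):
        return problems
    if destination.get("CompressionFormat", "UNCOMPRESSED") != "UNCOMPRESSED":
        problems.append("CompressionFormat must be UNCOMPRESSED")
    size = destination.get("BufferingHints", {}).get("SizeInMBs")
    if size is not None and size < 64:
        problems.append("BufferingHints.SizeInMBs must be at least 64")
    return problems


# Hypothetical values throughout.
role = "arn:aws:iam::123456789012:role/firehose_role"
destination = {
    "RoleARN": role,
    "BucketARN": "arn:aws:s3:::example-bucket",
    "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
    "CompressionFormat": "UNCOMPRESSED",
    "DataFormatConversionConfiguration": parquet_conversion(
        role, "example_db", "example_table", "us-east-1"
    ),
}
print(conversion_problems(destination))
```

Compression of the output is chosen inside the serializer (`ParquetSerDe.Compression`) rather than through the top-level `CompressionFormat`, which is why the sketch keeps the latter at `UNCOMPRESSED`.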
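I'm afraid the Kinesis Firehose documentation is so poorly written that I don't know how anyone could work out how to use Firehose from the documentation alone.
At first it looks as if Firehose simply relays data to an S3 bucket with no built-in transformation mechanism, and the S3 destination configuration in AWS::KinesisFirehose::DeliveryStream S3DestinationConfiguration has no processing configuration.
Then, according to Amazon Kinesis Firehose Data Transformation with AWS Lambda, a mechanism for transforming records seems to have been introduced in early 2017, which is why AWS::KinesisFirehose::DeliveryStream ExtendedS3DestinationConfiguration was added.
Apparently, people are struggling to find out how to configure it:
- Does Amazon Kinesis Firehose support Data Transformations programatically?
Well, after a lot of effort and digging through the documentation, I finally figured it out.
Who could work this out just by reading the AWS documentation?
Firehose extended S3 configuration for Lambda transformation
I could not find this in the AWS documentation, but after looking at real implementations on the Internet, it appears to need the configuration below.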
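As an illustration only, and not the poster's own configuration, the following sketch generates a CloudFormation resource fragment with a Lambda processor inside ExtendedS3DestinationConfiguration. The logical IDs `FirehoseRole` and `Bucket`, the Lambda ARN, and the helper `lambda_processor` are hypothetical; the allowed parameter names are taken from the API syntax quoted earlier. The role and bucket resources would have to be defined elsewhere in the template.

```python
import json

# Parameter names listed for a Lambda processor in the API syntax above.
ALLOWED_PARAMETERS = {
    "LambdaArn",
    "NumberOfRetries",
    "RoleArn",
    "BufferSizeInMBs",
    "BufferIntervalInSeconds",
}


def lambda_processor(**parameters):
    """Build one Processors entry, rejecting parameter names not in the list."""
    unknown = set(parameters) - ALLOWED_PARAMETERS
    if unknown:
        raise ValueError(f"unsupported processor parameters: {sorted(unknown)}")
    return {
        "Type": "Lambda",
        # ParameterValue is a string in the API syntax, so values are stringified.
        "Parameters": [
            {"ParameterName": name, "ParameterValue": str(value)}
            for name, value in parameters.items()
        ],
    }


template = {
    "Resources": {
        "DeliveryStream": {
            "Type": "AWS::KinesisFirehose::DeliveryStream",
            "Properties": {
                "ExtendedS3DestinationConfiguration": {
                    # Hypothetical logical IDs defined elsewhere in the template.
                    "RoleARN": {"Fn::GetAtt": ["FirehoseRole", "Arn"]},
                    "BucketARN": {"Fn::GetAtt": ["Bucket", "Arn"]},
                    "ProcessingConfiguration": {
                        "Enabled": True,
                        "Processors": [
                            lambda_processor(
                                LambdaArn="arn:aws:lambda:us-east-1:123456789012:function:transform",
                                NumberOfRetries=3,
                            )
                        ],
                    },
                }
            },
        }
    }
}
print(json.dumps(template, indent=2))
```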
The ExtendedS3DestinationConfiguration property type configures an Amazon S3 destination for an Amazon Kinesis Data Firehose delivery stream. See: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-extendeds3destinationconfiguration.html
Thanks.