Flink 1.7.1 fails to authenticate s3a with core-site.xml


Posted: 2019-01-25 10:58:12

Question:

I'm building a single-job cluster on Kubernetes with Flink 1.7.1. Flink fails to load core-site.xml even though it is on the classpath, so the configuration is ignored. If I set the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables it works, because it finds them, but relying on core-site.xml alone, without the env variables, never works.

I currently copy core-site.xml into the image, as shown in the Dockerfile below, and set HADOOP_CONF_DIR as an env variable pointing to it, as the documentation says. It still does not get loaded, which results in NoCredentialsProvider.

The exception is:

Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.AmazonClientException: No AWS Credentials provided by 
BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: 
Unable to load credentials from service endpoint

Classpath loaded by the JobManager/TaskManager:

-  Classpath: 


/opt/flink-1.7.1/lib/aws-java-sdk-core-1.11.489.jar
:/opt/flink-1.7.1/lib/aws-java-sdk-kms-1.11.489.jar
:/opt/flink-1.7.1/lib/aws-java-sdk-s3-1.10.6.jar
:/opt/flink-1.7.1/lib/flink-python_2.12-1.7.1.jar
:/opt/flink-1.7.1/lib/flink-s3-fs-hadoop-1.7.1.jar
:/opt/flink-1.7.1/lib/flink-shaded-hadoop2-uber-1.7.1.jar
:/opt/flink-1.7.1/lib/hadoop-aws-2.8.0.jar:/opt/flink-1.7.1/lib/httpclient-4.5.6.jar
:/opt/flink-1.7.1/lib/httpcore-4.4.11.jar
:/opt/flink-1.7.1/lib/jackson-annotations-2.9.8.jar
:/opt/flink-1.7.1/lib/jackson-core-2.9.8.jar
:/opt/flink-1.7.1/lib/jackson-databind-2.9.8.jar
:/opt/flink-1.7.1/lib/job.jar
:/opt/flink-1.7.1/lib/joda-time-2.10.1.jar
:/opt/flink-1.7.1/lib/log4j-1.2.17.jar
:/opt/flink-1.7.1/lib/slf4j-log4j12-1.7.15.jar
:/opt/flink-1.7.1/lib/flink-dist_2.12-1.7.1.jar
:
:/hadoop/conf:

Dockerfile used to build the Docker image:

################################################################################
#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

FROM openjdk:8-jre-alpine

# Install requirements
# Modification to original Dockerfile to support rocksdb
# RUN apk add --no-cache bash snappy
# This is a fix for RocksDB compatibility


# Flink environment variables
ENV FLINK_INSTALL_PATH=/opt
ENV FLINK_HOME $FLINK_INSTALL_PATH/flink
ENV FLINK_LIB_DIR $FLINK_HOME/lib
ENV PATH $PATH:$FLINK_HOME/bin
ENV FLINK_CONF $FLINK_HOME/conf
ENV FLINK_OPT $FLINK_HOME/opt
ENV FLINK_HADOOP_CONF /hadoop/conf

# flink-dist can point to a directory or a tarball on the local system
ARG flink_dist=NOT_SET
ARG job_jar=NOT_SET

# Install build dependencies and flink
ADD $flink_dist $FLINK_INSTALL_PATH
ADD $job_jar $FLINK_INSTALL_PATH/job.jar

RUN set -x && \
  ln -s $FLINK_INSTALL_PATH/flink-* $FLINK_HOME && \
  ln -s $FLINK_INSTALL_PATH/job.jar $FLINK_LIB_DIR && \
  addgroup -S flink && adduser -D -S -H -G flink -h $FLINK_HOME flink && \
  chown -R flink:flink $FLINK_INSTALL_PATH/flink-* && \
  chown -h flink:flink $FLINK_HOME

# Modification to original Dockerfile
RUN apk add --no-cache bash libc6-compat snappy 'su-exec>=0.2'

COPY core-site.xml $FLINK_HADOOP_CONF/core-site.xml
ENV HADOOP_CONF_DIR=$FLINK_HADOOP_CONF

RUN echo "fs.hdfs.hadoopconf: $FLINK_HADOOP_CONF" >> $FLINK_CONF/flink-conf.yaml

RUN echo "akka.ask.timeout: 30 min" >> $FLINK_CONF/flink-conf.yaml

RUN echo "akka.client.timeout: 30 min" >> $FLINK_CONF/flink-conf.yaml
RUN echo "web.timeout: 180000" >> $FLINK_CONF/flink-conf.yaml

RUN mv $FLINK_OPT/flink-s3-fs-hadoop-1.7.1.jar $FLINK_LIB_DIR

COPY docker-entrypoint.sh /

RUN chmod +x docker-entrypoint.sh

RUN wget -O $FLINK_LIB_DIR/hadoop-aws-2.8.0.jar https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.0/hadoop-aws-2.8.0.jar
RUN wget -O $FLINK_LIB_DIR/aws-java-sdk-s3-1.10.6.jar http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
#Transitive Dependency of aws-java-sdk-s3
RUN wget -O $FLINK_LIB_DIR/aws-java-sdk-core-1.11.489.jar http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.11.489/aws-java-sdk-core-1.11.489.jar
RUN wget -O $FLINK_LIB_DIR/aws-java-sdk-kms-1.11.489.jar http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.11.489/aws-java-sdk-kms-1.11.489.jar
RUN wget -O $FLINK_LIB_DIR/jackson-annotations-2.9.8.jar http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.9.8/jackson-annotations-2.9.8.jar
RUN wget -O $FLINK_LIB_DIR/jackson-core-2.9.8.jar http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.9.8/jackson-core-2.9.8.jar
RUN wget -O $FLINK_LIB_DIR/jackson-databind-2.9.8.jar http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.9.8/jackson-databind-2.9.8.jar
RUN wget -O $FLINK_LIB_DIR/joda-time-2.10.1.jar http://central.maven.org/maven2/joda-time/joda-time/2.10.1/joda-time-2.10.1.jar
RUN wget -O $FLINK_LIB_DIR/httpcore-4.4.11.jar http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.4.11/httpcore-4.4.11.jar
RUN wget -O $FLINK_LIB_DIR/httpclient-4.5.6.jar http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.5.6/httpclient-4.5.6.jar
#Modification to original Dockerfile

USER flink
EXPOSE 8081 6123
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["--help"]

core-site.xml:

<configuration>

    <property>
        <name>fs.s3.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>

    <!-- Comma separated list of local directories used to buffer
         large results prior to transmitting them to S3. -->
    <property>
        <name>fs.s3a.buffer.dir</name>
        <value>/tmp</value>
    </property>

    <property>
        <name>fs.s3a.access.key</name>
        <description>AWS access key ID.
            Omit for IAM role-based or provider-based authentication.</description>
        <value>*</value>
    </property>

    <property>
        <name>fs.s3a.secret.key</name>
        <description>AWS secret key.
            Omit for IAM role-based or provider-based authentication.</description>
        <value>*</value>
    </property>


    <property>
        <name>fs.s3a.aws.credentials.provider</name>
        <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
    </property>


</configuration>


Answer 1:

OK, this is partially resolved: if you have the shaded Hadoop on the classpath (moving it from /opt to /lib), you need to specify your keys in flink-conf.yaml. But now I get the following exception:

Caused by: java.io.IOException: org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider 
constructor exception.  A class specified in fs.s3a.aws.credentials.provider must provide a public constructor accepting URI and Configuration, or a public factory method named getInstance that accepts no arguments, or a public default constructor.

Any ideas?
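
For reference, this is roughly what "specify your keys in flink-conf" looks like, a minimal sketch assuming the key names documented for Flink's shaded s3 filesystems (the fully qualified fs.s3a.* variants should also be forwarded):

# flink-conf.yaml (sketch -- key names assumed, values are placeholders)
s3.access-key: <your-access-key>
s3.secret-key: <your-secret-key>
# equivalently, the Hadoop-style keys:
fs.s3a.access.key: <your-access-key>
fs.s3a.secret.key: <your-secret-key>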

Comments:

It looks like Flink is trying to load the standard credentials provider (the one that picks up those fs.s3a keys) and fails because it cannot find or instantiate it. It could be a classpath problem; otherwise the shading as a whole may be causing the confusion.

Honestly, I could not get it to find core-site.xml at all. I tried pretty much everything; I was using the shaded Hadoop shipped in the /opt folder and just moved it to lib, still to no avail. My guess is there is some default behaviour in the shaded Hadoop that ignores the provided core-site.xml, but someone with more knowledge of the internals would have to confirm that.

Answer 2:

Solved it.

Adding hadoop-common with the appropriate version to the Dockerfile helped.
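
A minimal sketch of that addition, following the same wget pattern as the Dockerfile above (the 2.8.0 version is an assumption chosen to match the hadoop-aws-2.8.0 jar already on the classpath):

# hadoop-common matching the hadoop-aws version already in lib (version assumed)
RUN wget -O $FLINK_LIB_DIR/hadoop-common-2.8.0.jar http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.8.0/hadoop-common-2.8.0.jar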

