Apache Spark Hadoop S3A SignatureDoesNotMatch

Posted: 2021-12-31 16:52:11

Question:

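This is my first time using Spark with S3A. I get a SignatureDoesNotMatch error when accessing S3 through Hadoop S3A, yet accessing the same bucket with the AWS SDK directly works. My code is below. I cannot find the error, and I suspect something is wrong with my setup.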

public static void main(String[] args) {
    // Both the application name (args[0]) and the grok pattern (args[1]) are required.
    if (args.length < 2) {
        log.error("Usage: <app_name> <grok_pattern>");
        System.exit(1);
    }

    final String appName = args[0];
    final SyslogParser syslogParser = new SyslogParser(args[1]);
    final JavaSparkContext sparkContext = initSparkContext(appName);
    final String queueName = Optional.ofNullable(System.getenv("SQS_QUEUE_NAME"))
            .orElseThrow(() -> new RuntimeException("SQS_QUEUE_NAME required."));
    // "{}" placeholders assume `log` is an SLF4J-style logger.
    log.info("...started streaming {}", queueName);
    final ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(getQueueUrl(queueName))
            .withWaitTimeSeconds(10)
            .withMaxNumberOfMessages(10);
    final ReceiveMessageResult receiveMessageResult = AMAZON_SQS.receiveMessage(receiveMessageRequest);
    log.info("...received new messages: {}", receiveMessageResult.getMessages().size());
    receiveMessageResult.getMessages()
            .stream()
            .map(Message::getBody)
            .map(Application::toJsonNode)
            .filter(Objects::nonNull)
            .map(Application::toS3Pair)
            .forEach(s3Pair -> processS3Object(s3Pair, syslogParser, sparkContext));
    sparkContext.close();
}

static SparkConf initSparkConf(String appName) {
    final SparkConf conf = new SparkConf();
    conf.setAppName(appName);
    conf.setMaster("local[*]");

    // Credentials are read from the environment and passed to S3A via the spark.hadoop.* prefix.
    final EnvironmentVariableCredentialsProvider provider = new EnvironmentVariableCredentialsProvider();
    conf.set("spark.hadoop.fs.s3a.access.key", provider.getCredentials().getAWSAccessKeyId());
    conf.set("spark.hadoop.fs.s3a.secret.key", provider.getCredentials().getAWSSecretKey());
    // we are behind proxy hence set fs.s3a.proxy.host and port as well.
    return conf;
}

static JavaSparkContext initSparkContext(String appName) {
    final SparkConf conf = initSparkConf(appName);
    final JavaSparkContext context = new JavaSparkContext(conf);
    context.setLogLevel(LogLevel.DEBUG.name());
    return context;
}

static void processS3Object(final Pair<String, String> s3Pair, final SyslogParser syslogParser, final JavaSparkContext sparkContext) {
    // s3Pair is (bucket, key): read the object and write a copy under the "out/" prefix.
    final String outS3aUrl = "s3a://" + s3Pair.getLeft() + "/out/" + s3Pair.getRight();
    final String inS3aUrl = "s3a://" + s3Pair.getLeft() + "/" + s3Pair.getRight();
    sparkContext.textFile(inS3aUrl)
            .saveAsTextFile(outS3aUrl);
}

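With debug logging enabled, the log shows that the signature does not match. I also noticed that the date sent to AWS differs from my system date. I am not sure whether that is the problem, and if it is, I do not know how to change it.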
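On the date question: SigV4 timestamps such as `x-amz-date` are always expressed in UTC, so a difference from the local log timestamp is expected. The sketch below converts the `x-amz-date` value from the log into local time. The zone `Australia/Sydney` is an assumption (the bucket is in ap-southeast-2 and the log timestamps are 11 hours ahead of UTC), and the class and method names are made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class AmzDateCheck {
    // x-amz-date uses the ISO 8601 basic format and is always in UTC.
    private static final DateTimeFormatter AMZ_DATE =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'");

    /** Converts an x-amz-date value to wall-clock time in the given zone. */
    static LocalDateTime toLocal(String amzDate, String zoneId) {
        ZonedDateTime utc = LocalDateTime.parse(amzDate, AMZ_DATE).atZone(ZoneOffset.UTC);
        return utc.withZoneSameInstant(ZoneId.of(zoneId)).toLocalDateTime();
    }

    public static void main(String[] args) {
        // Value taken from the debug log; the zone is an assumption.
        System.out.println(toLocal("20211122T010938Z", "Australia/Sydney"));
        // prints 2021-11-22T12:09:38
    }
}
```

Under that assumption the signed timestamp matches the local log time 12:09:38, and the `Date` header returned by S3 (01:09:38 GMT) matches it as well, so clock skew does not look like the cause here.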

21/11/22 12:09:38 DEBUG AWS4Signer: AWS4 Canonical Request: '"GET
/asdasd/asdasd.json

amz-sdk-invocation-id:b14a016a-a589-b314-536a-a19ad9e3a65c
amz-sdk-request:attempt=14;max=21
amz-sdk-retry:13/19092/372
content-type:application/octet-stream
host:temp-bucket.s3.ap-southeast-2.amazonaws.com
if-match:1f0aa30fff75c3b01269bf3a7e7ad241
range:bytes=0-284846
user-agent:Hadoop 3.3.1, aws-sdk-java/1.12.112 Linux/5.4.0-90-generic OpenJDK_64-Bit_Server_VM/25.292-b10 java/1.8.0_292 scala/2.13.5 vendor/Private_Build cfg/retry-mode/legacy
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20211122T010938Z

amz-sdk-invocation-id;amz-sdk-request;amz-sdk-retry;content-type;host;if-match;range;user-agent;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD"
21/11/22 12:09:38 DEBUG AWS4Signer: AWS4 String to Sign: '"AWS4-HMAC-SHA256
20211122T010938Z
20211122/ap-southeast-2/s3/aws4_request
8a5536b64079a96c3bbb61492d7fb2232f83b0546dcb91cbca8125a182927813"
21/11/22 12:09:38 DEBUG RequestAddCookies: CookieSpec selected: default
21/11/22 12:09:38 DEBUG RequestAuthCache: Auth cache not set in the context
21/11/22 12:09:38 DEBUG PoolingHttpClientConnectionManager: Connection request: [route: tls->http://proxy:3228->https://temp-buckets.s3.ap-southeast-2.amazonaws.com:443][total available: 2; route allocated: 1 of 48; total allocated: 2 of 48]
21/11/22 12:09:38 DEBUG wire: http-outgoing-7 << "end of stream"
21/11/22 12:09:38 DEBUG DefaultManagedHttpClientConnection: http-outgoing-7: Close connection
21/11/22 12:09:38 DEBUG PoolingHttpClientConnectionManager: Connection leased: [id: 8][route: tls->http://proxy:3128->https://temp-buckets.s3.ap-southeast-2.amazonaws.com:443][total available: 1; route allocated: 1 of 48; total allocated: 2 of 48]
21/11/22 12:09:38 DEBUG MainClientExec: Opening connection tls->http://proxy:3128->https://temp-buckets.s3.ap-southeast-2.amazonaws.com:443
21/11/22 12:09:38 DEBUG DefaultHttpClientConnectionOperator: Connecting to proxy/10.1.1.1:3128
21/11/22 12:09:38 DEBUG DefaultHttpClientConnectionOperator: Connection established 10.1.1.2:38250<->10.1.1.1:3128
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> CONNECT temp-buckets.s3.ap-southeast-2.amazonaws.com:443 HTTP/1.1
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Host: temp-buckets.s3.ap-southeast-2.amazonaws.com
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> User-Agent: Apache-HttpClient/4.5.13 (Java/1.8.0_292)
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "CONNECT temp-buckets.s3.ap-southeast-2.amazonaws.com:443 HTTP/1.1[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Host: temp-buckets.s3.ap-southeast-2.amazonaws.com[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "User-Agent: Apache-HttpClient/4.5.13 (Java/1.8.0_292)[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "[\r][\n]"

21/11/22 12:09:38 DEBUG wire: http-outgoing-8 << "HTTP/1.1 200 Connection established[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 << "[\r][\n]"
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 << HTTP/1.1 200 Connection established
21/11/22 12:09:38 DEBUG MainClientExec: Tunnel to target created.
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory: Enabled protocols: [TLSv1.2]
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory: Enabled cipher suites:[TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory: socket.getSupportedProtocols(): [TLSv1.3, TLSv1.2, TLSv1.1, TLSv1, SSLv3, SSLv2Hello], socket.getEnabledProtocols(): [TLSv1.2]
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory: TLS protocol enabled for SSL handshake: [TLSv1.2, TLSv1.1, TLSv1]
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory: Starting handshake
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory: Secure session established
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory:  negotiated protocol: TLSv1.2
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory:  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory:  peer principal: CN=*.s3-ap-southeast-2.amazonaws.com
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory:  peer alternative names: [s3-ap-southeast-2.amazonaws.com, *.s3-ap-southeast-2.amazonaws.com, s3.ap-southeast-2.amazonaws.com, *.s3.ap-southeast-2.amazonaws.com, s3.dualstack.ap-southeast-2.amazonaws.com, *.s3.dualstack.ap-southeast-2.amazonaws.com, *.s3.amazonaws.com, *.s3-control.ap-southeast-2.amazonaws.com, s3-control.ap-southeast-2.amazonaws.com, *.s3-control.dualstack.ap-southeast-2.amazonaws.com, s3-control.dualstack.ap-southeast-2.amazonaws.com, *.s3-accesspoint.ap-southeast-2.amazonaws.com, *.s3-accesspoint.dualstack.ap-southeast-2.amazonaws.com, *.s3.ap-southeast-2.vpce.amazonaws.com]
21/11/22 12:09:38 DEBUG SdkTLSSocketFactory:  issuer principal: CN=pan-d84ofw002-CA
21/11/22 12:09:38 DEBUG DefaultManagedHttpClientConnection: http-outgoing-8: set socket timeout to 200000
21/11/22 12:09:38 DEBUG MainClientExec: Executing request GET /folder/filename.json HTTP/1.1
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> GET /folder/filename.json HTTP/1.1
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Host: temp-buckets.s3.ap-southeast-2.amazonaws.com
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> amz-sdk-invocation-id: b14a016a-a589-b314-536a-a19ad9e3a65c
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> amz-sdk-request: attempt=14;max=21
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> amz-sdk-retry: 13/19092/372
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Authorization: AWS4-HMAC-SHA256 Credential=AWSACCESSKEY/20211122/ap-southeast-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;amz-sdk-retry;content-type;host;if-match;range;user-agent;x-amz-content-sha256;x-amz-date, Signature=72cd72ef9948643604f7ccd460f29cdfa912f1fdde0faa913f84a4425dd43
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Content-Type: application/octet-stream
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> If-Match: 1f0aa30fff75c3b01269bf3a7e7ad241
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Range: bytes=0-284846
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> User-Agent: Hadoop 3.3.1, aws-sdk-java/1.12.112 Linux/5.4.0-90-generic OpenJDK_64-Bit_Server_VM/25.292-b10 java/1.8.0_292 scala/2.13.5 vendor/Private_Build cfg/retry-mode/legacy
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> x-amz-content-sha256: UNSIGNED-PAYLOAD
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> X-Amz-Date: 20211122T010938Z
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Content-Length: 0
21/11/22 12:09:38 DEBUG headers: http-outgoing-8 >> Connection: Keep-Alive
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "GET /folder/filename.json HTTP/1.1[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Host: temp-buckets.s3.ap-southeast-2.amazonaws.com[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "amz-sdk-invocation-id: b14a016a-a589-b314-536a-a19ad9e3a65c[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "amz-sdk-request: attempt=14;max=21[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "amz-sdk-retry: 13/19092/372[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Authorization: AWS4-HMAC-SHA256 Credential=AWSACCESSKEY/20211122/ap-southeast-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;amz-sdk-retry;content-type;host;if-match;range;user-agent;x-amz-content-sha256;x-amz-date, Signature=72cd72ef9948643604f7ccd460f29cdfa912f1fdde0faa913f84a4425dd43[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Content-Type: application/octet-stream[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "If-Match: 1f0aa30fff75c3b01269bf3a7e7ad241[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Range: bytes=0-284846[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "User-Agent: Hadoop 3.3.1, aws-sdk-java/1.12.112 Linux/5.4.0-90-generic OpenJDK_64-Bit_Server_VM/25.292-b10 java/1.8.0_292 scala/2.13.5 vendor/Private_Build cfg/retry-mode/legacy[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "X-Amz-Date: 20211122T010938Z[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Content-Length: 0[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "Connection: Keep-Alive[\r][\n]"
21/11/22 12:09:38 DEBUG wire: http-outgoing-8 >> "[\r][\n]"

21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "HTTP/1.1 403 Forbidden[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "x-amz-request-id: 03D8MX920R4B7Q3Y[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "x-amz-id-2: PIOL/VOfyuExRB2FLPovEO104N66SQfe+fx3nCLlD5k51KsOe7m3un6LJUO+9UNCsWEMB/ydGeo=[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "Date: Mon, 22 Nov 2021 01:09:38 GMT[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "Server: AmazonS3[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "Transfer-Encoding: chunked[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "Content-Type: application/xml[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "Connection: keep-alive[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "1036[\r][\n]"
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << HTTP/1.1 403 Forbidden
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << x-amz-request-id: 03D8MX920R4B7Q3Y
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << x-amz-id-2: PIOL/VOfyuExRB2FLPovEO104N66SQfe+fx3nCLlD5k51KsOe7m3un6LJUO+9UNCsWEMB/ydGeo=
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << Date: Mon, 22 Nov 2021 01:09:38 GMT
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << Server: AmazonS3
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << Transfer-Encoding: chunked
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << Content-Type: application/xml
21/11/22 12:09:39 DEBUG headers: http-outgoing-8 << Connection: keep-alive
21/11/22 12:09:39 DEBUG MainClientExec: Connection can be kept alive for 60000 MILLISECONDS
21/11/22 12:09:39 DEBUG ClockSkewAdjuster: Reported server date (from 'Date' header): Mon, 22 Nov 2021 01:09:38 GMT
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>AWSACCESSKEY</AWSAccessKeyId><StringToSign>AWS4-HMAC-SHA256[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "20211122T010938Z[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "20211122/ap-southeast-2/s3/aws4_request[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "63ed1dbeae3d1146f51197796c4d3a76736601e30444c91aa306d186426a15ea</StringToSign><SignatureProvided>72cd72ef9948643604f7ccd460f29cdfa912f1fdde0faa913f84a4425dd43</SignatureProvided><StringToSignBytes>41 57 53 34 2d 48 4d 41 43 2d 53 48 41 32 35 36 0a 32 30 32 31 31 31 32 32 54 30 31 30 39 33 38 5a 0a 32 30 32 31 31 31 32 32 2f 61 70 2d 73 6f 75 74 68 65 61 73 74 2d 32 2f 73 33 2f 61 77 73 34 5f 72 65 71 75 65 73 74 0a 36 33 65 64 31 64 62 65 61 65 33 64 31 31 34 36 66 35 31 31 39 37 37 39 36 63 34 64 33 61 37 36 37 33 36 36 30 31 65 33 30 34 34 34 63 39 31 61 61 33 30 36 64 31 38 36 34 32 36 61 31 35 65 61</StringToSignBytes><CanonicalRequest>GET[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "/folder/filename.json[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "amz-sdk-invocation-id:b14a016a-a589-b314-536a-a19ad9e3a65c[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "amz-sdk-request:attempt=14;max=21[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "amz-sdk-retry:13/19092/372[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "content-type:application/octet-stream[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "host:temp-buckets.s3.ap-southeast-2.amazonaws.com[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "if-match:1f0aa30fff75c3b01269bf3a7e7ad241[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "range:[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "user-agent:Hadoop 3.3.1, aws-sdk-java/1.12.112 Linux/5.4.0-90-generic OpenJDK_64-Bit_Server_VM/25.292-b10 java/1.8.0_292 scala/2.13.5 vendor/Private_Build cfg/retry-mode/legacy[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "x-amz-content-sha256:UNSIGNED-PAYLOAD[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "x-amz-date:20211122T010938Z[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "amz-sdk-invocation-id;amz-sdk-request;amz-sdk-retry;content-type;host;if-match;range;user-agent;x-amz-content-sha256;x-amz-date[\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "UNSIGNED-PAYLOAD</CanonicalRequest><CanonicalRequestBytes>47 45 54 0a 2f 37 6e 65 77 73 2f 65 61 61 38 30 39 64 61 32 31 38 63 37 63 39 32 34 61 31 34 37 37 35 38 31 63 61 39 36 35 39 38 65 62 63 64 38 36 32 33 61 63 63 65 36 36 37 30 64 37 36 31 34 34 30 31 62 33 38 39 62 66 39 30 2e 6a 73 6f 6e 0a 0a 61 6d 7a 2d 73 64 6b 2d 69 6e 76 6f 63 61 74 69 6f 6e 2d 69 64 3a 62 31 34 61 30 31 36 61 2d 61 35 38 39 2d 62 33 31 34 2d 35 33 36 61 2d 61 31 39 61 64 39 65 33 61 36 35 63 0a 61 6d 7a 2d 73 64 6b 2d 72 65 71 75 65 73 74 3a 61 74 74 65 6d 70 74 3d 31 34 3b 6d 61 78 3d 32 31 0a 61 6d 7a 2d 73 64 6b 2d 72 65 74 72 79 3a 31 33 2f 31 39 30 39 32 2f 33 37 32 0a 63 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6f 63 74 65 74 2d 73 74 72 65 61 6d 0a 68 6f 73 74 3a 70 72 6f 64 2d 69 63 65 70 69 63 2d 72 61 77 32 70 61 63 73 2e 73 33 2e 61 70 2d 73 6f 75 74 68 65 61 73 74 2d 32 2e 61 6d 61 7a 6f 6e 61 77 73 2e 63 6f 6d 0a 69 66 2d 6d 61 74 63 68 3a 31 66 30 61 61 33 30 66 66 66 37 35 63 33 62 30 31 32 36 39 62 66 33 61 37 65 37 61 64 32 34 31 0a 72 61 6e 67 65 3a 0a 75 73 65 72 2d 61 67 65 6e 74 3a 48 61 64 6f 6f 70 20 33 2e 33 2e 31 2c 20 61 77 73 2d 73 64 6b 2d 6a 61 76 61 2f 31 2e 31 32 2e 31 31 32 20 4c 69 6e 75 78 2f 35 2e 34 2e 30 2d 39 30 2d 67 65 6e 65 72 69 63 20 4f 70 65 6e 4a 44 4b 5f 36 34 2d 42 69 74 5f 53 65 72 76 65 72 5f 56 4d 2f 32 35 2e 32 39 32 2d 62 31 30 20 6a 61 76 61 2f 31 2e 38 2e 30 5f 32 39 32 20 73 63 61 6c 61 2f 32 2e 31 33 2e 35 20 76 65 6e 64 6f 72 2f 50 72 69 76 61 74 65 5f 42 75 69 6c 64 20 63 66 67 2f 72 65 74 72 79 2d 6d 6f 64 65 2f 6c 65 67 61 63 79 0a 78 2d 61 6d 7a 2d 63 6f 6e 74 65 6e 74 2d 73 68 61 32 35 36 3a 55 4e 53 49 47 4e 45 44 2d 50 41 59 4c 4f 41 44 0a 78 2d 61 6d 7a 2d 64 61 74 65 3a 32 30 32 31 31 31 32 32 54 30 31 30 39 33 38 5a 0a 0a 61 6d 7a 2d 73 64 6b 2d 69 6e 76 6f 63 61 74 69 6f 6e 2d 69 64 3b 61 6d 7a 2d 73 64 6b 2d 72 65 71 75 65 73 74 3b 61 
6d 7a 2d 73 64 6b 2d 72 65 74 72 79 3b 63 6f 6e 74 65 6e 74 2d 74 79 70 65 3b 68 6f 73 74 3b 69 66 2d 6d 61 74 63 68 3b 72 61 6e 67 65 3b 75 73 65 72 2d 61 67 65 6e 74 3b 78 2d 61 6d 7a 2d 63 6f 6e 74 65 6e 74 2d 73 68 61 32 35 36 3b 78 2d 61 6d 7a 2d 64 61 74 65 0a 55 4e 53 49 47 4e 45 44 2d 50 41 59 4c 4f 41 44</CanonicalRequestBytes><RequestId>03D8MX920R4B7Q3Y</RequestId><HostId>PIOL/VOfyuExRB2FLPovEO104N66SQfe+fx3nCL"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "lD5k51KsOe7m3un6LJUO+9UNCsWEMB/ydGeo=</HostId></Error>"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "0[\r][\n]"
21/11/22 12:09:39 DEBUG wire: http-outgoing-8 << "[\r][\n]"
21/11/22 12:09:39 DEBUG PoolingHttpClientConnectionManager: Connection [id: 8][route: tls->http://proxy:3128->https://temp-buckets.s3.ap-southeast-2.amazonaws.com:443] can be kept alive for 60.0 seconds
21/11/22 12:09:39 DEBUG DefaultManagedHttpClientConnection: http-outgoing-8: set socket timeout to 0
21/11/22 12:09:39 DEBUG PoolingHttpClientConnectionManager: Connection released: [id: 8][route: tls->http://proxy:3128->https://temp-buckets.s3.ap-southeast-2.amazonaws.com:443][total available: 2; route allocated: 1 of 48; total allocated: 2 of 48]
21/11/22 12:09:39 DEBUG request: Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: 03D8MX920R4B7Q3Y; S3 Extended Request ID: PIOL/VOfyuExRB2FLPovEO104N66SQfe+fx3nCLlD5k51KsOe7m3un6LJUO+9UNCsWEMB/ydGeo=; Proxy: proxy), S3 Extended Request ID: PIOL/VOfyuExRB2FLPovEO104N66SQfe+fx3nCLlD5k51KsOe7m3un6LJUO+9UNCsWEMB/ydGeo=
21/11/22 12:09:39 DEBUG ClockSkewAdjuster: Reported server date (from 'Date' header): Mon, 22 Nov 2021 01:09:38 GMT
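One way to read this log is to compare the canonical request the client signed with the one S3 echoes back in the error body. In the posted log the client signs `range:bytes=0-284846`, while the server's copy shows an empty `range:` header, which would be consistent with something between the client and S3 altering the request. The path and host also differ in the posted log, but that looks like inconsistent redaction (an assumption). The sketch below diffs two canonical requests line by line and hashes them. In SigV4 the last line of the string to sign is the SHA-256 hex digest of the canonical request, so any changed header produces a different signature. The class and method names are hypothetical, and the requests are abbreviated to a few headers from the log.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class CanonicalRequestDiff {
    /** Describes every line that differs between the two canonical requests. */
    static List<String> diff(String client, String server) {
        String[] a = client.split("\n", -1);
        String[] b = server.split("\n", -1);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            String left = i < a.length ? a[i] : "<missing>";
            String right = i < b.length ? b[i] : "<missing>";
            if (!left.equals(right)) {
                out.add("line " + (i + 1) + ": client='" + left + "' server='" + right + "'");
            }
        }
        return out;
    }

    /** Lowercase hex SHA-256 of the text, as used for the canonical request hash. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated canonical requests: only three of the signed headers are kept.
        String client = String.join("\n", "GET", "/folder/filename.json", "",
                "if-match:1f0aa30fff75c3b01269bf3a7e7ad241",
                "range:bytes=0-284846",
                "x-amz-date:20211122T010938Z");
        String server = String.join("\n", "GET", "/folder/filename.json", "",
                "if-match:1f0aa30fff75c3b01269bf3a7e7ad241",
                "range:",
                "x-amz-date:20211122T010938Z");
        diff(client, server).forEach(System.out::println);
        System.out.println(sha256Hex(client).equals(sha256Hex(server))); // false
    }
}
```

Because these inputs are abbreviated, the hashes will not reproduce the values in the log. The point is only that the two sides hash different text.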

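Update

After @stevel's comments, I checked the properties against the Hadoop documentation and updated the code. I still get the same error.

We are still trying to find the cause; it may be the corporate proxy. I will update this once it is resolved.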
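For reference, a minimal sketch of the settings described in the update: only credentials and proxy, using the documented S3A options `fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.proxy.host` and `fs.s3a.proxy.port`, passed through Spark's `spark.hadoop.` prefix. The class and method names are hypothetical, the proxy host and port are the values visible in the debug log, and the credential strings are placeholders. It builds a plain map so it can run without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class S3aProxySettings {
    /** Builds the Spark configuration entries for S3A credentials and proxy. */
    static Map<String, String> s3aSettings(String accessKey, String secretKey,
                                           String proxyHost, int proxyPort) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("spark.hadoop.fs.s3a.access.key", accessKey);
        settings.put("spark.hadoop.fs.s3a.secret.key", secretKey);
        settings.put("spark.hadoop.fs.s3a.proxy.host", proxyHost);
        settings.put("spark.hadoop.fs.s3a.proxy.port", Integer.toString(proxyPort));
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder credentials; "proxy" and 3128 are taken from the debug log.
        Map<String, String> settings =
                s3aSettings("<access-key>", "<secret-key>", "proxy", 3128);
        // Print only the keys so that no secret ends up in the output.
        settings.keySet().forEach(System.out::println);
    }
}
```

Each entry could then be applied with `conf.set(key, value)` on the `SparkConf` built in `initSparkConf`. These settings only route traffic through the proxy; they would not help if the proxy itself rewrites signed headers.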

Stack trace after the change

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: D26Z2QHKMBK110Q5; S3 Extended Request ID: 9MZ5uIB1P6HUEaBWrIqkxfUm8ftWPclkkQ8EFOffsWacj3Ki6U6koSmHt3d55n/ItS34bmUGU3I=; Proxy: jailbird.cp.pacs), S3 Extended Request ID: 9MZ5uIB1P6HUEaBWrIqkxfUm8ftWPclkkQ8EFOffsWacj3Ki6U6koSmHt3d55n/ItS34bmUGU3I=
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1828)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1412)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1374)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1512)
    at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$reopen$0(S3AInputStream.java:227)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:115)
    ... 37 more

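Comments:

Where in the Hadoop S3A docs is "fs.s3a.signatureVersion" covered? Is this just something you got from other SO posts? Have you considered looking at the Hadoop S3A docs first? As one of the main authors of S3A, I am fighting a losing battle with people who do not read the docs or the troubleshooting page and just copy broken SO code and then ask for help. Downvoted for lack of due diligence.

Thanks @stevel. Yes, you are right. I have removed the properties and now set only the AWS credentials and the proxy, and I confirmed them against the documentation. I still get "Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method."

Answer 1: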

The same code runs fine directly on an AWS EMR cluster. I believe the company firewall was interfering with my AWS calls, so this is no longer an issue.
