Apache Spark error using Hadoop to offload data to AWS S3
I am using Apache Spark v2.3.1 and trying to offload data to AWS S3 after processing it, with something like this:
data.write().parquet("s3a://" + bucketName + "/" + location);
The configuration seems fine:
String region = System.getenv("AWS_REGION");
String accessKeyId = System.getenv("AWS_ACCESS_KEY_ID");
String secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsRegion", region);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsAccessKeyId", accessKeyId);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", secretAccessKey);
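(Side note, not from the original post: the S3A connector in hadoop-aws typically reads the properties fs.s3a.access.key and fs.s3a.secret.key rather than the awsAccessKeyId-style names above. A minimal sketch of the same setup with those property names, assuming the region can be expressed through fs.s3a.endpoint:)

// Sketch only: same credentials, but using the property names that
// org.apache.hadoop.fs.s3a.S3AFileSystem looks up in Hadoop 2.6+.
org.apache.hadoop.conf.Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
hadoopConf.set("fs.s3a.access.key", accessKeyId);
hadoopConf.set("fs.s3a.secret.key", secretAccessKey);
// The region is normally given as an endpoint host, e.g. "s3.eu-west-1.amazonaws.com".
hadoopConf.set("fs.s3a.endpoint", "s3." + region + ".amazonaws.com");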
%HADOOP_HOME% points to exactly the same Hadoop version that Spark uses (v2.6.5) and has been added to the Path:
C:>hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
key manage keys via the KeyProvider
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
The same goes for Maven:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.6.5</version>
</dependency>
But I still get the following error when writing. Any ideas?
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) ~[hadoop-common-2.6.5.jar:?]
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557) ~[hadoop-common-2.6.5.jar:?]
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977) ~[hadoop-common-2.6.5.jar:?]
Yes, I was missing a step. I took the binaries from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.4/bin and put them into %HADOOP_HOME%\bin. This seems to work even though the versions don't match exactly (v2.6.5 vs v2.6.4).
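If Spark is launched from an IDE and %HADOOP_HOME% is not picked up, the same hint can also be passed programmatically before the SparkSession is created. A minimal sketch, assuming the winutils binaries were unpacked into a hypothetical C:\hadoop directory:

import org.apache.spark.sql.SparkSession;

public class WinutilsBootstrap {
    public static void main(String[] args) {
        // "C:\\hadoop" is a placeholder path; use whatever directory holds
        // bin\winutils.exe and bin\hadoop.dll. This must run before Hadoop's
        // NativeIO class is first touched by Spark.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");

        SparkSession spark = SparkSession.builder()
                .appName("s3a-write-example")
                .master("local[*]")
                .getOrCreate();

        // ... set the fs.s3a.* options shown above and write with
        // data.write().parquet("s3a://" + bucketName + "/" + location);
        spark.stop();
    }
}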