Using spark.yarn.jars and spark.yarn.archive
Posted by 大葱拌豆腐
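When a Spark job is launched on YARN with neither spark.yarn.archive nor spark.yarn.jars configured, the client uploads the Spark jars again on every submission, which is very time-consuming. Setting spark.yarn.archive greatly reduces job startup time. The whole procedure is as follows.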
1. Create the zip file locally
~/env/spark$ cd jars/
~/env/spark/jars$ zip spark2.1.1-hadoop2.7.3.zip ./*
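Spark's YARN documentation asks for the jars to sit at the root of the archive, which is why the zip is built from inside jars/ rather than from its parent directory. As a minimal sketch of how that layout could be checked, the helper below reads entry names from standard input and fails if any of them contains a directory component. The name `entries_are_flat` is hypothetical and not part of Spark or Hadoop.

```shell
# Hypothetical helper, not part of Spark or Hadoop.
# Reads one archive entry name per line from stdin and returns non-zero
# if any entry sits inside a subdirectory (its name contains "/").
entries_are_flat() {
  # "|| [ -n "$name" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    case "$name" in
      */*)
        echo "nested entry: $name" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

Assuming an unzip build that supports zipinfo mode, the entry names could be supplied with `unzip -Z1 spark2.1.1-hadoop2.7.3.zip | entries_are_flat`.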
2. Upload it to HDFS and change the permissions
~/env/spark$ hdfs dfs -mkdir /tmp/spark-archive
~/env/spark$ hdfs dfs -put ./jars/spark2.1.1-hadoop2.7.3.zip /tmp/spark-archive
~/env/spark$ hdfs dfs -chmod 775 /tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
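The upload usually has to be repeated whenever Spark is upgraded, so it can help to wrap it in a function that is safe to run more than once. The sketch below is one possible way to do that, not the author's script: `upload_archive` and the `HDFS_CMD` variable are hypothetical names, and `-mkdir -p` and `-put -f` are used so that an existing directory or an older copy of the archive does not cause a failure.

```shell
# Command used to talk to HDFS. Making it overridable is an assumption of
# this sketch so the function can be dry-run against a stub.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Hypothetical helper: upload a local archive into an HDFS directory.
#   $1  path of the local zip file
#   $2  destination directory on HDFS
upload_archive() {
  local_zip=$1
  dest_dir=$2
  # -p: do not fail if the directory already exists
  "$HDFS_CMD" dfs -mkdir -p "$dest_dir" &&
  # -f: overwrite an archive of the same name if one is already there
  "$HDFS_CMD" dfs -put -f "$local_zip" "$dest_dir" &&
  # same mode as in the step above
  "$HDFS_CMD" dfs -chmod 775 "$dest_dir/$(basename "$local_zip")"
}
```

With the paths from this post, a call might look like `upload_archive ./jars/spark2.1.1-hadoop2.7.3.zip /tmp/spark-archive`.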
3. Configure spark-defaults.conf
spark.yarn.archive hdfs:///tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
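Each entry in spark-defaults.conf is a property name followed by whitespace and its value. As a rough sketch, the setting could be added from a script with a helper like the one below, which appends the line only when the file has no spark.yarn.archive entry yet. The function name `set_yarn_archive` is hypothetical, and the path of the conf file is left to the caller because it depends on the installation.

```shell
# Hypothetical helper: add spark.yarn.archive to a Spark conf file unless
# an entry for that property is already present.
#   $1  path of spark-defaults.conf
#   $2  URI of the archive, e.g. an hdfs:/// path
set_yarn_archive() {
  conf_file=$1
  archive_uri=$2
  # Match the property at the start of a line, followed by whitespace or "=".
  if grep -Eq '^[[:space:]]*spark\.yarn\.archive[[:space:]=]' "$conf_file" 2>/dev/null; then
    echo "spark.yarn.archive is already set in $conf_file" >&2
    return 0
  fi
  printf 'spark.yarn.archive %s\n' "$archive_uri" >> "$conf_file"
}
```

To try the setting on a single job without editing the file, the same property can be passed to spark-submit as `--conf spark.yarn.archive=hdfs:///tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip`.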
For reference, the submission log then looks like this:
17/08/10 14:59:27 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/08/10 14:59:27 INFO Client: Uploading resource file:/etc/security/keytabs/hive.service.keytab -> hdfs://hz-test-01/user/hive/.sparkStaging/application_1500533600435_2825/hive.service.keytab
17/08/10 14:59:27 INFO Client: Source and destination file systems are the same. Not copying hdfs:/tmp/spark-archive/spark2.1.1-hadoop2.7.3.zip
17/08/10 14:59:27 INFO Client: Uploading resource file:/home/hzlishuming/env/spark-2.1.1/local/spark-6606333c-1e5b-462c-ad39-aaf75251c246/__spark_conf__2962372142699552959.zip -> hdfs://hz-test-01/user/hive/.sparkStaging/application_1500533600435_2825/__spark_conf__.zip
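The line to look for in this log is "Source and destination file systems are the same. Not copying ...", which shows that the archive already on HDFS was used in place instead of being uploaded again. A small sketch of how that check might be automated is shown below. The helper name `archive_was_reused` is hypothetical, and it assumes the client log is available on standard input.

```shell
# Hypothetical helper: succeeds if the client log on stdin contains a
# "Not copying" line that mentions the given archive name.
#   $1  file name of the archive, e.g. spark2.1.1-hadoop2.7.3.zip
archive_was_reused() {
  archive_name=$1
  # -F treats both patterns as fixed strings, so dots are matched literally.
  grep -F 'Source and destination file systems are the same. Not copying' |
    grep -qF "$archive_name"
}
```

Assuming the spark-submit output was saved to a file named submit.log (a made-up name), the check would be `archive_was_reused spark2.1.1-hadoop2.7.3.zip < submit.log`.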