Spark Configuration

Posted by Laurence

This article records the Spark configuration of AWS EMR clusters under four scenarios, for future reference. The four scenarios are:

① Hive Metastore, with Spark selected
② Glue Data Catalog, with Spark selected
③ Hive Metastore, without Spark selected
④ Glue Data Catalog, without Spark selected

One point worth noting: even when the Spark application is not selected, EMR still ships the Spark packages on the cluster and adds the related command-line tools to PATH, so they can be invoked directly; the corresponding configuration files, however, are empty (nothing is configured, so all settings fall back to their defaults). EMR version: 6.9.0.
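
A dump like the listings below can be reproduced from inside spark-shell, for example with the following minimal sketch (`sc` is the SparkContext the shell creates):

// Print every entry of the active SparkConf as "key = value", sorted by key.
sc.getConf.getAll
  .sortBy(_._1)
  .foreach { case (key, value) => println(s"$key = $value") }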

① Hive Metastore, with Spark selected

spark.eventLog.enabled = true
spark.repl.class.uri = spark://ip-10-0-7-169.cn-north-1.compute.internal:45889/classes
spark.driver.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -XX:OnOutOfMemoryError='kill -9 %p'
spark.sql.parquet.output.committer.class = com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
spark.blacklist.decommissioning.timeout = 1h
spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS = $(hostname -f)
spark.sql.emr.internal.extensions = com.amazonaws.emr.spark.EmrSparkSessionExtensions
spark.eventLog.dir = hdfs:///var/log/spark/apps
spark.history.fs.logDirectory = hdfs:///var/log/spark/apps
spark.ui.filters = org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.executor.memory = 4743M
spark.executor.extraLibraryPath = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.home = /usr/lib/spark
spark.emr.default.executor.memory = 4743M
spark.app.startTime = 1676624138446
spark.hadoop.yarn.timeline-service.enabled = false
spark.emr.default.executor.cores = 2
spark.executor.id = driver
spark.driver.extraClassPath = /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.driver.memory = 2048M
spark.driver.port = 45889
spark.hadoop.mapreduce.output.fs.optimized.committer.enabled = true
spark.decommissioning.timeout.threshold = 20
spark.sql.catalogImplementation = hive
spark.stage.attempt.ignoreOnDecommissionFetchFailure = true
spark.jars = 
spark.repl.class.outputDir = /mnt/tmp/spark-1bf30b6e-00fe-491c-a40c-60ddda563d87/repl-23ef15f1-3024-4c20-81e4-fd8bd99153b5
spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds = 2000
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem = 2
spark.app.submitTime = 1676624132376
spark.yarn.dist.files = file:/etc/spark/conf.dist/hive-site.xml,file:/etc/hudi/conf.dist/hudi-defaults.conf
spark.driver.host = ip-10-0-7-169.cn-north-1.compute.internal
spark.app.id = application_1676536871900_0007
spark.driver.appUIAddress = http://ip-10-0-7-169.cn-north-1.compute.internal:4040
spark.app.name = Spark shell
spark.sql.hive.metastore.sharedPrefixes = com.amazonaws.services.dynamodbv2
spark.submit.deployMode = client
spark.sql.parquet.fs.optimized.committer.optimization-enabled = true
spark.driver.extraLibraryPath = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem = true
spark.executor.extraClassPath = /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.sql.warehouse.dir = hdfs://ip-10-0-7-169.cn-north-1.compute.internal:8020/user/spark/warehouse
spark.history.ui.port = 18080
spark.shuffle.service.enabled = true
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS = ip-10-0-7-169.cn-north-1.compute.internal
spark.driver.defaultJavaOptions = -XX:OnOutOfMemoryError='kill -9 %p'
spark.resourceManager.cleanupExpiredHost = true
spark.executor.defaultJavaOptions = -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.cores = 2
spark.files.fetchFailure.unRegisterOutputOnHost = true
spark.master = yarn
spark.executor.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.submit.pyFiles = 
spark.dynamicAllocation.enabled = true
spark.yarn.historyServer.address = ip-10-0-7-169.cn-north-1.compute.internal:18080
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES = http://ip-10-0-7-169.cn-north-1.compute.internal:20888/proxy/application_1676536871900_0007
spark.ui.showConsoleProgress = true
spark.blacklist.decommissioning.enabled = true

② Glue Data Catalog, with Spark selected

spark.repl.class.uri = spark://ip-10-0-4-181.cn-north-1.compute.internal:46015/classes
spark.eventLog.enabled = true
spark.app.startTime = 1676684277306
spark.driver.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -XX:OnOutOfMemoryError='kill -9 %p'
spark.sql.parquet.output.committer.class = com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
spark.blacklist.decommissioning.timeout = 1h
spark.driver.port = 46015
spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS = $(hostname -f)
spark.sql.emr.internal.extensions = com.amazonaws.emr.spark.EmrSparkSessionExtensions
spark.app.id = application_1676683148518_0001
spark.eventLog.dir = hdfs:///var/log/spark/apps
spark.sql.warehouse.dir = hdfs:///user/spark/warehouse
spark.history.fs.logDirectory = hdfs:///var/log/spark/apps
spark.ui.filters = org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.app.submitTime = 1676684271620
spark.executor.memory = 4743M
spark.executor.extraLibraryPath = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.home = /usr/lib/spark
spark.emr.default.executor.memory = 4743M
spark.hadoop.yarn.timeline-service.enabled = false
spark.emr.default.executor.cores = 2
spark.executor.id = driver
spark.yarn.historyServer.address = ip-10-0-4-181.cn-north-1.compute.internal:18080
spark.driver.extraClassPath = /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.driver.memory = 2048M
spark.hadoop.mapreduce.output.fs.optimized.committer.enabled = true
spark.decommissioning.timeout.threshold = 20
spark.sql.catalogImplementation = hive
spark.stage.attempt.ignoreOnDecommissionFetchFailure = true
spark.jars = 
spark.repl.class.outputDir = /mnt/tmp/spark-fb709940-2538-429d-bf04-52eb367ea298/repl-58df11c4-608a-43c5-a386-f2651e5b58b3
spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds = 2000
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem = 2
spark.yarn.dist.files = file:/etc/spark/conf.dist/hive-site.xml,file:/etc/hudi/conf.dist/hudi-defaults.conf
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES = http://ip-10-0-4-181.cn-north-1.compute.internal:20888/proxy/application_1676683148518_0001
spark.app.name = Spark shell
spark.sql.hive.metastore.sharedPrefixes = com.amazonaws.services.dynamodbv2
spark.submit.deployMode = client
spark.driver.appUIAddress = http://ip-10-0-4-181.cn-north-1.compute.internal:4040
spark.sql.parquet.fs.optimized.committer.optimization-enabled = true
spark.driver.extraLibraryPath = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem = true
spark.executor.extraClassPath = /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.history.ui.port = 18080
spark.shuffle.service.enabled = true
spark.driver.defaultJavaOptions = -XX:OnOutOfMemoryError='kill -9 %p'
spark.resourceManager.cleanupExpiredHost = true
spark.executor.defaultJavaOptions = -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.cores = 2
spark.files.fetchFailure.unRegisterOutputOnHost = true
spark.driver.host = ip-10-0-4-181.cn-north-1.compute.internal
spark.master = yarn
spark.executor.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS = ip-10-0-4-181.cn-north-1.compute.internal
spark.submit.pyFiles = 
spark.dynamicAllocation.enabled = true
spark.ui.showConsoleProgress = true
spark.blacklist.decommissioning.enabled = true
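
Note that this dump is almost identical to the one for ①: apart from hostnames and application IDs, the choice between the Hive Metastore and the Glue Data Catalog does not show up in the Spark properties. On EMR that choice is made in hive-site.xml through the metastore client factory. A sketch of how to check it from spark-shell follows; the file path is taken from spark.yarn.dist.files above, and on Glue-enabled clusters the expected value is com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory:

import scala.io.Source

// Look up hive.metastore.client.factory.class in Spark's hive-site.xml;
// "<not set>" means the stock Hive metastore client is in use.
val hiveSite = Source.fromFile("/etc/spark/conf.dist/hive-site.xml").mkString
val factoryPattern =
  """<name>hive\.metastore\.client\.factory\.class</name>\s*<value>([^<]*)</value>""".r
println(factoryPattern.findFirstMatchIn(hiveSite).map(_.group(1)).getOrElse("<not set>"))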

③ Hive Metastore, without Spark selected

spark.driver.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
spark.repl.class.outputDir = /mnt/tmp/spark-0bf48f9c-dd91-4aba-ba25-ccd069e8d39a/repl-3a101885-06d2-4bb6-aa71-fccdbffc429e
spark.home = /usr/lib/spark
spark.executor.id = driver
spark.driver.port = 40387
spark.app.startTime = 1676624118151
spark.driver.host = ip-10-0-15-96.cn-north-1.compute.internal
spark.app.name = Spark shell
spark.sql.catalogImplementation = hive
spark.app.id = local-1676624119262
spark.sql.warehouse.dir = file:/etc/spark/conf.dist/spark-warehouse
spark.executor.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
spark.app.submitTime = 1676624112493
spark.jars = 
spark.master = local[*]
spark.submit.pyFiles = 
spark.submit.deployMode = client
spark.repl.class.uri = spark://ip-10-0-15-96.cn-north-1.compute.internal:40387/classes
spark.ui.showConsoleProgress = true

④ Glue Data Catalog, without Spark selected

spark.home = /usr/lib/spark
spark.repl.class.uri = spark://ip-10-0-0-110.cn-north-1.compute.internal:38303/classes
spark.app.id = local-1676684296310
spark.repl.class.outputDir = /mnt/tmp/spark-a8d442c5-eaea-4fd0-af64-472c539d798c/repl-cf8c68d7-ebd7-4325-8072-c1431f388bd7
spark.executor.id = driver
spark.app.name = Spark shell
spark.sql.catalogImplementation = hive
spark.app.submitTime = 1676684289522
spark.driver.host = ip-10-0-0-110.cn-north-1.compute.internal
spark.executor.extraJavaOptions = -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
spark.jars = 
spark.master = local[*]
spark.submit.pyFiles = 
spark.submit.deployMode = client
spark.driver.port = 38303
spark.app.startTime = 1676684295257
spark.ui.showConsoleProgress = true
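
As the dumps for ③ and ④ show, when Spark is not selected the shell simply falls back to its built-in defaults: spark.master = local[*] and none of the YARN, dynamic-allocation, or event-log settings seen in ① and ②. A quick way to confirm which situation a given cluster is in (a minimal check run inside spark-shell, where `spark` is the SparkSession the shell creates):

// Print the handful of settings that distinguish the four scenarios above.
Seq("spark.master", "spark.sql.catalogImplementation", "spark.sql.warehouse.dir")
  .foreach(k => println(s"$k = ${spark.conf.getOption(k).getOrElse("<unset>")}"))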
