Using Spark & Mysql with mysql-connector-java
[Posted]: 2018-03-26 10:43:15
[Question]: I set up my Hadoop Cluster (version 2.7.3) with Spark (version 2.3.0) on top of it. Spark uses YARN to create its processes.
Now I want to fetch data from a MariaDB Database (version 10) and work with it in Spark.
I downloaded mysql-connector-java-5.1.46.tar.gz to connect Spark to my database.
Then I created a Python file like this:
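#!/usr/bin/python
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()
sqlContext = SQLContext(sc)
# Read MyTable over JDBC; the Connector/J JAR is shipped with --jars
dataframe_mysql = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://172.30.10.115:3306/DS/DS_Core",
    driver="com.mysql.jdbc.Driver",
    dbtable="MyTable",
    user="spark",
    password="**********").load()
# load() only resolves the schema; an action such as show() reads the rows
dataframe_mysql.show()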
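The `options(...)` call in the script above is just a set of key/value JDBC settings. A minimal sketch, assuming a hypothetical helper `mysql_jdbc_options` and the usual Connector/J URL shape `jdbc:mysql://host:port/database` with a single database name after the port (the question's `DS/DS_Core` path has two segments, which is unusual and worth double-checking). The resulting dict could then be unpacked into the reader with `options(**opts)`:

```python
def mysql_jdbc_options(host, database, table, user, password, port=3306,
                       driver="com.mysql.jdbc.Driver"):
    """Build the options dict for spark.read.format("jdbc").options(**opts).

    Hypothetical helper; the keys mirror the ones in the question's script.
    """
    if not database or "/" in database:
        # Assumption: reject "/" so the URL carries one conventional db name.
        raise ValueError("database must be a single name, got %r" % (database,))
    return {
        "url": "jdbc:mysql://%s:%d/%s" % (host, port, database),
        "driver": driver,
        "dbtable": table,
        "user": user,
        "password": password,
    }


if __name__ == "__main__":
    # "my_database" is a placeholder, not the question's real schema name.
    opts = mysql_jdbc_options("172.30.10.115", "my_database", "MyTable",
                              "spark", "**********")
    print(opts["url"])
    # With PySpark, the dict would be unpacked into the reader:
    # sqlContext.read.format("jdbc").options(**opts).load()
```

Keeping the options in one place makes it easy to print the exact URL the driver will receive before debugging anything on the cluster side.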
I also created a new user (spark) with all privileges in my database.
In my terminal, I run this command:
time ./spark/bin/spark-submit --jars /home/valentin/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar --master yarn --deploy-mode cluster /home/valentin/SparkMysql.py
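The same command can be assembled from a small Python launcher, which helps when the JAR list grows. A minimal sketch, assuming a hypothetical helper `spark_submit_argv`; it only uses the flags already in the command above (`--jars`, `--master yarn`, `--deploy-mode cluster`). `--jars` takes one comma-separated value, so several JARs are joined with commas:

```python
import shlex


def spark_submit_argv(app, jars, spark_home="./spark", master="yarn",
                      deploy_mode="cluster"):
    """Return the question's spark-submit command as an argv list."""
    argv = [spark_home.rstrip("/") + "/bin/spark-submit"]
    if jars:
        # spark-submit expects several JARs as one comma-separated value.
        argv += ["--jars", ",".join(jars)]
    argv += ["--master", master, "--deploy-mode", deploy_mode, app]
    return argv


if __name__ == "__main__":
    argv = spark_submit_argv(
        "/home/valentin/SparkMysql.py",
        ["/home/valentin/mysql-connector-java-5.1.46/"
         "mysql-connector-java-5.1.46-bin.jar"],
    )
    # shlex.join (Python 3.8+) renders a copy-pasteable shell command.
    print(shlex.join(argv))
    # To actually launch it: subprocess.run(argv, check=True)
```

Passing a list (rather than one shell string) to `subprocess.run` avoids quoting problems with paths.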
The job runs, and this is what I get:
2018-03-26 12:37:03 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-03-26 12:37:04 INFO RMProxy:98 - Connecting to ResourceManager at master/172.30.10.100:8032
2018-03-26 12:37:05 INFO Client:54 - Requesting a new application from cluster with 2 NodeManagers
2018-03-26 12:37:05 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-03-26 12:37:05 INFO Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2018-03-26 12:37:05 INFO Client:54 - Setting up container launch context for our AM
2018-03-26 12:37:05 INFO Client:54 - Setting up the launch environment for our AM container
2018-03-26 12:37:05 INFO Client:54 - Preparing resources for our AM container
2018-03-26 12:37:06 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-03-26 12:37:10 INFO Client:54 - Uploading resource file:/tmp/spark-9a7629aa-cce6-4ea9-a450-d7f4b3ac08eb/__spark_libs__4237668726850407973.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/__spark_libs__4237$
2018-03-26 12:37:18 INFO Client:54 - Uploading resource file:/home/valentin/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/mysql-connector-java-5.1.$
2018-03-26 12:37:18 INFO Client:54 - Uploading resource file:/home/valentin/SparkMysql.py -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/SparkMysql.py
2018-03-26 12:37:18 INFO Client:54 - Uploading resource file:/home/valentin/spark/python/lib/pyspark.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/pyspark.zip
2018-03-26 12:37:18 INFO Client:54 - Uploading resource file:/home/valentin/spark/python/lib/py4j-0.10.6-src.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/py4j-0.10.6-src.zip
2018-03-26 12:37:18 INFO Client:54 - Uploading resource file:/tmp/spark-9a7629aa-cce6-4ea9-a450-d7f4b3ac08eb/__spark_conf__6263851109037900321.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/__spark_conf__.zip
2018-03-26 12:37:19 INFO SecurityManager:54 - Changing view acls to: valentin
2018-03-26 12:37:19 INFO SecurityManager:54 - Changing modify acls to: valentin
2018-03-26 12:37:19 INFO SecurityManager:54 - Changing view acls groups to:
2018-03-26 12:37:19 INFO SecurityManager:54 - Changing modify acls groups to:
2018-03-26 12:37:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(valentin); groups with view permissions: Set(); users with modify permissions: Set(valentin); groups with$
2018-03-26 12:37:19 INFO Client:54 - Submitting application application_1521799083882_0058 to ResourceManager
2018-03-26 12:37:19 INFO YarnClientImpl:273 - Submitted application application_1521799083882_0058
2018-03-26 12:37:20 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:20 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1522060639124
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1521799083882_0058/
user: valentin
2018-03-26 12:37:21 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:22 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:23 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:24 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:25 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:26 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:27 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:28 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:29 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:30 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:31 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:32 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:33 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:34 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:35 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:36 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:37 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:38 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:39 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:40 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:41 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:42 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:43 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:44 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:45 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:46 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:47 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:48 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:04 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.30.10.102
ApplicationMaster RPC port: 0
queue: default
start time: 1522060639124
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1521799083882_0058/
user: valentin
2018-03-26 12:38:05 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:06 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:07 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:08 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:09 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:10 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:11 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:12 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:13 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:14 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:15 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:16 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:17 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:18 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:19 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:20 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:21 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:22 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:23 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:24 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:25 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:26 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:26 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1522060639124
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1521799083882_0058/
user: valentin
2018-03-26 12:38:27 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:28 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:29 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:30 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:31 INFO Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:32 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:32 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.30.10.101
ApplicationMaster RPC port: 0
queue: default
start time: 1522060639124
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1521799083882_0058/
user: valentin
2018-03-26 12:38:33 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:34 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:35 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:36 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:37 INFO Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:38 INFO Client:54 - Application report for application_1521799083882_0058 (state: FINISHED)
2018-03-26 12:38:38 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.30.10.101
ApplicationMaster RPC port: 0
queue: default
start time: 1522060639124
final status: FAILED
tracking URL: http://master:8088/proxy/application_1521799083882_0058/
user: valentin
2018-03-26 12:38:38 INFO ShutdownHookManager:54 - Shutdown hook called
2018-03-26 12:38:38 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-9a7629aa-cce6-4ea9-a450-d7f4b3ac08eb
2018-03-26 12:38:38 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-b4f8af34-b6cc-4ddf-afc8-d0707cc34925
The YARN logs don't show any WARN or ERROR. Is this the right connector?
Do you have any ideas?
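The client output above only polls the application state, so the useful part is where that state changes. In this log the state goes ACCEPTED → RUNNING → ACCEPTED → RUNNING → FINISHED, the ApplicationMaster host changes from 172.30.10.102 to 172.30.10.101, and the final status is FAILED, which suggests the first ApplicationMaster attempt died and YARN retried it on another node. A minimal sketch, assuming a hypothetical `summarize_yarn_client_log` function and the exact line formats printed above, that extracts those transitions:

```python
import re

# Patterns for the lines spark-submit prints while polling YARN.
STATE_RE = re.compile(r"Application report for (\S+) \(state: (\w+)\)")
HOST_RE = re.compile(r"ApplicationMaster host: (\S+)")
FINAL_RE = re.compile(r"final status: (\w+)")


def summarize_yarn_client_log(text):
    """Collapse spark-submit's YARN polling output into its transitions.

    Returns (states, am_hosts, final_status). Consecutive repeats are merged
    and "N/A" hosts (no ApplicationMaster running yet) are skipped.
    """
    states, am_hosts, final_status = [], [], None
    for line in text.splitlines():
        m = STATE_RE.search(line)
        if m and (not states or states[-1] != m.group(2)):
            states.append(m.group(2))
        m = HOST_RE.search(line)
        if m and m.group(1) != "N/A":
            if not am_hosts or am_hosts[-1] != m.group(1):
                am_hosts.append(m.group(1))
        m = FINAL_RE.search(line)
        if m:
            final_status = m.group(1)  # the last report wins
    return states, am_hosts, final_status


if __name__ == "__main__":
    # Lines abridged from the log above, kept in their original order.
    report = ("INFO Client:54 - Application report for "
              "application_1521799083882_0058 (state: {})").format
    sample = "\n".join([
        report("ACCEPTED"), "ApplicationMaster host: N/A",
        "ApplicationMaster host: 172.30.10.102", report("RUNNING"),
        report("RUNNING"), report("ACCEPTED"), "ApplicationMaster host: N/A",
        report("RUNNING"), "ApplicationMaster host: 172.30.10.101",
        report("FINISHED"), "ApplicationMaster host: 172.30.10.101",
        "final status: FAILED",
    ])
    print(summarize_yarn_client_log(sample))
```

The client log never shows the driver's own errors in cluster mode; it only hints at which attempt and host to inspect in the container logs.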
[Answer 1]: Thanks to the applicationId logs, I found the solution:
My datanode2 was not authorized to access my MariaDB database. Only my namenode and my datanode1 could access MariaDB.
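One way to catch this before submitting a job is to check, from every node that can host the driver or an executor, whether the database port is reachable. A minimal sketch, assuming a hypothetical `can_reach` helper that you run on each node (for example over SSH); it only tests TCP connectivity to port 3306. MariaDB/MySQL also identify accounts as `'user'@'host'`, so a user allowed from one host can still be rejected from another even when the port is open; that part has to be checked on the database side.

```python
import socket


def can_reach(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # The database address from the question; run this on every node.
    db = "172.30.10.115"
    print(socket.gethostname(), "->", db, "reachable:", can_reach(db))
```

A node that prints `False` here (or that is missing from the database's host grants) will fail only when YARN happens to schedule the driver there, which matches the intermittent behaviour in the question.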
I found the logs at:
http://slave2:8042/node/containerlogs/container_1521799083882_0059_02_000001/valentin/
where I had the stdout logs.
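The container ID in that URL tells you what you are looking at. A minimal sketch, assuming the classic `container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>` layout (newer Hadoop versions may insert an `e<epoch>_` part, which the regex below tolerates) and a hypothetical `parse_container_id` function. Container `000001` of an attempt is normally the ApplicationMaster, which in cluster deploy mode also runs the Spark driver, so its stdout holds the Python script's output. Note that this container belongs to `application_1521799083882_0059`, a later submission than the `_0058` run in the question's log. With log aggregation enabled, the same logs can usually be fetched with `yarn logs -applicationId <id>`.

```python
import re

# container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>
CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def parse_container_id(container_id):
    """Split a YARN container ID into application ID, attempt and container."""
    m = CONTAINER_RE.match(container_id)
    if not m:
        raise ValueError("not a YARN container ID: %r" % (container_id,))
    cluster_ts, app, attempt, container = m.groups()
    return {
        "application_id": "application_%s_%s" % (cluster_ts, app),
        "attempt": int(attempt),
        "container": int(container),
        # Container 1 of an attempt is normally the ApplicationMaster.
        "is_am": int(container) == 1,
    }


if __name__ == "__main__":
    info = parse_container_id("container_1521799083882_0059_02_000001")
    print(info)
    print("yarn logs -applicationId", info["application_id"])
```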