Using Spark & Mysql with mysql-connector-java

Posted: 2018-03-26 10:43:15

Question:

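I set up my Hadoop Cluster (version 2.7.3) with Spark (version 2.3.0) on top of it. Spark uses YARN to launch its processes.

Now I would like to read data from a MariaDB Database (version 10) into Spark.

I downloaded mysql-connector-java-5.1.46.tar.gz to connect Spark to my database.

Then I created a Python file like this: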

#!/usr/bin/python

from pyspark import SparkContext
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql

sc = SparkContext()
sqlContext = SQLContext(sc)

# Read MyTable over JDBC with the MySQL Connector/J driver
dataframe_mysql = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://172.30.10.115:3306/DS/DS_Core",
    driver="com.mysql.jdbc.Driver",
    dbtable="MyTable",
    user="spark",
    password="**********").load()

I also created a new user (spark) with all privileges in my database.

In my terminal, I run this command:

time ./spark/bin/spark-submit --jars /home/valentin/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar --master yarn --deploy-mode cluster /home/valentin/SparkMysql.py

The process runs, and this is what I get:

2018-03-26 12:37:03 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-03-26 12:37:04 INFO  RMProxy:98 - Connecting to ResourceManager at master/172.30.10.100:8032
2018-03-26 12:37:05 INFO  Client:54 - Requesting a new application from cluster with 2 NodeManagers
2018-03-26 12:37:05 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-03-26 12:37:05 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2018-03-26 12:37:05 INFO  Client:54 - Setting up container launch context for our AM
2018-03-26 12:37:05 INFO  Client:54 - Setting up the launch environment for our AM container
2018-03-26 12:37:05 INFO  Client:54 - Preparing resources for our AM container
2018-03-26 12:37:06 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-03-26 12:37:10 INFO  Client:54 - Uploading resource file:/tmp/spark-9a7629aa-cce6-4ea9-a450-d7f4b3ac08eb/__spark_libs__4237668726850407973.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/__spark_libs__4237$
2018-03-26 12:37:18 INFO  Client:54 - Uploading resource file:/home/valentin/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/mysql-connector-java-5.1.$
2018-03-26 12:37:18 INFO  Client:54 - Uploading resource file:/home/valentin/SparkMysql.py -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/SparkMysql.py
2018-03-26 12:37:18 INFO  Client:54 - Uploading resource file:/home/valentin/spark/python/lib/pyspark.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/pyspark.zip
2018-03-26 12:37:18 INFO  Client:54 - Uploading resource file:/home/valentin/spark/python/lib/py4j-0.10.6-src.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/py4j-0.10.6-src.zip
2018-03-26 12:37:18 INFO  Client:54 - Uploading resource file:/tmp/spark-9a7629aa-cce6-4ea9-a450-d7f4b3ac08eb/__spark_conf__6263851109037900321.zip -> hdfs://master:9000/user/valentin/.sparkStaging/application_1521799083882_0058/__spark_conf__.zip
2018-03-26 12:37:19 INFO  SecurityManager:54 - Changing view acls to: valentin
2018-03-26 12:37:19 INFO  SecurityManager:54 - Changing modify acls to: valentin
2018-03-26 12:37:19 INFO  SecurityManager:54 - Changing view acls groups to:
2018-03-26 12:37:19 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-03-26 12:37:19 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(valentin); groups with view permissions: Set(); users  with modify permissions: Set(valentin); groups with$
2018-03-26 12:37:19 INFO  Client:54 - Submitting application application_1521799083882_0058 to ResourceManager
2018-03-26 12:37:19 INFO  YarnClientImpl:273 - Submitted application application_1521799083882_0058
2018-03-26 12:37:20 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:20 INFO  Client:54 -
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1522060639124
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1521799083882_0058/
         user: valentin
2018-03-26 12:37:21 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:22 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:23 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:24 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:25 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:26 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:27 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:28 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:29 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:30 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:31 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:32 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:33 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:34 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:35 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:36 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:37 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:38 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:39 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:40 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:41 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:42 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:43 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:44 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:45 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:46 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:47 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:37:48 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:04 INFO  Client:54 -
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 172.30.10.102
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1522060639124
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1521799083882_0058/
         user: valentin
2018-03-26 12:38:05 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:06 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:07 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:08 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:09 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:10 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:11 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:12 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:13 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:14 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:15 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:16 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:17 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:18 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:19 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:20 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:21 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:22 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:23 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:24 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:25 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:26 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:26 INFO  Client:54 -
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1522060639124
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1521799083882_0058/
         user: valentin
2018-03-26 12:38:27 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:28 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:29 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:30 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:31 INFO  Client:54 - Application report for application_1521799083882_0058 (state: ACCEPTED)
2018-03-26 12:38:32 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:32 INFO  Client:54 -
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 172.30.10.101
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1522060639124
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1521799083882_0058/
         user: valentin
2018-03-26 12:38:33 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:34 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:35 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:36 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:37 INFO  Client:54 - Application report for application_1521799083882_0058 (state: RUNNING)
2018-03-26 12:38:38 INFO  Client:54 - Application report for application_1521799083882_0058 (state: FINISHED)
2018-03-26 12:38:38 INFO  Client:54 -
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 172.30.10.101
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1522060639124
         final status: FAILED
         tracking URL: http://master:8088/proxy/application_1521799083882_0058/
         user: valentin
2018-03-26 12:38:38 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-03-26 12:38:38 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-9a7629aa-cce6-4ea9-a450-d7f4b3ac08eb
2018-03-26 12:38:38 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-b4f8af34-b6cc-4ddf-afc8-d0707cc34925
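The client output above is long, but the only information it carries is the sequence of application states. In this log the state goes ACCEPTED (12:37:20), RUNNING (12:38:05), back to ACCEPTED (12:38:26), RUNNING again (12:38:32), then FINISHED (12:38:38), and the ApplicationMaster host changes from 172.30.10.102 to 172.30.10.101 in between. That pattern is consistent with a first attempt failing and YARN retrying on another node, although the client log alone does not prove it. A minimal sketch for collapsing such a log to its state changes follows; the names `STATE` and `state_transitions` are made up for this illustration, and the regular expression assumes the exact line layout shown above.

```python
import re
import sys

# Matches the "Application report" lines printed by the spark-submit client,
# with or without leading indentation.
STATE = re.compile(
    r"^\s*(\S+ \S+) INFO\s+Client:\d+ - Application report for \S+ \(state: (\w+)\)"
)


def state_transitions(lines):
    """Return [(timestamp, state)] for every line where the state changes."""
    transitions = []
    for line in lines:
        match = STATE.match(line)
        if match is None:
            continue  # not a state report (detail lines, uploads, ...)
        timestamp, state = match.group(1), match.group(2)
        if not transitions or transitions[-1][1] != state:
            transitions.append((timestamp, state))
    return transitions


if __name__ == "__main__":
    # Reads the client log from standard input
    for timestamp, state in state_transitions(sys.stdin):
        print(timestamp, state)
```

Saved as a script, this could be fed the client output through a pipe; the Spark client normally logs to stderr, so the stream would likely need redirecting with `2>&1` first.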

The YARN logs do not show any WARN or ERROR. Is this the right connector?

Do you have any idea?

Answer 1:

Thanks to the applicationId logs, I found the solution:

My datanode2 was not authorized to access my MariaDB database. Only my namenode and my datanode1 could access MariaDB.
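With `--deploy-mode cluster`, the driver can land on any node, so every node that may host it needs access to the database. The answer does not say whether the block was at the network level or a host-based account restriction on the MariaDB side, so the following is only a sketch that tries to tell the two apart. It opens a plain TCP connection and looks at the first packet the server sends: in the MySQL wire protocol a payload starting with 0xFF is an error packet, anything else is the normal handshake greeting. The host and port are taken from the JDBC URL in the question; `DB_HOST`, `DB_PORT`, `classify_first_packet` and `probe` are names invented for this example.

```python
import socket

# Host and port as they appear in the JDBC URL of the question
DB_HOST = "172.30.10.115"
DB_PORT = 3306


def classify_first_packet(data):
    """Classify the first packet a MySQL/MariaDB server sends after connect.

    Layout: 3-byte little-endian payload length, 1-byte sequence number,
    then the payload. A payload starting with 0xFF is an error packet.
    """
    if len(data) < 5:
        return "incomplete"
    if data[4] == 0xFF:
        # Error payload: 0xFF, 2-byte little-endian error code, message text
        code = int.from_bytes(data[5:7], "little")
        message = data[7:].decode("utf-8", errors="replace")
        return "refused by server: %d %s" % (code, message)
    return "handshake received"


def probe(host=DB_HOST, port=DB_PORT, timeout=5.0):
    """Open a TCP connection and report how the server answers."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # One recv is enough for a sketch; a packet could in theory be split
            return classify_first_packet(conn.recv(4096))
    except OSError as exc:
        # Refused connection, timeout, unreachable host, ...
        return "no TCP connection: %s" % exc


if __name__ == "__main__":
    print("%s -> %s" % (socket.gethostname(), probe()))
```

Run on each cluster node in turn, a "handshake received" result only shows that the host may open a session; the `spark` account's own grants still have to allow that host.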

I found the logs at:

http://slave2:8042/node/containerlogs/container_1521799083882_0059_02_000001/valentin/

That page gave me the stdout logs.
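The container ID in that URL encodes the application it belongs to, which helps when the NodeManager page is no longer available and the logs have to be fetched with the `yarn logs -applicationId` command instead (this requires log aggregation to be enabled on the cluster). A small sketch, assuming the usual `container_<cluster timestamp>_<application number>_<attempt>_<container number>` layout; the helper names are hypothetical.

```python
import re

# Matches IDs such as container_1521799083882_0059_02_000001. Some clusters
# add an epoch field (container_e17_...), which is treated as optional here.
CONTAINER_ID = re.compile(
    r"container_(?:e\d+_)?(?P<cluster_ts>\d+)_(?P<app>\d+)"
    r"_(?P<attempt>\d+)_(?P<container>\d+)"
)


def application_id_from(text):
    """Return (application ID, attempt number) for the first container ID in text."""
    match = CONTAINER_ID.search(text)
    if match is None:
        raise ValueError("no container ID found in %r" % text)
    app_id = "application_%s_%s" % (match.group("cluster_ts"), match.group("app"))
    return app_id, int(match.group("attempt"))


def yarn_logs_command(app_id):
    """Build the argument list for dumping an application's aggregated logs."""
    return ["yarn", "logs", "-applicationId", app_id]


if __name__ == "__main__":
    url = ("http://slave2:8042/node/containerlogs/"
           "container_1521799083882_0059_02_000001/valentin/")
    app_id, attempt = application_id_from(url)
    print(app_id, "attempt", attempt)
    print(" ".join(yarn_logs_command(app_id)))
```

In cluster mode the driver's own output, including any Python traceback, ends up in these container logs rather than in the `spark-submit` console, which is why the client output in the question shows no error.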
