Spark Launcher: Quick Notes

Posted by DataRain


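The simplest and most common way to submit a Spark application is the spark-submit script. Once a job has been submitted that way, though, any further operation on it, such as stopping the job or killing its process, has to go through the yarn application commands, which means logging in to a machine that can reach the cluster. Is there a way to do the same thing from a web UI? Spark launcher provides interfaces that make it possible to submit and kill jobs from a web UI.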

    

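[Figure not available: the author's hand-drawn sketch. Original caption: "Drawn by an amateur, please bear with me."]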


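The pseudocode is roughly as follows. It is quite simple: define a custom listener that can forward messages to a chosen destination, and expose a few interfaces that control how the Spark job runs.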

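    import java.util.concurrent.CountDownLatch;
    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    // The original pseudocode left out all argument values;
    // the <...> strings below are placeholders, not real settings.
    CountDownLatch countDownLatch = new CountDownLatch(1);
    SparkLauncher launcher = new SparkLauncher();
    launcher.setAppName("<app-name>");
    launcher.setAppResource("<path-to-application-jar>");
    launcher.setDeployMode("<client-or-cluster>");
    launcher.setMainClass("<main-class>");
    launcher.setSparkHome("<spark-home>");
    launcher.setMaster("<master>");
    // The listener needs the latch so that it can signal completion.
    SparkAppHandle handle = launcher.startApplication(new LogListener(countDownLatch));
    countDownLatch.await(); // block until the application reaches a final state

    public class LogListener implements SparkAppHandle.Listener {
        private final CountDownLatch countDownLatch;

        public LogListener(CountDownLatch countDownLatch) {
            this.countDownLatch = countDownLatch;
        }

        // Called back whenever the state of the application changes.
        // InfoLogger is the author's own logging helper.
        @Override
        public void stateChanged(SparkAppHandle handle) {
            InfoLogger.info("app state: " + handle.getState().name());
            if (handle.getState().isFinal()) {
                countDownLatch.countDown();
            }
        }

        // Called back when information other than the state changes,
        // for example once the application ID becomes known.
        @Override
        public void infoChanged(SparkAppHandle handle) {
            InfoLogger.info("app id: " + handle.getAppId());
        }
    }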
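The listener idea above (forward each message somewhere, release a latch on a final state) can be tried without a cluster. The following is a minimal JDK-only sketch, not Spark code: `Callback` is a hypothetical stand-in for `SparkAppHandle.Listener`, the queue plays the role of the "chosen destination", and the state sequence is scripted by a fake launcher thread.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** JDK-only sketch; nothing here talks to Spark. */
final class ForwardingListenerSketch {

    /** Hypothetical stand-in for SparkAppHandle.Listener. */
    interface Callback {
        void stateChanged(String stateName, boolean isFinal);
    }

    /** Forwards every state to a queue and releases the latch on a final state. */
    static final class ForwardingListener implements Callback {
        private final BlockingQueue<String> sink;
        private final CountDownLatch done;

        ForwardingListener(BlockingQueue<String> sink, CountDownLatch done) {
            this.sink = sink;
            this.done = done;
        }

        @Override
        public void stateChanged(String stateName, boolean isFinal) {
            sink.add(stateName);      // the destination the messages are sent to
            if (isFinal) {
                done.countDown();     // wake up whoever is waiting for completion
            }
        }
    }

    /** Replays a scripted state sequence on another thread and waits for the final one. */
    static List<String> runDemo() {
        BlockingQueue<String> sink = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(1);
        Callback listener = new ForwardingListener(sink, done);

        // Plays the role of the thread on which the real callbacks would arrive.
        Thread fakeLauncher = new Thread(() -> {
            listener.stateChanged("CONNECTED", false);
            listener.stateChanged("SUBMITTED", false);
            listener.stateChanged("RUNNING", false);
            listener.stateChanged("FINISHED", true);
        });
        fakeLauncher.start();

        try {
            // Bounded wait, so a lost callback cannot block the caller forever.
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("no final state within the timeout");
            }
            fakeLauncher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return List.copyOf(sink);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In a real service the queue would probably be replaced by whatever the web UI reads from (a database row, a message topic, a WebSocket); that choice is outside what the original post describes.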


The output produced by the listener is not reproduced in this copy of the post.


Spark launcher defines the states below. Whatever status we get back is always one of these, which admittedly looks rather dry:

    enum State {
        /** The application has not reported back yet. */
        UNKNOWN(false),
        /** The application has connected to the handle. */
        CONNECTED(false),
        /** The application has been submitted to the cluster. */
        SUBMITTED(false),
        /** The application is running. */
        RUNNING(false),
        /** The application finished with a successful status. */
        FINISHED(true),
        /** The application finished with a failed status. */
        FAILED(true),
        /** The application was killed. */
        KILLED(true),
        /** The Spark Submit JVM exited with a unknown status. */
        LOST(true);

        private final boolean isFinal;

        State(boolean isFinal) {
            this.isFinal = isFinal;
        }

        /**
         * Whether this state is a final state, meaning the application is not running anymore
         * once it's reached.
         */
        public boolean isFinal() {
            return isFinal;
        }
    }
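Since every status is one of these eight values, a web UI can collapse them into something friendlier. The sketch below is one possible way to do that, under stated assumptions: `State` is a local mirror of the enum listed above so that the example runs without Spark on the classpath, and the `Outcome` enum and `summarize` method are hypothetical names invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** JDK-only sketch; State mirrors the listing above, Outcome is hypothetical. */
final class StateSummarySketch {

    /** Local mirror of the launcher states, so the sketch runs without Spark. */
    enum State {
        UNKNOWN(false), CONNECTED(false), SUBMITTED(false), RUNNING(false),
        FINISHED(true), FAILED(true), KILLED(true), LOST(true);

        private final boolean isFinal;

        State(boolean isFinal) {
            this.isFinal = isFinal;
        }

        boolean isFinal() {
            return isFinal;
        }
    }

    /** Hypothetical three-way outcome that a web UI might display. */
    enum Outcome { IN_PROGRESS, SUCCEEDED, UNSUCCESSFUL }

    /** Non-final states are in progress; FINISHED is the only successful final state. */
    static Outcome summarize(State state) {
        if (!state.isFinal()) {
            return Outcome.IN_PROGRESS;
        }
        return state == State.FINISHED ? Outcome.SUCCEEDED : Outcome.UNSUCCESSFUL;
    }

    /** Collects the states after which no further transitions are expected. */
    static Set<State> finalStates() {
        EnumSet<State> result = EnumSet.noneOf(State.class);
        for (State state : State.values()) {
            if (state.isFinal()) {
                result.add(state);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (State state : State.values()) {
            System.out.println(state + " -> " + summarize(state));
        }
        System.out.println("final states: " + finalStates());
    }
}
```

Treating KILLED and LOST as unsuccessful alongside FAILED is a design choice of this sketch; a UI that distinguishes a user-requested kill from a crash would need a finer mapping.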


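Besides the simple listener callbacks, Spark launcher also provides methods for operating on the app: stopping the job, killing its process, and disconnecting from it. That is about as far as our own monitoring of a Spark job can go. For a deeper view the built-in web UI is still the better tool, since it lays out every aspect of the DAG in full detail.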

    /**
     * Asks the application to stop. This is best-effort, since the application may fail to receive
     * or act on the command. Callers should watch for a state transition that indicates the
     * application has really stopped.
     */
    void stop();

    /**
     * Tries to kill the underlying application. Implies {@link #disconnect()}. This will not send
     * a {@link #stop()} message to the application, so it's recommended that users first try to
     * stop the application cleanly and only resort to this method if that fails.
     */
    void kill();

    /**
     * Disconnects the handle from the application, without stopping it. After this method is called,
     * the handle will not be able to communicate with the application anymore.
     */
    void disconnect();
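The Javadoc above recommends trying stop() first and falling back to kill() only if that fails. A minimal sketch of that escalation follows. It is JDK-only and makes several assumptions: `Handle` is a hypothetical stand-in exposing just the parts of `SparkAppHandle` used here (with `isFinal()` standing in for `getState().isFinal()`), `FakeHandle` simulates an application that may ignore stop(), and polling is used for brevity where real code would more likely react to the listener's stateChanged callback.

```java
/** JDK-only sketch of "stop first, kill as a last resort"; not Spark code. */
final class GracefulShutdownSketch {

    /** Hypothetical stand-in for the subset of SparkAppHandle used here. */
    interface Handle {
        void stop();
        void kill();
        boolean isFinal(); // stands in for handle.getState().isFinal()
    }

    /**
     * Asks the application to stop and waits up to graceMillis for a final state.
     * Returns true if stop() was enough, false if kill() had to be used.
     */
    static boolean stopThenKill(Handle handle, long graceMillis, long pollMillis) {
        handle.stop();
        long deadline = System.nanoTime() + graceMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (handle.isFinal()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up waiting
                break;
            }
        }
        if (handle.isFinal()) {
            return true;
        }
        handle.kill(); // stop() was best-effort and did not take effect in time
        return false;
    }

    /** Fake handle for the demo: may or may not honor stop(). */
    static final class FakeHandle implements Handle {
        private final boolean honorsStop;
        private volatile boolean finished;
        volatile boolean killed;

        FakeHandle(boolean honorsStop) {
            this.honorsStop = honorsStop;
        }

        @Override
        public void stop() {
            if (honorsStop) {
                finished = true;
            }
        }

        @Override
        public void kill() {
            killed = true;
            finished = true;
        }

        @Override
        public boolean isFinal() {
            return finished;
        }
    }

    public static void main(String[] args) {
        System.out.println("cooperative app stopped cleanly: "
                + stopThenKill(new FakeHandle(true), 200, 10));
        System.out.println("stubborn app stopped cleanly: "
                + stopThenKill(new FakeHandle(false), 50, 10));
    }
}
```

The grace period and poll interval are arbitrary values chosen for the demo; suitable numbers depend on how long the application needs to shut down cleanly.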


