Notes on Spark Launcher
Posted by DataRain
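Spark applications are usually submitted with the spark-submit script, which is by far the most common approach. Once a job has been submitted through the script, however, any further operation on it, such as stopping the job or killing its process, has to go through the yarn application commands, and that means logging in to a machine that can reach the cluster. Is there a way to do this from a web UI instead? Spark launcher provides a set of interfaces that make it possible to submit and kill jobs from a web UI.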
[Figure: a hand-drawn diagram by the author, who asks readers to forgive the rough drawing.]
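The pseudocode is roughly as follows. It is quite simple: a custom listener that can forward messages to a destination of your choice, plus a few exposed methods for controlling the execution of the Spark job.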
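// appName, appResource, deployMode, mainClass, sparkHome and master are placeholder
// variables; supply the values for your own job.
CountDownLatch countDownLatch = new CountDownLatch(1);
SparkLauncher launcher = new SparkLauncher();
launcher.setAppName(appName);
launcher.setAppResource(appResource);   // path to the application jar
launcher.setDeployMode(deployMode);     // "client" or "cluster"
launcher.setMainClass(mainClass);
launcher.setSparkHome(sparkHome);
launcher.setMaster(master);             // for example "yarn"
// The listener counts the latch down once the application reaches a final state.
SparkAppHandle handle = launcher.startApplication(new LogListener(countDownLatch));
countDownLatch.await();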
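import java.util.concurrent.CountDownLatch;
import org.apache.spark.launcher.SparkAppHandle;

// InfoLogger is the author's own logging helper and is not shown here.
public class LogListener implements SparkAppHandle.Listener {
    private final CountDownLatch countDownLatch;

    public LogListener(CountDownLatch countDownLatch) {
        this.countDownLatch = countDownLatch;
    }

    // Called whenever the state of the application changes.
    @Override
    public void stateChanged(SparkAppHandle handle) {
        InfoLogger.info("app state: " + handle.getState().name());
        // Release the waiting thread once the application reaches a final state.
        if (handle.getState().isFinal()) {
            countDownLatch.countDown();
        }
    }

    // Called when application information other than the state changes.
    @Override
    public void infoChanged(SparkAppHandle handle) {
        InfoLogger.info("app info: appId=" + handle.getAppId()
                + ", state=" + handle.getState().name());
    }
}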
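The output is simply a sequence of state names. Spark launcher defines the states below, so the status information we can obtain is limited to these few values, which is admittedly rather dull: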
enum State {
/** The application has not reported back yet. */
UNKNOWN(false),
/** The application has connected to the handle. */
CONNECTED(false),
/** The application has been submitted to the cluster. */
SUBMITTED(false),
/** The application is running. */
RUNNING(false),
/** The application finished with a successful status. */
FINISHED(true),
/** The application finished with a failed status. */
FAILED(true),
/** The application was killed. */
KILLED(true),
/** The Spark Submit JVM exited with a unknown status. */
LOST(true);
private final boolean isFinal;
State(boolean isFinal) {
this.isFinal = isFinal;
}
/**
* Whether this state is a final state, meaning the application is not running anymore
* once it's reached.
*/
public boolean isFinal() {
return isFinal;
}
}
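To make the role of isFinal() concrete, here is a minimal sketch that runs without Spark on the classpath. AppState is a hypothetical local stand-in that mirrors the enum above, and firstFinal and succeeded are illustrative helper names, not part of the Spark API. The sketch assumes that a caller such as a web UI backend only wants to know whether a job has ended and whether it ended successfully.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class FinalStateSketch {
    /** Hypothetical stand-in mirroring SparkAppHandle.State; not the Spark class itself. */
    enum AppState {
        UNKNOWN(false), CONNECTED(false), SUBMITTED(false), RUNNING(false),
        FINISHED(true), FAILED(true), KILLED(true), LOST(true);

        private final boolean isFinal;

        AppState(boolean isFinal) {
            this.isFinal = isFinal;
        }

        boolean isFinal() {
            return isFinal;
        }
    }

    /** Returns the first final state in the reported sequence, or empty if none was reached. */
    static Optional<AppState> firstFinal(List<AppState> reported) {
        return reported.stream().filter(AppState::isFinal).findFirst();
    }

    /** Only FINISHED counts as success; FAILED, KILLED and LOST do not. */
    static boolean succeeded(AppState state) {
        return state == AppState.FINISHED;
    }

    public static void main(String[] args) {
        // A typical sequence of states as a listener might observe them.
        List<AppState> reported = Arrays.asList(
                AppState.CONNECTED, AppState.SUBMITTED, AppState.RUNNING, AppState.FINISHED);
        Optional<AppState> end = firstFinal(reported);
        System.out.println(end.map(s -> s + " success=" + succeeded(s)).orElse("still running"));
    }
}
```

With the real API the same checks would be applied to handle.getState() inside stateChanged, as the listener earlier in the post does.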
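Besides this simple listener mechanism, Spark launcher also provides methods for operating on the application: stopping it, killing its process, and disconnecting from it. That is as far as our own monitoring and control of a Spark job can go with this API. For anything deeper, Spark's built-in web UI is still the better tool, since it lays out every aspect of the DAG in full detail.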
/**
* Asks the application to stop. This is best-effort, since the application may fail to receive
* or act on the command. Callers should watch for a state transition that indicates the
* application has really stopped.
*/
void stop();
/**
* Tries to kill the underlying application. Implies {@link #disconnect()}. This will not send
* a {@link #stop()} message to the application, so it's recommended that users first try to
* stop the application cleanly and only resort to this method if that fails.
*/
void kill();
/**
* Disconnects the handle from the application, without stopping it. After this method is called,
* the handle will not be able to communicate with the application anymore.
*/
void disconnect();
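The Javadoc above recommends trying stop() first and resorting to kill() only if that fails. One way to follow that advice is sketched below. It is self-contained and does not depend on Spark: AppControl is a hypothetical stand-in for the two control methods of SparkAppHandle, stopOrKill is an illustrative name, and the latch is assumed to be counted down by a listener when a final state is reported, as in the LogListener shown earlier. The timeout value is up to the caller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class GracefulStopSketch {
    /** Hypothetical stand-in for the stop() and kill() methods quoted above. */
    interface AppControl {
        void stop();
        void kill();
    }

    /**
     * Asks the application to stop, then waits up to timeoutMs for the latch, which a
     * listener is assumed to count down on a final state. Falls back to kill() if the
     * wait times out or is interrupted. Returns true if stop() alone was enough.
     */
    static boolean stopOrKill(AppControl app, CountDownLatch finished, long timeoutMs) {
        app.stop();
        try {
            if (finished.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true; // a final state was reported in time
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and fall through to kill().
            Thread.currentThread().interrupt();
        }
        app.kill();
        return false;
    }

    public static void main(String[] args) {
        CountDownLatch finished = new CountDownLatch(1);
        // A cooperative application: stop() immediately leads to a final state.
        AppControl cooperative = new AppControl() {
            @Override
            public void stop() {
                finished.countDown();
            }

            @Override
            public void kill() {
                throw new IllegalStateException("kill should not be needed");
            }
        };
        System.out.println("stopped cleanly: " + stopOrKill(cooperative, finished, 1000));
    }
}
```

A web UI backend could call such a helper from its "kill job" endpoint, so that a clean shutdown is attempted before the process is killed.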