Source Code Study: The YARN Application State Machine
Posted by PeersLee
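RMApp is the data structure the ResourceManager uses to track the lifecycle of an Application. It is implemented by RMAppImpl, which maintains an Application state machine: the states an Application can be in (RMAppState) and the events that cause transitions between them (RMAppEvent).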
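To make the idea of a table-driven state machine concrete, here is a minimal, self-contained sketch. It is not the Hadoop implementation: the class AppStateMachineSketch, its nested enums, and the addTransition helper are hypothetical names introduced for illustration. The table registers only a handful of the RMAppImpl transitions (NEW to NEW_SAVING on START, NEW_SAVING to SUBMITTED on APP_NEW_SAVED, SUBMITTED to ACCEPTED on APP_ACCEPTED, and ACCEPTED or RUNNING to KILLING on KILL), so it rejects many events that the real state machine accepts.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a table-driven state machine.
// It does not use or reproduce the Hadoop classes.
final class AppStateMachineSketch {

  enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, KILLING }

  enum Event { START, APP_NEW_SAVED, APP_ACCEPTED, KILL }

  // Transition table: current state -> (event -> next state).
  private static final Map<State, Map<Event, State>> TABLE =
      new EnumMap<>(State.class);

  // Registers one (preState, event) -> postState entry.
  private static void addTransition(State pre, State post, Event event) {
    TABLE.computeIfAbsent(pre, s -> new EnumMap<>(Event.class)).put(event, post);
  }

  static {
    // Only a subset of transitions is registered in this sketch.
    addTransition(State.NEW, State.NEW_SAVING, Event.START);
    addTransition(State.NEW_SAVING, State.SUBMITTED, Event.APP_NEW_SAVED);
    addTransition(State.SUBMITTED, State.ACCEPTED, Event.APP_ACCEPTED);
    addTransition(State.ACCEPTED, State.KILLING, Event.KILL);
    addTransition(State.RUNNING, State.KILLING, Event.KILL);
  }

  private State current = State.NEW;

  State getState() {
    return current;
  }

  // Applies one event; throws if no transition is registered for it.
  State handle(Event event) {
    Map<Event, State> row = TABLE.get(current);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException(
          "No transition for event " + event + " in state " + current);
    }
    current = next;
    return current;
  }

  public static void main(String[] args) {
    AppStateMachineSketch app = new AppStateMachineSketch();
    app.handle(Event.START);
    app.handle(Event.APP_NEW_SAVED);
    app.handle(Event.APP_ACCEPTED);
    app.handle(Event.KILL);
    System.out.println(app.getState()); // prints KILLING
  }
}
```

Keeping the transitions in a lookup table rather than in nested conditionals means each legal (state, event) pair is declared in one place, which is the property that makes a state machine of this kind easy to audit.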
State transition diagram
RMAppState
public enum RMAppState {
  // Initial state
  NEW,
  // After the RM receives an app submission from the client,
  // it creates an RMAppImpl object to track the app's state,
  // then immediately persists the app's basic information for failure recovery.
  // The default RMStateStore is FileSystemRMStateStore,
  // configured by yarn.resourcemanager.store.class.
  //
  // 1
  // preState: RMAppState.NEW;
  // eventType: RMAppEventType.START;
  // txnHook: RMAppNewlySavingTransition
  // 2
  // preState: RMAppState.NEW_SAVING;
  // eventType: RMAppEventType.NODE_UPDATE;
  // txnHook: RMAppNodeUpdateTransition
  NEW_SAVING,
  // The app has passed validation and its basic information has been persisted.
  // The RM will create an RMAppAttemptImpl to make one run attempt.
  //
  // 1
  // preState: RMAppState.NEW;
  // eventType: RMAppEventType.RECOVER;
  // txnHook: RMAppRecoveredTransition;
  // 2
  // preState: RMAppState.NEW_SAVING;
  // eventType: RMAppEventType.APP_NEW_SAVED;
  // txnHook: AddApplicationToSchedulerTransition;
  SUBMITTED,
  // The app has been validated by the ResourceScheduler and submitted to a scheduler queue.
  // e.g. CapacityScheduler:
  //   yarn.scheduler.capacity.maximum-applications:
  //     Maximum number of applications that can be pending and running.
  //   Validation related to hierarchical queues
  //   Submit to the queue
  //   Update the metrics
  //   Accepted application: a1 for user: u1 in queue: q1
  //
  // preState: RMAppState.SUBMITTED;
  // eventType: RMAppEventType.APP_ACCEPTED;
  // txnHook: StartAppAttemptTransition;
  ACCEPTED,
  // The appMaster is running on some node and
  // the RMAppAttemptImpl is already in its running state.
  RUNNING,
  // When RMAppEventType.ATTEMPT_FAILED fires, the RM first checks whether
  // the number of failed attempts has reached the limit set by
  // yarn.resourcemanager.am.max-attempts.
  // If not, the state machine goes back to ACCEPTED;
  // if so, it enters FINAL_SAVING for cleanup such as resource reclamation.
  FINAL_SAVING,
  // The appMaster has notified the RM over RPC that the app has finished and is about to exit.
  FINISHING,
  // The NM has reported via heartbeat that the appMaster's container has finished.
  FINISHED,
  // The appMaster failed.
  FAILED,
  // 1
  // preState: RMAppState.ACCEPTED;
  // eventType: RMAppEventType.KILL;
  // txnHook: KillAttemptTransition;
  // 2
  // preState: RMAppState.RUNNING;
  // eventType: RMAppEventType.KILL;
  // txnHook: KillAttemptTransition;
  KILLING,
  // The RM killed the app after receiving a kill command from the client.
  KILLED
}
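The ATTEMPT_FAILED handling described for FINAL_SAVING can be sketched as a single decision. This is an illustration, not the RMAppImpl code: the class AttemptFailedSketch and the method onAttemptFailed are hypothetical names, maxAttempts stands in for the value of yarn.resourcemanager.am.max-attempts, and treating a failure count that reaches the limit as exhausted is an assumption of this sketch.

```java
// Hypothetical helper, not part of Hadoop: models only the retry decision
// taken when an application attempt fails.
final class AttemptFailedSketch {

  enum NextState { ACCEPTED, FINAL_SAVING }

  // failedAttempts: number of attempts that have failed so far.
  // maxAttempts: stands in for yarn.resourcemanager.am.max-attempts.
  static NextState onAttemptFailed(int failedAttempts, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    // Assumption: reaching the limit means no attempts are left.
    return failedAttempts >= maxAttempts
        ? NextState.FINAL_SAVING   // give up and start cleanup
        : NextState.ACCEPTED;      // retry with a new attempt
  }

  public static void main(String[] args) {
    int maxAttempts = 2;
    for (int failed = 1; failed <= maxAttempts; failed++) {
      System.out.println(failed + " failed -> " + onAttemptFailed(failed, maxAttempts));
    }
  }
}
```

With a limit of 2, the first failure sends the app back to ACCEPTED for another attempt and the second failure moves it to FINAL_SAVING.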
RMAppEventType
public enum RMAppEventType {
  // Source: ClientRMService
  START,
  RECOVER,
  KILL,
  // Source: Scheduler and RMAppManager
  APP_REJECTED,
  // Source: Scheduler
  APP_ACCEPTED,
  // Source: RMAppAttempt
  ATTEMPT_REGISTERED,
  ATTEMPT_UNREGISTERED,
  ATTEMPT_FINISHED, // Will send the final state
  ATTEMPT_FAILED,
  ATTEMPT_KILLED,
  NODE_UPDATE,
  ATTEMPT_LAUNCHED,
  // Source: Container and ResourceTracker
  APP_RUNNING_ON_NODE,
  // Source: RMStateStore
  APP_NEW_SAVED,
  APP_UPDATE_SAVED,
  APP_SAVE_FAILED,
}
Yarn Client Commands: yarn app|application
-appStates <States>
Works with -list to filter applications
based on input comma-separated list of
application states.
The valid application state can be one
of the following:
ALL, NEW, NEW_SAVING, SUBMITTED,
ACCEPTED, RUNNING, FINISHED,
FAILED, KILLED
e.g.
yarn app -list -appStates 'RUNNING,FINISHED' | grep distcp | head | awk '{print $1}'
application_1620823068070_0283
application_1620823068070_0282
application_1620823068070_0281
application_1620823068070_0280
application_1620823068070_0279
application_1620823068070_0278
application_1620823068070_0277
application_1620823068070_0276
application_1620823068070_0287
application_1620823068070_0286
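The -appStates argument is a comma-separated list restricted to the states named in the help text. The following sketch shows one way such an argument could be parsed and validated. It is not the YARN CLI code: the class AppStatesArgSketch and the method parse are hypothetical, and trimming whitespace and accepting lower-case input are choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for a comma-separated -appStates value.
final class AppStatesArgSketch {

  // The valid values, copied from the help text of `yarn app -list -appStates`.
  static final Set<String> VALID = new LinkedHashSet<>(Arrays.asList(
      "ALL", "NEW", "NEW_SAVING", "SUBMITTED",
      "ACCEPTED", "RUNNING", "FINISHED", "FAILED", "KILLED"));

  // Splits the argument on commas and rejects any unknown state name.
  static List<String> parse(String arg) {
    List<String> states = new ArrayList<>();
    for (String token : arg.split(",")) {
      String state = token.trim().toUpperCase(Locale.ROOT);
      if (state.isEmpty()) {
        continue; // ignore empty entries such as a trailing comma
      }
      if (!VALID.contains(state)) {
        throw new IllegalArgumentException("Invalid application state: " + state);
      }
      states.add(state);
    }
    return states;
  }

  public static void main(String[] args) {
    System.out.println(parse("RUNNING,FINISHED")); // prints [RUNNING, FINISHED]
  }
}
```

Note that FINAL_SAVING, FINISHING and KILLING appear in RMAppState but not in the help text, so this sketch rejects them as filter values.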