Source Code Study: YARN Application State Machine

Posted by PeersLee


Contents

State transition diagram

RMAppState

RMAppEventType

Yarn Client Commands: yarn app|application


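RMApp is the data structure the ResourceManager uses to maintain the lifecycle of an Application. It is implemented by RMAppImpl, which maintains an application state machine that records the states an Application can be in (RMAppState) and the events that drive transitions between those states (RMAppEvent).

State transition diagram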

RMAppState

public enum RMAppState {
  // Initial state
  NEW,
  // After the RM receives an app submission from the client,
  // it creates an RMAppImpl object to maintain the app's state,
  // then immediately persists the app's basic information for failure recovery.
  // The default RMStateStore is FileSystemRMStateStore,
  // controlled by yarn.resourcemanager.store.class
  //
  // 1
  // preState: RMAppState.NEW;
  // eventType: RMAppEventType.START;
  // txnHook: RMAppNewlySavingTransition
  // 2
  // preState: RMAppState.NEW_SAVING;
  // eventType: RMAppEventType.NODE_UPDATE;
  // txnHook: RMAppNodeUpdateTransition
  NEW_SAVING,
  // The app has passed validation and its basic information has been persisted.
  // The RM creates an RMAppAttemptImpl to make one run attempt
  //
  // 1
  // preState: RMAppState.NEW;
  // eventType: RMAppEventType.RECOVER;
  // txnHook: RMAppRecoveredTransition;
  // 2
  // preState: RMAppState.NEW_SAVING;
  // eventType: RMAppEventType.APP_NEW_SAVED;
  // txnHook: AddApplicationToSchedulerTransition;
  SUBMITTED,
  // After validation by the ResourceScheduler, the app is submitted to the scheduler queue
  // e.g: CapacityScheduler
  // yarn.scheduler.capacity.maximum-applications:
  // Maximum number of applications that can be pending and running.
  // Validation related to hierarchical queues
  // Submit to the queue
  // Update the metrics
  // Accepted application: a1 for user: u1 in queue: q1
  //
  // preState: RMAppState.SUBMITTED;
  // eventType: RMAppEventType.APP_ACCEPTED;
  // txnHook: StartAppAttemptTransition;
  ACCEPTED,
  // The appMaster is running on some node
  // and the RMAppAttemptImpl is already in the running state
  RUNNING,
  // After an RMAppEventType.ATTEMPT_FAILED event fires,
  // first check whether the failure count exceeds yarn.resourcemanager.am.max-attempts.
  // If not, the state machine returns to ACCEPTED.
  // If so, it enters FINAL_SAVING for cleanup such as resource reclamation
  FINAL_SAVING,
  // The appMaster notifies the RM via RPC that the app has finished and is about to exit
  FINISHING,
  // The NM reports via heartbeat that the container hosting the appMaster has finished
  FINISHED,
  // The appMaster failed
  FAILED,
  // 1
  // preState: RMAppState.ACCEPTED;
  // eventType: RMAppEventType.KILL;
  // txnHook: KillAttemptTransition;
  // 2
  // preState: RMAppState.RUNNING;
  // eventType: RMAppEventType.KILL;
  // txnHook: KillAttemptTransition;
  KILLING,
  // The RM actively kills the app when it receives a kill command from the client
  KILLED
}
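
The transitions listed in the comments above (preState, eventType, resulting state) amount to a lookup table keyed by current state and event. The following is a minimal sketch of that idea only, not the RMAppImpl code: the class name AppStateMachineSketch, the nested State and Event enums, and the helper methods are hypothetical, and the table holds just the transitions documented above without their transition hooks.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a table-driven state machine.
// Only the transitions documented in the RMAppState comments are included.
public class AppStateMachineSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, KILLING }
    enum Event { START, NODE_UPDATE, APP_NEW_SAVED, APP_ACCEPTED, KILL }

    // (current state, event) -> next state
    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    static void add(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    static {
        add(State.NEW, Event.START, State.NEW_SAVING);
        add(State.NEW_SAVING, Event.NODE_UPDATE, State.NEW_SAVING);
        add(State.NEW_SAVING, Event.APP_NEW_SAVED, State.SUBMITTED);
        add(State.SUBMITTED, Event.APP_ACCEPTED, State.ACCEPTED);
        add(State.ACCEPTED, Event.KILL, State.KILLING);
        add(State.RUNNING, Event.KILL, State.KILLING);
    }

    // Look up the next state; an event with no entry for the current state is rejected.
    static State next(State current, Event event) {
        Map<Event, State> row = TABLE.get(current);
        State target = (row == null) ? null : row.get(event);
        if (target == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        return target;
    }

    public static void main(String[] args) {
        State s = State.NEW;
        Event[] events = {Event.START, Event.APP_NEW_SAVED, Event.APP_ACCEPTED, Event.KILL};
        for (Event e : events) {
            s = next(s, e);
            System.out.println(e + " -> " + s);
        }
    }
}
```

Keeping the transitions in a table makes the set of legal (state, event) pairs explicit, so an unexpected event surfaces as an error instead of silently changing state.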

 

RMAppEventType

public enum RMAppEventType {
  // Source: ClientRMService
  START,
  RECOVER,
  KILL,

  // Source: Scheduler and RMAppManager
  APP_REJECTED,

  // Source: Scheduler
  APP_ACCEPTED,

  // Source: RMAppAttempt
  ATTEMPT_REGISTERED,
  ATTEMPT_UNREGISTERED,
  ATTEMPT_FINISHED, // Will send the final state
  ATTEMPT_FAILED,
  ATTEMPT_KILLED,
  NODE_UPDATE,
  ATTEMPT_LAUNCHED,

  // Source: Container and ResourceTracker
  APP_RUNNING_ON_NODE,

  // Source: RMStateStore
  APP_NEW_SAVED,
  APP_UPDATE_SAVED,
  APP_SAVE_FAILED,
}
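
The RMAppState comments describe how an ATTEMPT_FAILED event is handled: the failure count is compared with yarn.resourcemanager.am.max-attempts, and the app either goes back to ACCEPTED for another attempt or moves to FINAL_SAVING. A minimal sketch of that decision follows. The class AttemptFailedSketch and the method onAttemptFailed are hypothetical names, and the boundary condition (no retry once the failure count reaches the limit) is an assumption; the text above only says "exceeds".

```java
// Hypothetical sketch of the retry decision taken on ATTEMPT_FAILED.
public class AttemptFailedSketch {
    enum State { ACCEPTED, FINAL_SAVING }

    // failedAttempts: number of attempts that have failed so far.
    // maxAttempts: value of yarn.resourcemanager.am.max-attempts.
    // Assumption: once failedAttempts reaches maxAttempts, no further attempt is made.
    static State onAttemptFailed(int failedAttempts, int maxAttempts) {
        if (failedAttempts >= maxAttempts) {
            return State.FINAL_SAVING; // give up and persist the final state
        }
        return State.ACCEPTED; // retry with a new attempt
    }

    public static void main(String[] args) {
        int maxAttempts = 2; // illustrative value
        for (int failed = 1; failed <= maxAttempts; failed++) {
            System.out.println(failed + " failed attempt(s) -> " + onAttemptFailed(failed, maxAttempts));
        }
    }
}
```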

 

Yarn Client Commands: yarn app|application

-appStates <States>

Works with -list to filter applications based on input comma-separated list of application states. The valid application state can be one of the following: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED

e.g.

yarn app -list -appStates 'RUNNING,FINISHED' | grep distcp | head | awk '{print $1}'
application_1620823068070_0283
application_1620823068070_0282
application_1620823068070_0281
application_1620823068070_0280
application_1620823068070_0279
application_1620823068070_0278
application_1620823068070_0277
application_1620823068070_0276
application_1620823068070_0287
application_1620823068070_0286
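
To illustrate how a comma-separated -appStates value such as 'RUNNING,FINISHED' maps onto a set of states, here is a minimal sketch. It is not the YARN CLI implementation: the class AppStatesFilterSketch, the local AppState enum (which mirrors the valid states listed above), and the parse method are hypothetical, and the case-insensitive matching is an assumption of this sketch.

```java
import java.util.EnumSet;
import java.util.Locale;

// Hypothetical sketch: turn an -appStates style argument into a set of states.
public class AppStatesFilterSketch {
    // Mirrors the valid states listed in the help text (ALL is handled separately).
    enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    static EnumSet<AppState> parse(String csv) {
        EnumSet<AppState> result = EnumSet.noneOf(AppState.class);
        for (String token : csv.split(",")) {
            String name = token.trim().toUpperCase(Locale.ROOT);
            if (name.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            if (name.equals("ALL")) {
                return EnumSet.allOf(AppState.class);
            }
            // valueOf throws IllegalArgumentException for an unknown state name
            result.add(AppState.valueOf(name));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("RUNNING,FINISHED"));
        System.out.println(parse("ALL"));
    }
}
```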

 
