How BlockManagerMaster Manages BlockManagers

Posted 2022-11-30 大冰的小屋


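BlockManagerMaster is created in SparkEnv and is responsible for managing and coordinating Blocks; the actual work is delegated to BlockManagerMasterEndpoint. The Driver and the Executors set up BlockManagerMaster differently: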

    val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
      BlockManagerMaster.DRIVER_ENDPOINT_NAME,
      new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
      conf, isDriver)

    // If the current application is the Driver, create the BlockManagerMasterEndpoint and register it with the RpcEnv;
    // if it is an Executor, look up a reference to the BlockManagerMasterEndpoint through the RpcEnv.
    def registerOrLookupEndpoint(
        name: String, endpointCreator: => RpcEndpoint):
      RpcEndpointRef = {
      if (isDriver) {
        logInfo("Registering " + name)
        rpcEnv.setupEndpoint(name, endpointCreator)
      } else {
        RpcUtils.makeDriverRef(name, conf, rpcEnv)
      }
    }
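
The by-name parameter `endpointCreator: => RpcEndpoint` matters here: the endpoint expression is evaluated only on the branch that uses it. The following is a minimal sketch of that register-or-lookup pattern, not Spark code. `ToyEndpoint`, `ToyRef`, `newEndpoint`, and the `created` counter are hypothetical stand-ins introduced for illustration.

```scala
object RegisterOrLookupSketch {
  // Hypothetical stand-ins for RpcEndpoint and RpcEndpointRef.
  final class ToyEndpoint(val name: String)
  final case class ToyRef(name: String, local: Boolean)

  // Counts how many endpoints were actually constructed.
  var created = 0

  def newEndpoint(name: String): ToyEndpoint = {
    created += 1
    new ToyEndpoint(name)
  }

  // endpointCreator is passed by name, so it is evaluated only on the driver branch.
  def registerOrLookup(isDriver: Boolean, name: String, endpointCreator: => ToyEndpoint): ToyRef =
    if (isDriver) {
      val endpoint = endpointCreator // the endpoint is constructed here
      ToyRef(endpoint.name, local = true)
    } else {
      ToyRef(name, local = false) // only a reference; nothing is constructed
    }
}
```

Calling `registerOrLookup(false, "master", newEndpoint("master"))` leaves `created` unchanged, which mirrors why an Executor never instantiates its own BlockManagerMasterEndpoint even though the constructor call appears at the call site.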

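The BlockManagerMaster on the Driver centrally manages the BlockManagers that live on the Executors. For example, an Executor sends the Driver requests to register its BlockManager, to report the latest state of its Blocks, and to ask where a needed Block currently resides; when an Executor finishes, it has to be removed as well. Each BlockManager, by contrast, manages only the Blocks on its own Executor.
So how does the Driver carry out this management? The BlockManagerMaster on the Driver holds the BlockManagerMasterEndpoint, and every Executor obtains a reference to that endpoint through the RpcEnv. BlockManagerMasterEndpoint is itself an RPC message endpoint, and it manages the BlockManagers on all nodes through remote messaging.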

1. BlockManagerMasterEndpoint

BlockManagerMasterEndpoint exists only on the Driver. Executors obtain a reference to it and interact with the Driver by sending it messages. Its constructor is as follows:

/**
 * BlockManagerMasterEndpoint is an [[ThreadSafeRpcEndpoint]] on the master node to track statuses
 * of all slaves' block managers.
 */
private[spark]
class BlockManagerMasterEndpoint(
    override val rpcEnv: RpcEnv,
    val isLocal: Boolean,
    conf: SparkConf,
    listenerBus: LiveListenerBus)
  extends ThreadSafeRpcEndpoint with Logging

It holds the following state:

  // Caches every BlockManagerId with its BlockManagerInfo; a BlockManagerInfo holds
  // the information about all Blocks on the Executor it belongs to.
  // Mapping from block manager id to the block manager's information.
  private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]

  // Caches the mapping between an executorId and the BlockManagerId it owns.
  // Mapping from executor ID to block manager ID.
  private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]

  // Caches the mapping between a Block and the BlockManagerIds that hold it.
  // Mapping from block id to the set of block managers that have the block.
  private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
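
To make the relationship between the three maps concrete, here is a simplified model, not Spark's implementation. Plain strings stand in for BlockManagerId and BlockId, a set of block ids stands in for BlockManagerInfo, and `addManager`, `addBlock`, and this `removeExecutor` are hypothetical helpers written for illustration.

```scala
import scala.collection.mutable

object BlockIndexSketch {
  // block manager id -> blocks it holds (stand-in for BlockManagerInfo)
  val blockManagerInfo = mutable.HashMap.empty[String, mutable.HashSet[String]]
  // executor id -> block manager id
  val blockManagerIdByExecutor = mutable.HashMap.empty[String, String]
  // block id -> block managers that hold the block
  val blockLocations = mutable.HashMap.empty[String, mutable.HashSet[String]]

  def addManager(executorId: String, bmId: String): Unit = {
    blockManagerIdByExecutor(executorId) = bmId
    blockManagerInfo(bmId) = mutable.HashSet.empty[String]
  }

  // Records a block in both directions: per manager and per block.
  def addBlock(bmId: String, blockId: String): Unit = {
    blockManagerInfo(bmId).add(blockId)
    blockLocations.getOrElseUpdate(blockId, mutable.HashSet.empty[String]).add(bmId)
  }

  // Removing an executor has to clean up all three maps.
  def removeExecutor(executorId: String): Unit =
    blockManagerIdByExecutor.remove(executorId).foreach { bmId =>
      blockManagerInfo.remove(bmId).foreach { blocks =>
        blocks.foreach { blockId =>
          blockLocations.get(blockId).foreach { holders =>
            holders.remove(bmId)
            if (holders.isEmpty) blockLocations.remove(blockId)
          }
        }
      }
    }
}
```

The point of the sketch is the invariant: `blockLocations` is the inverse index of `blockManagerInfo`, so every update to one side needs a matching update to the other.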

The receiveAndReply method returns the partial function that matches the messages BlockManagerMasterEndpoint receives:

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint) =>
      register(blockManagerId, maxMemSize, slaveEndpoint)
      context.reply(true)

    case _updateBlockInfo @ UpdateBlockInfo(
      blockManagerId, blockId, storageLevel, deserializedSize, size, externalBlockStoreSize) =>
      context.reply(updateBlockInfo(
        blockManagerId, blockId, storageLevel, deserializedSize, size, externalBlockStoreSize))
      listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))

    case GetLocations(blockId) =>
      context.reply(getLocations(blockId))

    case GetLocationsMultipleBlockIds(blockIds) =>
      context.reply(getLocationsMultipleBlockIds(blockIds))

    case GetPeers(blockManagerId) =>
      context.reply(getPeers(blockManagerId))

    case GetExecutorEndpointRef(executorId) =>
      context.reply(getExecutorEndpointRef(executorId))

    case GetMemoryStatus =>
      context.reply(memoryStatus)

    case GetStorageStatus =>
      context.reply(storageStatus)

    case GetBlockStatus(blockId, askSlaves) =>
      context.reply(blockStatus(blockId, askSlaves))

    case GetMatchingBlockIds(filter, askSlaves) =>
      context.reply(getMatchingBlockIds(filter, askSlaves))

    case RemoveRdd(rddId) =>
      context.reply(removeRdd(rddId))

    case RemoveShuffle(shuffleId) =>
      context.reply(removeShuffle(shuffleId))

    case RemoveBroadcast(broadcastId, removeFromDriver) =>
      context.reply(removeBroadcast(broadcastId, removeFromDriver))

    case RemoveBlock(blockId) =>
      removeBlockFromWorkers(blockId)
      context.reply(true)

    case RemoveExecutor(execId) =>
      removeExecutor(execId)
      context.reply(true)

    case StopBlockManagerMaster =>
      context.reply(true)
      stop()

    case BlockManagerHeartbeat(blockManagerId) =>
      context.reply(heartbeatReceived(blockManagerId))

    case HasCachedBlocks(executorId) =>
      blockManagerIdByExecutor.get(executorId) match {
        case Some(bm) =>
          if (blockManagerInfo.contains(bm)) {
            val bmInfo = blockManagerInfo(bm)
            context.reply(bmInfo.cachedBlocks.nonEmpty)
          } else {
            context.reply(false)
          }
        case None => context.reply(false)
      }
  }
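
The dispatch style above, a `PartialFunction[Any, Unit]` that pattern-matches on message case classes and answers through a context object, can be tried in isolation. The sketch below is a toy model under stated assumptions: `ToyContext` is a hypothetical stand-in for RpcCallContext that simply records the reply, and the `locations` table is made-up data.

```scala
object ReceiveAndReplySketch {
  // Toy messages that reuse two names from the listing above.
  final case class GetLocations(blockId: String)
  final case class RemoveBlock(blockId: String)

  // Hypothetical stand-in for RpcCallContext: it only records the reply.
  final class ToyContext {
    var replied: Option[Any] = None
    def reply(response: Any): Unit = { replied = Some(response) }
  }

  // Made-up location table for the example.
  val locations: Map[String, Seq[String]] = Map("rdd_0_0" -> Seq("bm-1", "bm-2"))

  // Each case handles one message type and replies through the context.
  def receiveAndReply(context: ToyContext): PartialFunction[Any, Unit] = {
    case GetLocations(blockId) =>
      context.reply(locations.getOrElse(blockId, Seq.empty))
    case RemoveBlock(_) =>
      context.reply(true)
  }
}
```

Because the result is a partial function, `receiveAndReply(ctx).isDefinedAt(msg)` is false for any message type without a matching case, so the caller of the partial function can detect unsupported messages instead of hitting a `MatchError`.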

2. The askWithRetry method

In the BlockManagerMaster on an Executor, every method that interacts with the BlockManagerMaster on the Driver ultimately calls askWithRetry:

  /**
   * Send a message to the corresponding [[RpcEndpoint.receive]] and get its result within a
   * specified timeout, throw a SparkException if this fails even after the specified number of
   * retries. `timeout` will be used in every trial of calling `sendWithReply`. Because this method
   * retries, the message handling in the receiver side should be idempotent.
   *
   * Note: this is a blocking action which may cost a lot of time, so don't call it in a message
   * loop of [[RpcEndpoint]].
   *
   * @param message the message to send
   * @param timeout the timeout duration
   * @tparam T type of the reply message
   * @return the reply message from the corresponding [[RpcEndpoint]]
   */
  def askWithRetry[T: ClassTag](message: Any, timeout: RpcTimeout): T = {
    // TODO: Consider removing multiple attempts
    var attempts = 0
    var lastException: Exception = null
    while (attempts < maxRetries) {
      attempts += 1
      try {
        val future = ask[T](message, timeout)
        val result = timeout.awaitResult(future)
        if (result == null) {
          throw new SparkException("RpcEndpoint returned null")
        }
        return result
      } catch {
        case ie: InterruptedException => throw ie
        case e: Exception =>
          lastException = e
          logWarning(s"Error sending message [message = $message] in $attempts attempts", e)
      }

      if (attempts < maxRetries) {
        Thread.sleep(retryWaitMs)
      }
    }

    throw new SparkException(
      s"Error sending message [message = $message]", lastException)
  }
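
The retry loop can be exercised without Spark. The following is a minimal, self-contained sketch of the same bounded-retry shape; `withRetry` and its parameters are hypothetical names, the operation receives the attempt number purely so the example can simulate failures, and a plain `RuntimeException` stands in for SparkException.

```scala
object RetrySketch {
  // Runs op up to maxRetries times in total, sleeping retryWaitMs between attempts.
  def withRetry[T](maxRetries: Int, retryWaitMs: Long)(op: Int => T): T = {
    var attempts = 0
    var lastException: Exception = null
    while (attempts < maxRetries) {
      attempts += 1
      try {
        return op(attempts)
      } catch {
        case ie: InterruptedException => throw ie // never swallow interruption
        case e: Exception => lastException = e    // remember the failure and retry
      }
      if (attempts < maxRetries) {
        Thread.sleep(retryWaitMs)
      }
    }
    // All attempts failed: surface the last failure as the cause.
    throw new RuntimeException(s"Failed after $maxRetries attempts", lastException)
  }
}
```

As the Scaladoc above points out, a loop like this may deliver the same message more than once, so whatever `op` triggers on the receiving side should be idempotent.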

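When communication fails, the call is retried a limited number of times. The limit is set with the spark.rpc.numRetries property and defaults to 3; as the loop above shows, this is the total number of attempts: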

  /** Returns the configured number of times to retry connecting */
  def numRetries(conf: SparkConf): Int = {
    conf.getInt("spark.rpc.numRetries", 3)
  }

retryWaitMs is the interval to wait between retries and defaults to 3 seconds:

  /** Returns the configured number of milliseconds to wait on each retry */
  def retryWaitMs(conf: SparkConf): Long = {
    conf.getTimeAsMs("spark.rpc.retry.wait", "3s")
  }

The request timeout defaults to 120 seconds:

  /** Returns the default Spark timeout to use for RPC ask operations. */
  private[spark] def askRpcTimeout(conf: SparkConf): RpcTimeout = {
    RpcTimeout(conf, Seq("spark.rpc.askTimeout", "spark.network.timeout"), "120s")
  }
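
The call above passes an ordered list of property names plus a default. A plausible reading, stated here as an assumption rather than a description of RpcTimeout's internals, is that the first property that is set wins and "120s" applies only when neither is set. The sketch below models that lookup with a plain `Map`; `resolve` is a hypothetical helper.

```scala
object TimeoutLookupSketch {
  // Returns (property name, value): the first key present in conf,
  // or the first key paired with the default when none is set.
  def resolve(conf: Map[String, String], keys: Seq[String], default: String): (String, String) =
    keys
      .collectFirst { case key if conf.contains(key) => (key, conf(key)) }
      .getOrElse((keys.head, default))
}
```

Under this reading, setting only spark.network.timeout also changes the RPC ask timeout, while setting spark.rpc.askTimeout overrides it for ask operations.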

In addition, the tell method, a thin wrapper around askWithRetry, is also called frequently:

  /** Send a one-way message to the master endpoint, to which we expect it to reply with true. */
  private def tell(message: Any) {
    if (!driverEndpoint.askWithRetry[Boolean](message)) {
      throw new SparkException("BlockManagerMasterEndpoint returned false, expected true.")
    }
  }

3. Registering a BlockManagerId with BlockManagerMaster

When it initializes, the BlockManager of each Executor, and of the Driver itself, must register its information with the BlockManagerMaster on the Driver:

  /** Register the BlockManager's id with the driver. */
  def registerBlockManager(
      blockManagerId: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef): Unit = {
    logInfo("Trying to register BlockManager")
    tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
    logInfo("Registered BlockManager")
  }

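As the code above shows, the message carries the BlockManagerId, the maximum memory, and a reference to the BlockManagerSlaveEndpoint. The BlockManagerSlaveEndpoint reference is included so that BlockManagerMasterEndpoint can send messages to this BlockManager later; register stores it in the BlockManagerInfo. These fields are wrapped in a RegisterBlockManager message and sent through tell. The receiveAndReply method of BlockManagerMasterEndpoint matches RegisterBlockManager and calls register to register the BlockManager. Once registration completes, it replies true to the caller through context.reply(true). The register method: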

  private def register(id: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef) {
    val time = System.currentTimeMillis()
    if (!blockManagerInfo.contains(id)) {
      blockManagerIdByExecutor.get(id.executorId) match {
        case Some(oldId) =>
          // A block manager of the same executor already exists, so remove it (assumed dead)
          logError("Got two different block manager registrations on same executor - "
              + s" will replace old one $oldId with new one $id")
          removeExecutor(id.executorId)
        case None =>
      }
      logInfo("Registering block manager %s with %s RAM, %s".format(
        id.hostPort, Utils.bytesToString(maxMemSize), id))

      blockManagerIdByExecutor(id.executorId) = id

      blockManagerInfo(id) = new BlockManagerInfo(
        id, System.currentTimeMillis(), maxMemSize, slaveEndpoint)
    }
    listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxMemSize))
  }

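register ensures that blockManagerInfo holds the blockManagerId from the message together with its information, and that each Executor has at most one blockManagerId; an older one is removed. Finally, it posts a SparkListenerBlockManagerAdded event to listenerBus.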
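
The registration rules can be checked with a small model. This is a sketch under simplifying assumptions, not Spark code: `BmId` is a hypothetical stand-in for BlockManagerId, the stored value is just the maximum memory, the `events` buffer stands in for listenerBus, and the old entry is dropped directly instead of going through removeExecutor.

```scala
import scala.collection.mutable

object RegisterSketch {
  // Hypothetical stand-in for BlockManagerId.
  final case class BmId(executorId: String, host: String, port: Int)

  val infoById = mutable.HashMap.empty[BmId, Long]       // id -> max memory
  val idByExecutor = mutable.HashMap.empty[String, BmId] // executor id -> id
  val events = mutable.ArrayBuffer.empty[BmId]           // stand-in for listenerBus

  def register(id: BmId, maxMemSize: Long): Unit = {
    if (!infoById.contains(id)) {
      // A different id for the same executor is assumed dead and replaced.
      idByExecutor.get(id.executorId).foreach { oldId => infoById.remove(oldId) }
      idByExecutor(id.executorId) = id
      infoById(id) = maxMemSize
    }
    // As in the listing, the event is posted outside the if, so on every call.
    events += id
  }
}
```

Registering the same id twice leaves the maps unchanged the second time, while registering a new id for the same executor replaces the old one. Note that in this model, as in the listing, the event is still posted for a repeated registration because the post sits outside the `if`.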

Reference: 深入理解Spark核心思想与源码分析
