A Look at Spark's Source Code

Posted by yangjiming



  1. In IntelliJ IDEA, install the Scala and SBT plugins.

  2. Download the Spark source code from https://github.com/apache/spark, either by cloning it with git (git clone https://github.com/apache/spark.git) or by downloading the zip archive.

  3. Import the project: open IntelliJ IDEA, choose Import Project, select the Spark source directory, and import it as an SBT project. SBT then runs automatically and downloads the dependency jars.

  4. The source code is now ready, so let's dig in. A good entry point is the schedule() method of the standalone Master (org.apache.spark.deploy.master.Master), shown below: it first places each waiting driver on an alive worker in round-robin fashion, then launches executors for waiting applications.

    private def schedule(): Unit = {
        if (state != RecoveryState.ALIVE) {
          return
        }
        // Drivers take strict precedence over executors
        val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
        val numWorkersAlive = shuffledAliveWorkers.size
        var curPos = 0
        for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
          // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
          // start from the last worker that was assigned a driver, and continue onwards until we have
          // explored all alive workers.
          var launched = false
          var numWorkersVisited = 0
          while (numWorkersVisited < numWorkersAlive && !launched) {
            val worker = shuffledAliveWorkers(curPos)
            numWorkersVisited += 1
            if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
              launchDriver(worker, driver)
              waitingDrivers -= driver
              launched = true
            }
            curPos = (curPos + 1) % numWorkersAlive
          }
        }
        startExecutorsOnWorkers()
      }
