看看spark的源码
Posted yangjiming
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了看看spark的源码相关的知识,希望对你有一定的参考价值。
-
IntelliJ IDEA安装plugins,加入scala插件和SBT插件
-
下载spark的源代码,下载地址 https://github.com/apache/spark,可以使用git下载或者下载zip包
-
导入项目,开启 IntelliJ 之后选择 Import Project,而后选择 Spark 源代码,并将其导入为 SBT 项目,在之后的过程中 SBT 过程将会自动进行,下载作为依赖的 jar 包。
-
源代码已经准备好了。开干吧
private def schedule(): Unit = { if (state != RecoveryState.ALIVE) { return } // Drivers take strict precedence over executors val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE)) val numWorkersAlive = shuffledAliveWorkers.size var curPos = 0 for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers // We assign workers to each waiting driver in a round-robin fashion. For each driver, we // start from the last worker that was assigned a driver, and continue onwards until we have // explored all alive workers. var launched = false var numWorkersVisited = 0 while (numWorkersVisited < numWorkersAlive && !launched) { val worker = shuffledAliveWorkers(curPos) numWorkersVisited += 1 if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { launchDriver(worker, driver) waitingDrivers -= driver launched = true } curPos = (curPos + 1) % numWorkersAlive } } startExecutorsOnWorkers() }
以上是关于看看spark的源码的主要内容,如果未能解决你的问题,请参考以下文章
在这个 spark 代码片段中 ordering.by 是啥意思?
可能是全网最详细的 Spark Sql Aggregate 源码剖析
spark关于join后有重复列的问题(org.apache.spark.sql.AnalysisException: Reference '*' is ambiguous)(代码片段