Spark源码编译

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Spark源码编译相关的知识,希望对你有一定的参考价值。

 

前言

   Spark可以通过SBT和Maven两种方式进行编译,再通过make-distribution.sh脚本生成部署包。  

   SBT编译需要安装git工具,而Maven安装则需要maven工具,两种方式均需要在联网 下进行。

    尽管maven是Spark官网推荐的编译方式,但是sbt的编译速度更胜一筹。因此,对于spark的开发者来说,sbt编译可能是更好的选择。由于sbt编译也是基于maven的POM文件,因此sbt的编译参数与maven的编译参数是一致的。

 

心得

   有时间,自己一定要动手编译源码,想要成为高手和大数据领域大牛,前面的苦,是必定要吃的。

   无论是编译spark源码,还是hadoop源码。新手初次编译,一路会碰到很多问题,也许会花上个一天甚至几天,这个是正常。把心态端正就是!有错误,更好,解决错误,是最好锻炼和提升能力的。

       更不要小看它们,能碰到是幸运,能弄懂和深入研究,之所以然,是福气。

 

 

技术分享

主流是这3大版本,其实,是有9大版本。

CDH的CM是要花钱的,当然它的预编译包,是免费的。

技术分享

 

以下是从官网下载:

技术分享

 

以下是Github下载(仅source code)

技术分享

 

技术分享

 

http://archive-primary.cloudera.com/cdh5/cdh/5/

技术分享

 

技术分享

http://zh.hortonworks.com/products/

 

 

********************************************************************************

好的,那我这里就以,Githud为例。

         准备Linux系统环境(如CentOS6.5)

        

思路流程:

  第一大步:在线安装git

  第二大步:创建一个目录来克隆spark源代码(mkdir -p /root/projects/opensource)

  第三大步:切换分支

  第四大步:安装jdk1.7+

  第三大步:

  第四大步:

 

 

当然,可以参考官网给出的文档,

 技术分享

http://spark.apache.org/docs/1.6.1/building-spark.html

 

 

 

第一大步:在线安装git(root 用户下)

  yum install git       (root用户)

  或者

  Sudo yum install git (普通用户)

 技术分享

[[email protected] ~]# yum install git

Loaded plugins: fastestmirror, refresh-packagekit, security

Loading mirror speeds from cached hostfile

 * base: mirrors.cug.edu.cn

 * extras: mirrors.cug.edu.cn

 * updates: mirrors.cug.edu.cn

Setting up Install Process

Resolving Dependencies

--> Running transaction check

---> Package git.x86_64 0:1.7.1-4.el6_7.1 will be installed

--> Processing Dependency: perl-Git = 1.7.1-4.el6_7.1 for package: git-1.7.1-4.el6_7.1.x86_64

--> Processing Dependency: perl(Git) for package: git-1.7.1-4.el6_7.1.x86_64

--> Processing Dependency: perl(Error) for package: git-1.7.1-4.el6_7.1.x86_64

--> Running transaction check

---> Package perl-Error.noarch 1:0.17015-4.el6 will be installed

---> Package perl-Git.noarch 0:1.7.1-4.el6_7.1 will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

 

===============================================================================================================================================================================================

 Package                                        Arch                                       Version                                              Repository                                Size

===============================================================================================================================================================================================

Installing:

 git                                            x86_64                                     1.7.1-4.el6_7.1                                      base                                     4.6 M

Installing for dependencies:

 perl-Error                                     noarch                                     1:0.17015-4.el6                                      base                                      29 k

 perl-Git                                       noarch                                     1.7.1-4.el6_7.1                                      base                                      28 k

 

Transaction Summary

===============================================================================================================================================================================================

Install       3 Package(s)

 

Total download size: 4.7 M

Installed size: 15 M

Is this ok [y/N]: y

Downloading Packages:

(1/3): git-1.7.1-4.el6_7.1.x86_64.rpm                                                                                                                                   | 4.6 MB     00:01    

(2/3): perl-Error-0.17015-4.el6.noarch.rpm                                                                                                                              |  29 kB     00:00    

(3/3): perl-Git-1.7.1-4.el6_7.1.noarch.rpm                                                                                                                              |  28 kB     00:00    

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                                                                                          683 kB/s | 4.7 MB     00:06    

warning: rpmts_HdrFromFdno: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY

Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

Importing GPG key 0xC105B9DE:

 Userid : CentOS-6 Key (CentOS 6 Official Signing Key) <[email protected]>

 Package: centos-release-6-5.el6.centos.11.1.x86_64 (@anaconda-CentOS-201311272149.x86_64/6.5)

 From   : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

Is this ok [y/N]: y

Running rpm_check_debug

Running Transaction Test

Transaction Test Succeeded

Running Transaction

  Installing : 1:perl-Error-0.17015-4.el6.noarch                                                                                                                                           1/3

  Installing : git-1.7.1-4.el6_7.1.x86_64                                                                                                                                                  2/3

  Installing : perl-Git-1.7.1-4.el6_7.1.noarch                                                                                                                                             3/3

  Verifying  : perl-Git-1.7.1-4.el6_7.1.noarch                                                                                                                                             1/3

  Verifying  : 1:perl-Error-0.17015-4.el6.noarch                                                                                                                                           2/3

  Verifying  : git-1.7.1-4.el6_7.1.x86_64                                                                                                                                                  3/3

 

Installed:

  git.x86_64 0:1.7.1-4.el6_7.1                                                                                                                                                                

 

Dependency Installed:

  perl-Error.noarch 1:0.17015-4.el6                                                              perl-Git.noarch 0:1.7.1-4.el6_7.1                                                            

 

Complete!

[[email protected] ~]#

 

 

 

第二大步:创建一个目录克隆spark源代码

  mkdir -p /root/projects/opensource

  cd /root/projects/opensource

  git clone https://github.com/apache/spark.git

技术分享

[[email protected] ~]# pwd

/root

[[email protected] ~]# mkdir -p /root/projects/opensource

[[email protected] ~]# cd projects/opensource/

[[email protected] opensource]# pwd

/root/projects/opensource

[[email protected] opensource]# ls

[[email protected] opensource]#

 

 技术分享

[[email protected] opensource]# pwd

/root/projects/opensource

[[email protected] opensource]# git clone https://github.com/apache/spark.git

Initialized empty Git repository in /root/projects/opensource/spark/.git/

remote: Counting objects: 403059, done.

remote: Compressing objects: 100% (13/13), done.

remote: Total 403059 (delta 4), reused 1 (delta 1), pack-reused 403045

Receiving objects: 100% (403059/403059), 182.79 MiB | 896 KiB/s, done.

Resolving deltas: 100% (157557/157557), done.

[[email protected] opensource]# ls

spark

[[email protected] opensource]# cd spark/

[[email protected] spark]#

 

其实就是,对应着,如下网页界面

技术分享

技术分享

技术分享

技术分享

[[email protected] spark]# pwd

/root/projects/opensource/spark

[[email protected] spark]# ll

total 280

-rw-r--r--.  1 root root  1804 Sep  2 03:53 appveyor.yml

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 assembly

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 bin

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 build

drwxr-xr-x.  8 root root  4096 Sep  2 03:53 common

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 conf

-rw-r--r--.  1 root root   988 Sep  2 03:53 CONTRIBUTING.md

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 core

drwxr-xr-x.  5 root root  4096 Sep  2 03:53 data

drwxr-xr-x.  6 root root  4096 Sep  2 03:53 dev

drwxr-xr-x.  9 root root  4096 Sep  2 03:53 docs

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 examples

drwxr-xr-x. 15 root root  4096 Sep  2 03:53 external

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 graphx

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 launcher

-rw-r--r--.  1 root root 17811 Sep  2 03:53 LICENSE

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 licenses

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 mesos

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 mllib

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 mllib-local

-rw-r--r--.  1 root root 24749 Sep  2 03:53 NOTICE

-rw-r--r--.  1 root root 97324 Sep  2 03:53 pom.xml

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 project

drwxr-xr-x.  6 root root  4096 Sep  2 03:53 python

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 R

-rw-r--r--.  1 root root  3828 Sep  2 03:53 README.md

drwxr-xr-x.  5 root root  4096 Sep  2 03:53 repl

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 sbin

-rw-r--r--.  1 root root 16952 Sep  2 03:53 scalastyle-config.xml

drwxr-xr-x.  6 root root  4096 Sep  2 03:53 sql

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 streaming

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 tools

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 yarn

[[email protected] spark]#

 

第三大步:切换分支

  git checkout v1.6.1 //在spark目录下执行

 技术分享

技术分享

[[email protected] spark]# pwd

/root/projects/opensource/spark

[[email protected] spark]# git branch -a

* master

  remotes/origin/HEAD -> origin/master

  remotes/origin/branch-0.5

  remotes/origin/branch-0.6

  remotes/origin/branch-0.7

  remotes/origin/branch-0.8

  remotes/origin/branch-0.9

  remotes/origin/branch-1.0

  remotes/origin/branch-1.0-jdbc

  remotes/origin/branch-1.1

  remotes/origin/branch-1.2

  remotes/origin/branch-1.3

  remotes/origin/branch-1.4

  remotes/origin/branch-1.5

  remotes/origin/branch-1.6

  remotes/origin/branch-2.0

  remotes/origin/master

[[email protected] spark]# git checkout v1.6.1

Note: checking out ‘v1.6.1‘.

 

You are in ‘detached HEAD‘ state. You can look around, make experimental

changes and commit them, and you can discard any commits you make in this

state without impacting any branches by performing another checkout.

 

If you want to create a new branch to retain commits you create, you may

do so (now or later) by using -b with the checkout command again. Example:

 

  git checkout -b new_branch_name

 

HEAD is now at 15de51c... Preparing Spark release v1.6.1-rc1

[[email protected] spark]#

 

 

那么,就有了。make-distribution.sh

技术分享

[[email protected] spark]# pwd

/root/projects/opensource/spark

[[email protected] spark]# ll

total 1636

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 assembly

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 bagel

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 bin

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 build

-rw-r--r--.  1 root root 1343562 Sep  2 03:57 CHANGES.txt

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 conf

-rw-r--r--.  1 root root     988 Sep  2 03:53 CONTRIBUTING.md

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 core

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 data

drwxr-xr-x.  7 root root    4096 Sep  2 03:57 dev

drwxr-xr-x.  4 root root    4096 Sep  2 03:57 docker

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 docker-integration-tests

drwxr-xr-x.  9 root root    4096 Sep  2 03:57 docs

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 ec2

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 examples

drwxr-xr-x. 11 root root    4096 Sep  2 03:57 external

drwxr-xr-x.  6 root root    4096 Sep  2 03:57 extras

drwxr-xr-x.  4 root root    4096 Sep  2 03:57 graphx

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 launcher

-rw-r--r--.  1 root root   17352 Sep  2 03:57 LICENSE

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 licenses

-rwxr-xr-x.  1 root root    8557 Sep  2 03:57 make-distribution.sh

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 mllib

drwxr-xr-x.  5 root root    4096 Sep  2 03:57 network

-rw-r--r--.  1 root root   23529 Sep  2 03:57 NOTICE

-rw-r--r--.  1 root root   91106 Sep  2 03:57 pom.xml

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 project

-rw-r--r--.  1 root root   13991 Sep  2 03:57 pylintrc

drwxr-xr-x.  6 root root    4096 Sep  2 03:57 python

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 R

-rw-r--r--.  1 root root    3359 Sep  2 03:57 README.md

drwxr-xr-x.  5 root root    4096 Sep  2 03:57 repl

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 sbin

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 sbt

-rw-r--r--.  1 root root   13191 Sep  2 03:57 scalastyle-config.xml

drwxr-xr-x.  6 root root    4096 Sep  2 03:57 sql

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 streaming

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 tags

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 tools

-rw-r--r--.  1 root root     848 Sep  2 03:57 tox.ini

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 unsafe

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 yarn

[[email protected] spark]#

其实啊,对应下面的这个界面

技术分享

 

 

按照http://spark.apache.org/docs/1.6.1/building-spark.html 这个步骤来。

● 安装jdk7+

 

 

 

 

 

 

  

 

以上是关于Spark源码编译的主要内容,如果未能解决你的问题,请参考以下文章

Spark源码打包编译的过程

Spark源码编译

源码编译搭建Spark3.x环境

1Spark 2.1 源码编译支持CDH

独一无二 hortonworks spark 源码编译教程

Spark入门教程Spark2.2源码编译及安装配置