How to Run Spark Applications on CDH 5

A few basic concepts: (1) job: a parallel computation made up of multiple tasks, usually triggered by an action. (2) stage: the scheduling unit of a job. (3) task: a unit of work sent to an executor. (4) taskSet: a set of related tasks with no shuffle dependencies between them.
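
As a minimal, hedged example of running a Spark application on CDH 5, the snippet below submits the bundled SparkPi example to YARN with spark-submit; the parcel path and jar name are assumptions for a default CDH 5 parcel install and may need adjusting (on a Kerberized cluster, run kinit first):

# Submit the SparkPi example to YARN in cluster mode (jar path assumes a default CDH 5 parcel layout)
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-memory 1g \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 100

Each action in an application triggers a job like this one; the scheduler then splits the job into stages at shuffle boundaries and ships the resulting tasks to the executors.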

0031 - How to Install and Use Sentry on a CDH Cluster with Kerberos Enabled

1. Purpose of This Document


This document describes how to install, configure, and use Sentry on a Kerberos-enabled CDH cluster.

  • Overview

1. Installing the Sentry service

2. Integrating the Hive/Impala/Hue/HDFS services with Sentry

3. Testing Sentry

  • Test environment

1. Operating system: CentOS 6.5

2. CM and CDH version: 5.12.1 (as shown in the beeline and impala-shell transcripts below)

3. All operations are performed as the root user

  • Prerequisites

1. The CDH cluster is up and running normally

2. Kerberos is enabled on the cluster and working normally

2. Installing Sentry


1. Create the sentry database in MySQL

Statements to create the database and user:

create database sentry default character set utf8;
CREATE USER 'sentry'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%';
FLUSH PRIVILEGES;

Running them from the command line:

[[email protected] 527-hive-HIVEMETASTORE]# mysql -uroot -p
Enter password: 
...
mysql> create database sentry default character set utf8;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE USER 'sentry'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
mysql> 

2. In the Cloudera Manager console, click "Add Service"

3. On the Add Service page, select the Sentry service and click "Continue"

4. Choose the hosts for the Sentry Server and Gateway roles, then click "Continue"

5. Enter the database information for the Sentry service, click "Test Connection", and once the test passes click "Continue"

6. Wait for the service to finish installing, then click "Continue"

7. Click "Finish". The Sentry service is now installed.

3. Configuring Sentry

3.1 Hive Configuration


1. Configure Hive to use the Sentry service

2. Disable HiveServer2 user impersonation

3.2 Impala Configuration


Configure Impala to use Sentry

3.3 Hue Configuration


Configure Hue to use Sentry

3.4 HDFS Configuration


Configure HDFS to enable ACLs and Sentry permission synchronization

After completing the configuration above, return to the Cloudera Manager home page, deploy the client configuration, and restart the affected services. A quick sanity check of the key settings is sketched below.
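
The following is only a sketch for a CM-managed CDH 5 cluster: the property names are the upstream equivalents of the CM checkboxes, and the process directory path is an assumption that may differ on your hosts.

# HDFS ACLs: getfacl is rejected by the NameNode if dfs.namenode.acls.enabled is still false
hadoop fs -getfacl /user/hive/warehouse
# HiveServer2 impersonation must be off (hive.server2.enable.doAs=false) when Sentry is enabled;
# check the generated configuration on the HiveServer2 host (CM process directory path is an assumption)
grep -A1 "hive.server2.enable.doAs" /var/run/cloudera-scm-agent/process/*HIVESERVER2*/hive-site.xml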

4. Testing Sentry

4.1 Creating a hive superuser


Authenticate to Kerberos as the hive user:

[[email protected] 196-hive-HIVEMETASTORE]# kinit -kt hive.keytab hive/[email protected]   
[[email protected] 196-hive-HIVEMETASTORE]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hive/[email protected]

Valid starting     Expires            Service principal
09/07/17 02:26:04  09/08/17 02:26:04  krbtgt/[email protected]
        renew until 09/12/17 02:26:04
[[email protected] 196-hive-HIVEMETASTORE]# 

1. Connect to HiveServer2 with beeline

[[email protected] 196-hive-HIVEMETASTORE]# beeline 
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/;principal=hive/[email protected]
scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/;principal=hive/[email protected]
Connected to: Apache Hive (version 1.1.0-cdh5.12.1)
Driver: Hive JDBC (version 1.1.0-cdh5.12.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/>

2. Create the admin role

0: jdbc:hive2://localhost:10000/> create role admin;
...
INFO  : OK
No rows affected (0.37 seconds)
0: jdbc:hive2://localhost:10000/>

3. Grant administrator privileges to the admin role

0: jdbc:hive2://localhost:10000> grant all on server server1 to role admin;
...
INFO  : OK
No rows affected (0.221 seconds)
0: jdbc:hive2://localhost:10000>

4. Grant the admin role to the hive group

0: jdbc:hive2://localhost:10000> grant role admin to group hive;
...
INFO  : OK
No rows affected (0.162 seconds)
0: jdbc:hive2://localhost:10000>

The steps above created one admin role:

admin: has administrator privileges, can read and write all databases, and is granted to the hive group (the corresponding operating-system group).
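
As an optional check (output omitted here; it will vary by cluster), the role and its grants can be listed from the same beeline session:

0: jdbc:hive2://localhost:10000> show roles;
0: jdbc:hive2://localhost:10000> show role grant group hive;
0: jdbc:hive2://localhost:10000> show grant role admin;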

4.2 Creating the test table


Still authenticated to Kerberos as the hive user, connect to HiveServer2 with beeline, create the test table, and insert some test data:

0: jdbc:hive2://localhost:10000> create table test (s1 string, s2 string) row format delimited fields terminated by ',';
...
INFO  : OK
No rows affected (0.592 seconds)
0: jdbc:hive2://localhost:10000> insert into test values('a','b'),('1','2');
...
INFO  : OK
No rows affected (20.123 seconds)
0: jdbc:hive2://localhost:10000>

4.3 Creating test roles and granting them to groups


Two roles are created:

read: can only read the test table in the default database; granted to the fayson group

write: can only write to the test table in the default database; granted to the user_w group

Note: the fayson and user_w users must exist on every node in the cluster. By default each user's primary group has the same name as the user, and Sentry grants are made to groups rather than to individual users (see the sketch after the session below for creating the users on all nodes).

[[email protected] cdh-shell-master]# id fayson
uid=501(fayson) gid=501(fayson) groups=501(fayson)
[[email protected] cdh-shell-master]# useradd user_w
[[email protected] cdh-shell-master]# id user_w
uid=502(user_w) gid=502(user_w) groups=502(user_w)
[[email protected] cdh-shell-master]# 
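
Since the operating-system accounts have to exist on every node, a small loop like the sketch below can create them cluster-wide; the host names are placeholders and must be replaced with your own:

# Create the test users on every cluster node (host names are placeholders)
for host in node1.example.com node2.example.com node3.example.com; do
  ssh $host 'id fayson >/dev/null 2>&1 || useradd fayson; id user_w >/dev/null 2>&1 || useradd user_w'
done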

1. As the hive user, create the read and write roles, grant the read role SELECT on the test table, and grant the write role INSERT on the test table

0: jdbc:hive2://localhost:10000> create role read;
...
INFO  : OK
No rows affected (0.094 seconds)
0: jdbc:hive2://localhost:10000> grant select on table test to role read;
...
INFO  : OK
No rows affected (0.1 seconds)
0: jdbc:hive2://localhost:10000> create role write;
...
INFO  : OK
No rows affected (0.105 seconds)
0: jdbc:hive2://localhost:10000> grant insert on table test to role write;
...
INFO  : OK
No rows affected (0.112 seconds)
0: jdbc:hive2://localhost:10000>

2. Grant the read role to the fayson group and the write role to the user_w group

0: jdbc:hive2://localhost:10000> grant role read to group fayson;
...
INFO  : OK
No rows affected (0.187 seconds)
0: jdbc:hive2://localhost:10000> grant role write to group user_w;
...
INFO  : OK
No rows affected (0.101 seconds)
0: jdbc:hive2://localhost:10000> 

3. Create the fayson and user_w principals with kadmin

[[email protected] ~]# kadmin.local
Authenticating as principal hive/[email protected] with password.
kadmin.local:  addprinc [email protected]
WARNING: no policy specified for [email protected]; defaulting to no policy
Enter password for principal "[email protected]": 
Re-enter password for principal "[email protected]": 
Principal "[email protected]" created.
kadmin.local:  addprinc [email protected]
WARNING: no policy specified for [email protected]; defaulting to no policy
Enter password for principal "[email protected]": 
Re-enter password for principal "[email protected]": 
Principal "[email protected]" created.
kadmin.local:  

4.4 beeline verification


1. Authenticate to Kerberos as the fayson user

[[email protected] ~]# kdestroy
[[email protected] ~]# kinit fayson
Password for [email protected]: 
[[email protected] ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: [email protected]

Valid starting     Expires            Service principal
09/07/17 02:48:35  09/08/17 02:48:35  krbtgt/[email protected]
        renew until 09/14/17 02:48:35
[[email protected] ~]# 

Connect to HiveServer2 with beeline to verify:

[[email protected] ~]# beeline 
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/;principal=hive/[email protected]
...
0: jdbc:hive2://localhost:10000/> show tables;
...
INFO  : OK
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (0.403 seconds)
0: jdbc:hive2://localhost:10000/> select * from test;
...
INFO  : OK
+----------+----------+--+
| test.s1  | test.s2  |
+----------+----------+--+
| a        | b        |
| 1        | 2        |
| 111      | 222      |
| a        | b        |
| 1        | 2        |
| 333      | 5555     |
| eeee     | dddd     |
+----------+----------+--+
7 rows selected (0.282 seconds)
0: jdbc:hive2://localhost:10000/> insert into test values("2", "222");
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 User fayson does not have privileges for QUERY
 The required privileges: Server=server1->Db=default->Table=test->action=insert; (state=42000,code=40000)
0: jdbc:hive2://localhost:10000/> 

Run a Hive query that triggers a MapReduce job:

0: jdbc:hive2://localhost:10000/> select count(*) from test;
...
INFO  : OK
+------+--+
| _c0  |
+------+--+
| 7    |
+------+--+
1 row selected (30.688 seconds)
0: jdbc:hive2://localhost:10000/> 

2. Authenticate to Kerberos as the user_w user

[[email protected] ~]# kdestroy
[[email protected] ~]# kinit user_w
Password for [email protected]: 
[[email protected] ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: [email protected]

Valid starting     Expires            Service principal
09/07/17 03:01:56  09/08/17 03:01:56  krbtgt/[email protected]
        renew until 09/14/17 03:01:56
[[email protected] ~]# 

Connect to HiveServer2 with beeline to verify:

[[email protected] ~]# beeline 
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/;principal=hive/[email protected]
...
0: jdbc:hive2://localhost:10000/> show tables;
...
INFO  : OK
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (0.343 seconds)
0: jdbc:hive2://localhost:10000/> select  * from test;
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 User user_w does not have privileges for QUERY
 The required privileges: Server=server1->Db=default->Table=test->Column=s1->action=select; (state=42000,code=40000)
0: jdbc:hive2://localhost:10000/> insert into test values("2", "333");
...
INFO  : OK
No rows affected (19.379 seconds)
0: jdbc:hive2://localhost:10000/> 

Verification summary:

The fayson user belongs to the fayson group, which holds read privileges on the test table: it can run select and count queries against test, but it cannot insert into it.

The user_w user belongs to the user_w group, which holds write privileges on the test table: it can insert into test, but it cannot select from it.

4.5 HDFS verification


1. Authenticate to Kerberos as the fayson user and perform the following operations

Use HDFS commands to run cat, ls, put, and similar operations against /user/hive/warehouse/test:

[[email protected] ~]# hadoop fs -ls /user/hive/warehouse
ls: Permission denied: user=fayson, access=READ_EXECUTE, inode="/user/hive/warehouse":hive:hive:drwxrwx--x
[[email protected] ~]# hadoop fs -ls /user/hive/warehouse/test
Found 5 items
-rwxrwx--x+  3 hive hive          8 2017-09-05 12:52 /user/hive/warehouse/test/000000_0
-rwxrwx--x+  3 hive hive          8 2017-09-05 13:44 /user/hive/warehouse/test/000000_0_copy_1
-rwxrwx--x+  3 hive hive          8 2017-09-07 02:36 /user/hive/warehouse/test/000000_0_copy_2
-rwxrwx--x+  3 hive hive          6 2017-09-07 03:04 /user/hive/warehouse/test/000000_0_copy_3
-rwxrwx--x+  3 hive hive         19 2017-09-05 13:01 /user/hive/warehouse/test/test.txt
[[email protected] ~]# hadoop fs -cat /user/hive/warehouse/test/test.txt
333,5555
eeee,dddd
[[email protected] ~]# hadoop fs -rm /user/hive/warehouse/test/test.txt
rm: Failed to move to trash: hdfs://ip-172-31-6-148.fayson.com:8020/user/hive/warehouse/test/test.txt: Permission denied: user=fayson, access=WRITE, inode="/user/hive/warehouse/test":hive:hive:drwxrwx--x
[[email protected] ~]# hadoop fs -put a.txt /user/hive/warehouse/test/
put: Permission denied: user=fayson, access=WRITE, inode="/user/hive/warehouse/test":hive:hive:drwxrwx--x
[[email protected] ~]# 

2. Authenticate to Kerberos as the user_w user and perform the same operations

[[email protected] ~]# kdestroy
[[email protected] ~]# kinit user_w
Password for [email protected]: 
[[email protected] ~]# hadoop fs -ls /user/hive/warehouse
ls: Permission denied: user=user_w, access=READ_EXECUTE, inode="/user/hive/warehouse":hive:hive:drwxrwx--x
[[email protected] ~]# hadoop fs -ls /user/hive/warehouse/test
ls: Permission denied: user=user_w, access=READ_EXECUTE, inode="/user/hive/warehouse/test":hive:hive:drwxrwx--x
[[email protected] ~]# hadoop fs -cat /user/hive/warehouse/test/test.txt
cat: Permission denied: user=user_w, access=READ, inode="/user/hive/warehouse/test/test.txt":hive:hive:-rwxrwx--x
[[email protected] ~]# hadoop fs -rm /user/hive/warehouse/test/test.txt
17/09/07 03:21:21 INFO fs.TrashPolicyDefault: Moved: 'hdfs://ip-172-31-6-148.fayson.com:8020/user/hive/warehouse/test/test.txt' to trash at: hdfs://ip-172-31-6-148.fayson.com:8020/user/user_w/.Trash/Current/user/hive/warehouse/test/test.txt
[[email protected] ~]# hadoop fs -put a.txt /user/hive/warehouse/test/
[[email protected] ~]# 

The fayson user belongs to the fayson group and has read privileges on the test table: it can list the table's data directory (/user/hive/warehouse/test) and view the files in it, but it cannot delete files under that directory or put new files into it.

The user_w user belongs to the user_w group and has write privileges on the test table: it can put files into the table's data directory and delete its data files, but it cannot list the directory or view the files in it.

This shows that Sentry's HDFS ACL synchronization is working. The synchronized permissions can be inspected directly, as sketched below.
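
As an optional check (not part of the original walkthrough), the trailing "+" on the permission bits in the ls output above indicates extended ACLs; listing them shows the Sentry-managed entries for the granted groups:

# List the extended ACL entries on the table directory
hadoop fs -getfacl /user/hive/warehouse/test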

4.6 Hue verification


1. As a Hue administrator, add fayson and user_w as Hue test users

2. Log in to Hue as fayson and verify the read privileges

The test table's data can be queried

Count queries work

Insert statements are rejected

File Browser checks:

The parent directory /user/hive/warehouse cannot be browsed

The test table's data directory /user/hive/warehouse/test can be browsed

The contents of files under /user/hive/warehouse/test can be viewed

Data files under /user/hive/warehouse/test cannot be modified

3. Log in to Hue as user_w and verify the write privileges

The test table cannot be queried

Count queries are rejected

Data can be inserted into the test table

File Browser checks:

The parent directory /user/hive/warehouse cannot be browsed

The test table's data directory /user/hive/warehouse/test cannot be browsed either

Both the fayson and user_w users can see the test table in the Hue interface. The fayson group, which holds the read role, can run select and count queries against test and can browse and view the table's data directory /user/hive/warehouse/test in File Browser. The user_w group, which holds the write role, can only insert into the test table and cannot browse or view /user/hive/warehouse/test in File Browser. This shows that the roles and grants created on the command line remain effective in Hue.

4.7 Impala verification


1. Authenticate to Kerberos as the fayson user

[[email protected] ~]# kinit fayson
Password for [email protected]: 
[[email protected] ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: [email protected]

Valid starting     Expires            Service principal
09/07/17 06:36:05  09/08/17 06:36:05  krbtgt/[email protected]
        renew until 09/14/17 06:36:05
[[email protected] ~]# impala-shell 
Starting Impala Shell without Kerberos authentication
...
Connected to ip-172-31-9-33.fayson.com:21000
Server version: impalad version 2.9.0-cdh5.12.1 RELEASE (build 5131a031f4aa38c1e50c430373c55ca53e0517b9)
[ip-172-31-9-33.fayson.com:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| test |
+------+
Fetched 1 row(s) in 0.02s
[ip-172-31-9-33.fayson.com:21000] > select * from test;
...
+--------+----------+
| s1     | s2       |
+--------+----------+
| testaa | testbbb  |
| 111    | 222      |
| 222    | 2323     |
| 2      | 333      |
| a      | b        |
| 1      | 2        |
| 1      | test     |
| 2      | fayson   |
| 3      | zhangsan |
| a      | b        |
| 1      | 2        |
+--------+----------+
Fetched 11 row(s) in 0.19s
[ip-172-31-9-33.fayson.com:21000] > select count(*) from test;
...
+----------+
| count(*) |
+----------+
| 11       |
+----------+
Fetched 1 row(s) in 0.14s
[ip-172-31-9-33.fayson.com:21000] > insert into test values('test44','test55');
Query: insert into test values('test44','test55')
Query submitted at: 2017-09-07 06:37:00 (Coordinator: http://ip-172-31-9-33.fayson.com:25000)
ERROR: AuthorizationException: User '[email protected]' does not have privileges to execute 'INSERT' on: default.test

[ip-172-31-9-33.fayson.com:21000] > 

2. Authenticate to Kerberos as the user_w user

Start impala-shell and run the same checks:

[[email protected] ~]# impala-shell 
...
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.9.0-cdh5.12.1 (5131a03) built on Thu Aug 24 09:27:32 PDT 2017)

***********************************************************************************
[Not connected] > connect ip-172-31-9-33.fayson.com:21000;
Connected to ip-172-31-9-33.fayson.com:21000
Server version: impalad version 2.9.0-cdh5.12.1 RELEASE (build 5131a031f4aa38c1e50c430373c55ca53e0517b9)
[ip-172-31-9-33.fayson.com:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| test |
+------+
Fetched 1 row(s) in 0.29s
[ip-172-31-9-33.fayson.com:21000] > select * from test;
Query: select * from test
Query submitted at: 2017-09-07 06:31:23 (Coordinator: http://ip-172-31-9-33.fayson.com:25000)
ERROR: AuthorizationException: User '[email protected]' does not have privileges to execute 'SELECT' on: default.test

[ip-172-31-9-33.fayson.com:21000] > insert into test values('222','2323');
Query: insert into test values('222','2323')
Query submitted at: 2017-09-07 06:32:07 (Coordinator: http://ip-172-31-9-33.fayson.com:25000)
Query progress can be monitored at: http://ip-172-31-9-33.fayson.com:25000/query_plan?query_id=ec406e621c7534c7:6bcbbd5300000000
Modified 1 row(s) in 0.63s
[ip-172-31-9-33.fayson.com:21000] > 

Verification summary:

With Impala integrated with Sentry, permissions can be managed through Sentry as well: the fayson group, which holds the read role, can only run select and count queries against the test table and cannot insert data, while the user_w group, which holds the write role, can only insert data into the test table and cannot run select or count queries. This shows that Sentry keeps Hive and Impala privileges in sync.
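
One aside on the transcripts above: impala-shell reported starting "without Kerberos authentication". On a Kerberized cluster it is normally started with the -k flag (and -i to pick the impalad host), roughly as sketched below; the host name is taken from the transcripts:

# Connect to a specific impalad with Kerberos authentication
impala-shell -k -i ip-172-31-9-33.fayson.com:21000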


Follow the Hadoop实操 WeChat public account for more hands-on Hadoop content; forwarding and sharing are welcome.

This is an original article. Reposting is welcome; please credit the WeChat public account Hadoop实操.
