Apache Ranger and AWS EMR Automated Installation Series : OpenLDAP + Open-Source Ranger
Posted by Laurence Geng
Table of Contents
- 1. OpenLDAP + Open-Source Ranger Solution Overview
- 2. Installation & Integration
- 3. Verification
- 4. Appendix
In the previous two articles, we introduced the EMR-native Ranger integration solutions with OpenLDAP and Windows AD. Starting with this article, we turn to open-source Ranger integration, beginning with “OpenLDAP + Open-Source Ranger”. The original address of this article is https://laurence.blog.csdn.net/article/details/128799548; when reprinting, please indicate the source.
1. OpenLDAP + Open-Source Ranger Solution Overview
1.1 Solution Architecture
In this solution, OpenLDAP acts as the authentication provider and stores all user account data. Ranger acts as the authorization controller: it syncs account data from OpenLDAP so that privileges can be granted to those user accounts. Meanwhile, the EMR cluster needs a series of Ranger plugins installed; these plugins check with the Ranger server to determine whether the current user has permission to perform an action. The EMR cluster also syncs account data from OpenLDAP via SSSD, so that a user can log in to the nodes of the EMR cluster and submit jobs. End users can log in to the nodes of the EMR cluster over SSH with their OpenLDAP accounts and, if Hue is available, log in to Hue with the same account.
1.2 Ranger in Detail
Let’s take a closer look at Ranger. Its architecture looks like the following:
The installer performs the following jobs:
① Install MySQL as the Policy DB for Ranger;
② Install Solr as Audit Store for Ranger;
③ Install Ranger Admin;
④ Install Ranger UserSync;
⑤ Install HDFS Ranger Plugin;
⑥ Install Hive Ranger Plugin;
2. Installation & Integration
Generally, the installation and integration process can be divided into 3 stages: 1. Prerequisites -> 2. All-In-One Install -> 3. Create EMR Cluster. The following diagram illustrates the process in detail:
At stage 1, we do some preparatory work.
At stage 2, we install and integrate. There are 2 options at this stage: an all-in-one installation driven by a command-line based workflow, or a step-by-step installation. In most cases the all-in-one installation is the best choice. However, if the workflow is interrupted by an unforeseen error and you want to continue from the last failed step, use the step-by-step installation. It is also the better choice when you want to retry a step with different argument values to find the right one.
At stage 3, we create an EMR cluster; if you already have one, skip this job. In most cases, Ranger needs to be installed on an existing cluster rather than a new one. EMR-native Ranger cannot be installed on an existing cluster (because EMR-native Ranger plugins can only be installed when the cluster is created), but open-source Ranger does NOT have this limitation: you are free to install it on an existing or a new EMR cluster.
The execution sequences of stages 2 and 3 overlap slightly. At step 2.4, the installation pauses: the installer prompts users to create their own cluster and keeps monitoring the target cluster’s status. Once the cluster is ready, the installation resumes and performs the remaining actions.
As a design principle, the installer does NOT include any action that creates an EMR cluster; you should always create the cluster yourself. In practice, an EMR cluster can have all kinds of unpredictable settings, e.g., application-specific (HDFS, YARN, etc.) configuration, step scripts, bootstrap scripts and so on, so coupling Ranger’s installation with the EMR cluster’s creation is not advisable.
Notes:
- The installer treats the local host as the Ranger server and installs all Ranger components on it. For non-Ranger operations, e.g., installing OpenLDAP, it initiates remote operations via SSH. So you can stay on the Ranger server to execute all command lines, with no need to switch among multiple hosts.
- Although it is not required, we suggest always using an FQDN as the host address. Neither an IP address nor a hostname without a domain name is recommended.
2.1 Prerequisites
2.1.1 Create EC2 Instances as Ranger and OpenLDAP Server
First, we need to prepare 2 EC2 instances, one as the Ranger server and the other as the OpenLDAP server. When creating the instances, please select the Amazon Linux 2 image and make sure the instances and the cluster to be created can reach each other over the network.
As a best practice, it’s recommended to add the Ranger server to the ElasticMapReduce-master security group, because Ranger works very closely with the EMR cluster and can be regarded as a non-EMR-built-in master service. For OpenLDAP, we have to make sure its port 389 is reachable from the Ranger server and all nodes of the EMR cluster to be created; to keep things simple, you can also add the OpenLDAP server to the ElasticMapReduce-master security group.
2.1.2 Download Installer
After the EC2 instances are ready, pick the Ranger server, log in via SSH, and run the following commands to download the installer package:
sudo yum -y install git
git clone https://github.com/bluishglc/ranger-emr-cli-installer.git
2.1.3 Upload SSH Key File
As mentioned before, the installer runs on the local host (the Ranger server). To perform remote installation actions on the OpenLDAP server or the EMR cluster, an SSH private key is required, so we should upload it to the Ranger server and make a note of the file path; it will be the value of the variable SSH_KEY.
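SSH clients generally expect a private key file to be readable only by its owner. The following is a minimal sketch of tightening the permissions after the upload; the helper name protect_ssh_key is hypothetical and not part of the installer.

```shell
# Hypothetical helper (not part of ranger-emr-cli-installer):
# restrict a private key to owner read-only and show the result.
protect_ssh_key() {
    key_path=$1
    [ -f "$key_path" ] || { echo "no such key file: $key_path" >&2; return 1; }
    chmod 400 "$key_path"
    # Print mode, owner and path for a quick visual check.
    ls -l "$key_path"
}

# Usage, with the example path used later in this article:
#   protect_ssh_key /home/ec2-user/key.pem
```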
2.1.4 Export Environment-Specific Variables
During installation, the following environment-specific arguments are passed more than once. It’s recommended to export them first, so that all command lines can refer to these variables instead of literals.
export REGION='TO_BE_REPLACED'
export ACCESS_KEY_ID='TO_BE_REPLACED'
export SECRET_ACCESS_KEY='TO_BE_REPLACED'
export SSH_KEY='TO_BE_REPLACED'
export OPENLDAP_HOST='TO_BE_REPLACED'
The variables above are described as follows:
- REGION: the AWS region, e.g., cn-north-1, us-east-1 and so on.
- ACCESS_KEY_ID: the AWS access key ID of your IAM account. Be sure your account has enough privileges; admin permissions are preferable.
- SECRET_ACCESS_KEY: the AWS secret access key of your IAM account.
- SSH_KEY: the path, on the local host, of the SSH private key file you just uploaded.
- OPENLDAP_HOST: the FQDN of the OpenLDAP server.
Please carefully replace the values of the above variables according to your environment, and remember to use an FQDN as the hostname, e.g., for OPENLDAP_HOST. The following is an example:
export REGION='cn-north-1'
export ACCESS_KEY_ID='<change-to-your-aws-access-key-id>'
export SECRET_ACCESS_KEY='<change-to-your-aws-secret-access-key>'
export SSH_KEY='/home/ec2-user/key.pem'
export OPENLDAP_HOST='ip-10-0-14-0.cn-north-1.compute.internal'
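Before launching the installer, it can help to confirm that none of these variables still holds the placeholder value. The following is a minimal pre-flight sketch, not part of the installer; the function name check_installer_env is hypothetical, and the FQDN test is only a rough heuristic that looks for a dot in the host name (so an IP address would also pass it).

```shell
# Hypothetical pre-flight check; not part of ranger-emr-cli-installer.
# Prints one line per problem and returns non-zero if any is found.
check_installer_env() {
    rc=0
    for name in REGION ACCESS_KEY_ID SECRET_ACCESS_KEY SSH_KEY OPENLDAP_HOST; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ] || [ "$value" = 'TO_BE_REPLACED' ]; then
            echo "$name is unset or still a placeholder"
            rc=1
        fi
    done
    # The SSH private key must exist as a readable file on the Ranger server.
    if [ -n "${SSH_KEY:-}" ] && [ "$SSH_KEY" != 'TO_BE_REPLACED' ] && [ ! -r "$SSH_KEY" ]; then
        echo "SSH_KEY file is not readable: $SSH_KEY"
        rc=1
    fi
    # Rough FQDN heuristic: a fully qualified name contains at least one dot.
    case "${OPENLDAP_HOST:-}" in
        ''|TO_BE_REPLACED|*.*) ;;
        *) echo "OPENLDAP_HOST does not look like an FQDN: $OPENLDAP_HOST"; rc=1 ;;
    esac
    return "$rc"
}

# Usage: check_installer_env && echo "environment looks complete"
```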
2.2 All-In-One Installation
2.2.1 Quick Start
Now, let’s start an all-in-one installation, execute this command line:
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'open-source-hdfs,open-source-hive'
For the parameter specification of the above command line, please refer to the appendix. If everything goes well, the command line executes steps 2.1 to 2.3 of the workflow diagram; this may take 10 minutes or more depending on your network bandwidth. It then pauses and prompts the user to enter the EMR cluster ID. If the target cluster already exists, we can enter its ID immediately; if not, we should switch to the EMR web console to create it. Next, the command line asks users to confirm whether Hue should integrate with LDAP. If yes, when the cluster is ready, the installer updates the EMR configuration with Hue-specific settings (this action overwrites the existing EMR configuration).
After filling in the above 2 items, enter “y” to confirm all inputs and the installation process resumes. If the target EMR cluster is not ready yet, the command line keeps monitoring it until it enters the “WAITING” status. The following is a snapshot of the command line at this moment:
When the cluster is ready (status is “WAITING”), the command line continues to execute steps 2.5 to 2.8 of the workflow, and finally ends with an “ALL DONE!!” message.
2.2.2 Customization
Now that the all-in-one installation is done, let’s look at customization. Generally, this installer follows the principle of “Convention over Configuration”: most parameters are preset with default values. An equivalent version of the above command line with the full parameter list is as follows:
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --java-home '/usr/lib/jvm/java' \
    --skip-install-mysql 'false' \
    --skip-install-solr 'false' \
    --skip-install-openldap 'false' \
    --skip-configure-hue 'false' \
    --ranger-host "$(hostname -f)" \
    --ranger-version '2.1.0' \
    --mysql-host "$(hostname -f)" \
    --mysql-root-password 'Admin1234!' \
    --mysql-ranger-db-user-password 'Admin1234!' \
    --solr-host "$(hostname -f)" \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --restart-interval 30
The full-parameter version gives us a complete view of all custom options. In the following scenarios, you may want to change some option values:
- If you want to change the default organization name dc=example,dc=com or the default password Admin1234!, please run the full-parameter version and replace them with your own values.
- If you need to integrate with external facilities, e.g., a centralized OpenLDAP or an existing MySQL or Solr, please add the corresponding --skip-xxx-xxx options and set them to true.
- If you have other pre-defined bind DNs for Hue, Ranger and SSSD, please add the corresponding --xxx-bind-dn and --xxx-bind-password options to set them. Note that the bind DNs for Hue, Ranger and SSSD are created automatically when OpenLDAP is installed, but they are FIXED to the naming pattern cn=<hue|ranger|sssd>,ou=services,<your-base-dn>, not the value given to the --xxx-bind-dn option. So if you assign another DN with the --xxx-bind-dn option, you MUST create this DN yourself in advance. The installer does NOT create the DN assigned by the --xxx-bind-dn option because a DN is actually a tree path: to create it, every node on the path must be created, and it is not cost-effective to implement such a small but complicated function.
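To make the "a DN is a tree path" point concrete, the following sketch lists every entry that would have to exist for a given bind DN, from just below the base DN down to the DN itself. It is an illustration only: the helper name dn_path_entries is hypothetical, it is not part of the installer, and it assumes that no RDN value contains an escaped comma.

```shell
# Hypothetical helper: list, top-down, every entry that must exist for a DN,
# starting just below the base DN. Assumes RDN values contain no escaped commas.
dn_path_entries() {
    dn=$1
    base_dn=$2
    case "$dn" in
        *",$base_dn") relative=${dn%",$base_dn"} ;;
        *) echo "DN is not under base DN: $dn" >&2; return 1 ;;
    esac
    current=$base_dn
    # Walk the relative part from right to left, one RDN at a time.
    while [ -n "$relative" ]; do
        rdn=${relative##*,}                  # right-most RDN
        current="$rdn,$current"
        echo "$current"
        case "$relative" in
            *,*) relative=${relative%,*} ;;  # drop the RDN just handled
            *)   relative='' ;;
        esac
    done
}

# Usage:
#   dn_path_entries 'cn=hue,ou=services,dc=example,dc=com' 'dc=example,dc=com'
# prints:
#   ou=services,dc=example,dc=com
#   cn=hue,ou=services,dc=example,dc=com
```

Each printed line is an entry you would need to create yourself, in that order, before passing a custom DN to the --xxx-bind-dn option.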
2.3 Step-By-Step Installation
As an alternative, you can choose the step-by-step installation instead of the all-in-one installation. The command line of each step is given below; for comments on each parameter, please refer to the appendix.
2.3.1 Init EC2
This step finishes some fundamental jobs, e.g., installing the AWS CLI, the JDK and so on.
sudo sh ./ranger-emr-cli-installer/bin/setup.sh init-ec2 \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY"
2.3.2 Install OpenLDAP
This step installs OpenLDAP on the given OpenLDAP host. As mentioned above, although this action is performed on the OpenLDAP server, you DON’T need to log in to it; just run the command line on the local host (the Ranger server).
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-openldap \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!'
2.3.3 Install Ranger
This step installs all server-side components of Ranger, including MySQL, Solr, Ranger Admin and Ranger UserSync.
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson'
2.3.4 Create EMR Cluster
For the step-by-step installation, there is no interactive process for creating the EMR cluster, so feel free to create the cluster on the EMR web console. However, we have to wait until the cluster is completely ready (in the “WAITING” status), then export the following environment-specific variable:
export EMR_CLUSTER_ID='TO_BE_REPLACED'
The following is an example:
export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'
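Instead of watching the web console, the cluster state can also be polled from the Ranger server. The following is a sketch, assuming the AWS CLI installed by the init-ec2 step is usable with your credentials; the function name wait_for_emr_waiting and its default retry values are hypothetical.

```shell
# Sketch: poll the cluster state until it reaches WAITING.
# Assumes the AWS CLI is installed and has valid credentials.
wait_for_emr_waiting() {
    cluster_id=$1
    max_checks=${2:-60}      # hypothetical default: at most 60 checks
    interval=${3:-30}        # hypothetical default: 30 seconds between checks
    checks=0
    while [ "$checks" -lt "$max_checks" ]; do
        state=$(aws emr describe-cluster --region "$REGION" \
            --cluster-id "$cluster_id" \
            --query 'Cluster.Status.State' --output text)
        echo "cluster $cluster_id is $state"
        case "$state" in
            WAITING) return 0 ;;
            # These states can never become WAITING, so stop early.
            TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS) return 1 ;;
        esac
        checks=$((checks + 1))
        sleep "$interval"
    done
    return 1
}

# Usage: wait_for_emr_waiting "$EMR_CLUSTER_ID" && echo "cluster is ready"
```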
2.3.5 Install Ranger Plugins
This step installs the HDFS and Hive plugins on both the Ranger server side and the agent side (the EMR nodes). This differs from the EMR-native Ranger solution: for EMR-native Ranger, EMR installs the agent side on each node automatically, whereas for open-source Ranger we have to do this job ourselves via this installer.
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger-plugins \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
2.3.6 Install SSSD
This step installs and configures SSSD on each node of the EMR cluster. As with installing OpenLDAP, we stay on the local host to run the command line; it performs the work on the remote nodes via SSH.
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-sssd \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
2.3.7 Configure Hue
This step updates the Hue configuration of EMR. As highlighted in the all-in-one installation, if you have other customized EMR configuration, please skip this step; you can still manually merge the JSON file for the Hue configuration generated by the command line into your own JSON.
sudo sh ./ranger-emr-cli-installer/bin/setup.sh configure-hue \
    --region "$REGION" \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --openldap-user-object-class 'inetOrgPerson' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
2.3.8 Create Example Users
This step creates 2 example users to facilitate the following verification.
sudo sh ./ranger-emr-cli-installer/bin/setup.sh add-example-users \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --example-users 'example-user-1,example-user-2'
3. Verification
After the installation and integration are completed, it’s time to check whether Ranger works. The verification is divided into 2 parts, covering HDFS and Hive. First, let’s log in to OpenLDAP via a client, e.g., LdapAdmin or Apache Directory Studio, and check all the DNs; it should look like the following:
Next, open the Ranger web console at http://<YOUR-RANGER-HOST>:6080; the default admin account/password is admin/admin. After logging in, open the “Users/Groups/Roles” page first and check whether the example users on OpenLDAP have already been synchronized to Ranger, as follows:
In addition, log in to the master node of the EMR cluster and export the cluster ID, because subsequent command lines need this variable.
# run on master node of emr cluster
export EMR_CLUSTER_ID='TO_BE_REPLACED'
The following is an example:
# run on master node of emr cluster
export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'
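The OpenLDAP check at the start of this section can also be done from a command line instead of a graphical client. The following is a sketch using ldapsearch from the OpenLDAP client tools, which are assumed to be installed on the host where you run it. The function name list_ldap_dns is hypothetical, and the root DN is assumed to be cn=<root-cn>,<base-dn>, derived from the installer options used in this article.

```shell
# Sketch: list the DN of every entry under the base DN with ldapsearch.
# Assumes the root DN is "cn=<root-cn>,<base-dn>" and that port 389 is reachable.
list_ldap_dns() {
    host=$1
    base_dn=$2
    root_cn=$3
    root_password=$4
    # -x: simple bind, -LLL: terse LDIF output, trailing "dn": return DNs only.
    ldapsearch -x -LLL \
        -H "ldap://$host:389" \
        -D "cn=$root_cn,$base_dn" \
        -w "$root_password" \
        -b "$base_dn" \
        '(objectClass=*)' dn
}

# Usage, with the default values of this article:
#   list_ldap_dns "$OPENLDAP_HOST" 'dc=example,dc=com' 'admin' 'Admin1234!'
```

With the defaults, the output should include the service accounts under ou=services and the example users matching the configured user DN pattern.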
3.1 HDFS Access Control Verification
Usually, there is a set of pre-defined policies for the HDFS plugin after installation, as follows:
We do NOT configure any HDFS permissions for example-user-1, but if we log in to Hue with the account example-user-1, we will see that it can browse most directories and files on HDFS. This is because most directories and files have the a+w permission. Please keep in mind that the HDFS r/w/x file mode attributes and Ranger-based permissions always take effect at the same time.
To verify that the HDFS plugin works, we use a “blacklist” style test. First, let’s create a directory named /ranger-test on HDFS and set example-user-1 as its owner:
# run on master node of emr cluster
sudo -u hdfs hdfs dfs -mkdir /ranger-test
sudo -u hdfs hdfs dfs -chown example-user-1:example-group /ranger-test
sudo -u hdfs hdfs dfs -chmod 700 /ranger-test
Next, let’s add a deny policy that prevents example-user-1 from reading and writing /ranger-test:
Any policy change on the Ranger web console syncs to the agent side (the EMR cluster nodes) within 30 seconds. We can run the following commands on the master node to check whether the local policy file has been updated:
# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat "/etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json"
    sleep 3
done
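As an alternative to reading the stat output by eye, the wait can be automated. The following sketch blocks until a file's modification time changes or a timeout expires. The function name wait_for_file_change is hypothetical; it relies on GNU stat and has to run as a user who can read the policy cache directory (the commands above use sudo for that reason).

```shell
# Sketch: block until a file's modification time changes or a timeout expires.
# Uses GNU stat (-c %Y prints the mtime in seconds since the epoch).
wait_for_file_change() {
    file=$1
    timeout_seconds=${2:-60}
    before=$(stat -c %Y "$file") || return 1
    elapsed=0
    while [ "$elapsed" -lt "$timeout_seconds" ]; do
        now=$(stat -c %Y "$file") || return 1
        if [ "$now" != "$before" ]; then
            echo "updated: $file"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "no change within ${timeout_seconds}s: $file" >&2
    return 1
}

# Usage (start it right after saving the policy in the Ranger web console):
#   wait_for_file_change "/etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json" 60
```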
Once the local policy file is up to date, the deny policy becomes effective. Log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open “File Browser”, click the root directory “/”, then click the “ranger-test” folder; we will get the error message “Cannot access:/ranger-test”:
Even though the current user example-user-1 is the owner of this folder, it is still blocked by the Ranger HDFS plugin, which shows that HDFS access control is managed by Ranger.
Finally, remember to REMOVE the “ranger-test” policy so that example-user-1 has full privileges to access this folder, because the following Hive verification re-uses it.
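The behaviour observed in this section can be summarised as a small decision model. This is a simplified illustration, not Ranger's actual code, and it assumes the HDFS plugin is configured to fall back to native HDFS permissions when no Ranger policy matches, which is an assumption about this installer's setup. The function name hdfs_access_decision is hypothetical.

```shell
# Simplified model of the access decision (illustration only).
# ranger_result: allow | deny | none  ("none" = no Ranger policy matched)
# posix_result:  allow | deny         (outcome of the HDFS r/w/x file mode check)
hdfs_access_decision() {
    ranger_result=$1
    posix_result=$2
    case "$ranger_result" in
        deny)  echo 'denied by Ranger policy' ;;
        allow) echo 'allowed by Ranger policy' ;;
        none)
            # No matching Ranger policy: the native file mode decides.
            if [ "$posix_result" = 'allow' ]; then
                echo 'allowed by HDFS file mode'
            else
                echo 'denied by HDFS file mode'
            fi
            ;;
        *) echo "unknown ranger result: $ranger_result" >&2; return 1 ;;
    esac
}

# The owner of /ranger-test (mode 700) with the deny policy in place:
#   hdfs_access_decision deny allow
# The same user after the policy has been removed:
#   hdfs_access_decision none allow
```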
3.2 Hive Access Control Verification
Usually, there is a set of pre-defined policies for the Hive plugin after installation. To eliminate interference and keep the verification simple, let’s REMOVE them first:
Any policy change on the Ranger web console syncs to the agent side (the EMR cluster nodes) within 30 seconds. We can run the following commands on the master node to check whether the local policy file has been updated:
# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat "/etc/ranger/HIVE_${EMR_CLUSTER_ID}/policycache/hiveServer2_HIVE_${EMR_CLUSTER_ID}.json"
    sleep 3
done
Once the local policy file is up to date, the removal of all policies becomes effective. Log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open the Hive editor, and enter the following SQL to create a test table (the table location is the /ranger-test directory created above; if you store the table elsewhere, e.g., in your own bucket, change the location accordingly):
-- run in hue hive editor
create table ranger_test (
id bigint
)
row format delimited
stored as textfile location '/ranger-test';
Run it, and an error occurs:
It shows that example-user-1 is blocked by database-related permissions, which proves the Hive plugin is working. Then we go back to Ranger and add a Hive policy named “all - database, table, column” as follows:
It grants example-user-1 all privileges on all databases, tables and columns. Check the policy file again on the master node with the previous command line; once it is updated, go back to Hue and re-run the SQL. This time it succeeds, as follows:
To double-check that example-user-1 has full read and write permissions on the table, we can run the following SQL:
insert into ranger_test(id) values(1);
insert into ranger_test(id) values(2);
insert into ranger_test(id) values(3);
select * from ranger_test;
The execution result is:
At this point, the Hive access control verification has passed.
4. Appendix
The following is the parameter specification:
Parameter | Comment |
---|---|
--region | The AWS region. |
--access-key-id | The AWS access key ID of your IAM account. |
--secret-access-key | The AWS secret access key of your IAM account. |
--ssh-key | The SSH private key file path. |
--solution | The solution name; accepted values are ‘open-source’ or ‘emr-native’. |
--auth-provider | The authentication provider; accepted values are ‘ad’ or ‘openldap’. |
--openldap-host | The FQDN of the OpenLDAP host. |
--openldap-base-dn | The base DN of OpenLDAP, for example: ‘dc=example,dc=com’; change it according to your environment. |
--openldap-root-cn | The cn of the root account, for example: ‘admin’; change it according to your environment. |
--openldap-root-password | The password of the root account, for example: ‘Admin1234!’; change it according to your environment. |
--ranger-bind-dn | The bind DN for Ranger, for example: ‘cn=ranger,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD / OpenLDAP; change it according to your environment. |
--ranger-bind-password | The password of the Ranger bind DN, for example: ‘Admin1234!’; change it according to your environment. |
--openldap-user-dn-pattern | The DN pattern for Ranger to search users on OpenLDAP, for example: ‘uid={0},ou=users,dc=example,dc=com’; change it according to your environment. |
--openldap-group-search-filter | The filter for Ranger to search groups on OpenLDAP, for example: ‘(member=uid={0},ou=users,dc=example,dc=com)’; change it according to your environment. |
--openldap-user-object-class | The user object class for Ranger to search users, for example: ‘inetOrgPerson’; change it according to your environment. |
--hue-bind-dn | The bind DN for Hue, for example: ‘cn=hue,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD / OpenLDAP; change it according to your environment. |
--hue-bind-password | The password of the Hue bind DN, for example: ‘Admin1234!’; change it according to your environment. |
--example-users | The example users to be created on OpenLDAP & Kerberos to demo Ranger’s features. This parameter is optional; if omitted, no example users are created. |
--sssd-bind-dn | The bind DN for SSSD, for example: ‘cn=sssd,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD / OpenLDAP; change it according to your environment. |
--sssd-bind-password | The password of the SSSD bind DN, for example: ‘Admin1234!’; change it according to your environment. |
--ranger-plugins | The Ranger plugins to be installed, comma-separated for multiple values, for example: ‘open-source-hdfs,open-source-hive’; change it according to your environment. |
--skip-configure-hue | Skip configuring Hue; accepted values are ‘true’ or ‘false’, the default value is ‘false’. |
--skip-migrate-kerberos-db | Skip migrating the Kerberos database; accepted values are ‘true’ or ‘false’, the default value is ‘false’. |
Related Reading:
Apache Ranger and AWS EMR Automated Installation and Integration Series : Solutions Overview
Apache Ranger and AWS EMR Automated Installation Series : Windows AD + Open-Source Ranger