Apache Ranger and AWS EMR Automated Installation Series : OpenLDAP + EMR-Native Ranger

Posted bluishglc



In the first article of this series, we got a full picture of the EMR and Ranger integration solutions. From now on, we introduce the concrete solutions one by one. This article covers “Scenario 1: OpenLDAP + EMR-Native Ranger”: we introduce the architecture of the solution, describe the installation steps in detail, and verify the installed environment.

1. Solution Overview

1.1 Architecture


In this solution, OpenLDAP acts as the authentication provider and stores all user account data, while Ranger acts as the authorization controller. Because we selected the EMR-native Ranger solution, which strongly depends on Kerberos, a Kerberos KDC is required. We recommend choosing the cluster-dedicated KDC created by EMR instead of an external KDC, since this saves the work of installing Kerberos. If you have an existing KDC, this solution supports it as well.

To unify user account data, OpenLDAP and Kerberos must be integrated. This takes a series of jobs, e.g., enabling SASL/GSSAPI, mapping the accounts of the two systems, enabling pass-through authentication and so on. Ranger syncs account data from OpenLDAP so that privileges can be granted to OpenLDAP user accounts. Meanwhile, the EMR cluster needs a series of Ranger plugins installed; these plugins check with the Ranger server to verify whether the current user has permission to perform an action. The EMR cluster also syncs account data from OpenLDAP via SSSD, so that a user can log in to the nodes of the EMR cluster and submit jobs.

1.2 Authentication in Detail

Let’s dive into the authentication part. OpenLDAP and Kerberos are two mutually independent authentication mechanisms, and how to integrate them is the core question of authentication. A separate series of articles is dedicated to this topic, and this installer follows all operations of that series exactly:

OpenLDAP and Kerberos Based Authentication Solution (1): Integrating Backend Database
OpenLDAP and Kerberos Based Authentication Solution (2): Synchronizing SSSD
OpenLDAP and Kerberos Based Authentication Solution (3): Deeply Integrating with SASL/GSSAPI

In general, the installer performs the following jobs:

① Install OpenLDAP;
② Install SSSD on all nodes of the EMR cluster;
③ Migrate the Kerberos backend database to OpenLDAP, and save the account data of the two systems into a single record;
④ Install & configure SASL/GSSAPI so that Kerberos accounts can log in to OpenLDAP;
⑤ Configure OpenLDAP to map Kerberos accounts to OpenLDAP accounts;
⑥ Enable saslauthd to unify the passwords of OpenLDAP and Kerberos accounts;
⑦ Configure SSH so that users can log in with an OpenLDAP account;
⑧ Configure SSH so that users can log in with a Kerberos account via GSSAPI.

1.3 Authorization in Detail

For authorization, Ranger plays the leading role. If we dive deeper, its architecture looks as follows:

The installer performs the following jobs:

① Install MySQL as the Policy DB for Ranger;
② Install Solr as Audit Store for Ranger;
③ Install Ranger Admin;
④ Install Ranger UserSync;
⑤ Install EMRFS(S3) Ranger Plugin;
⑥ Install Spark Ranger Plugin;
⑦ Install Hive Ranger Plugin;
⑧ Install Trino Ranger Plugin (NOT available yet at the time of writing)

2. Installation & Integration

In general, the installation & integration process can be divided into 3 stages: 1. Prerequisites -> 2. All-In-One Install -> 3. Create EMR Cluster. The following diagram illustrates the process in detail:

At stage 1, we do some preparatory work. At stage 2, we install and integrate; there are 2 options at this stage: one is the all-in-one installation driven by a command-line based workflow, the other is the step-by-step installation. In most cases the all-in-one installation is the best choice. However, your installation workflow may be interrupted by unforeseen errors; if you want to continue installing from the last failed step, use the step-by-step installation. Step-by-step is also the better choice when you want to retry a step with different argument values to find the right one. At stage 3, we create an EMR cluster ourselves with the artifacts output by stage 2, i.e., the IAM roles and the EMR security configuration.

As a design principle, the installer does NOT include any action that creates an EMR cluster; you should always create your cluster yourself. An EMR cluster in practice can have any number of unpredictable settings, e.g., application-specific (HDFS, YARN, etc.) configuration, step scripts, bootstrap scripts and so on, so it is inadvisable to couple Ranger’s installation with the creation of the EMR cluster.

However, stages 2 and 3 overlap a little in execution order. Creating an EMR cluster based on EMR-native Ranger requires a security configuration and Ranger-specific IAM roles, so they must be available before the cluster is created; besides, cluster creation also interacts with the Ranger server (whose address is given in the security configuration). On the other hand, some operations of the all-in-one installation must be performed on all nodes of the cluster or on the KDC, which requires the EMR cluster to be ready. To solve this circular dependency, the installer first outputs the artifacts the EMR cluster depends on, then prompts users to create their own cluster with these artifacts. Meanwhile, the installation is suspended and keeps monitoring the target cluster’s status; once the cluster is ready, the installation resumes and performs the remaining actions.

Notes:

  1. The installer treats the local host as the Ranger server and installs everything of Ranger on it. For non-Ranger operations, e.g., installing OpenLDAP or migrating the Kerberos DB, it initiates remote operations via SSH. So you can stay on the Ranger server to execute all command lines, with no need to switch among multiple hosts.

  2. For the sake of Kerberos, all host addresses must be FQDNs; neither an IP address nor a hostname without a domain name is accepted.
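
Since a wrong host address only surfaces later as a Kerberos failure, it can help to check host names before passing them to the installer. The following is a minimal sketch of such a check; `is_fqdn` is a hypothetical helper written for this article, not part of the installer, and it only tests the shape of the name (it does not resolve it).

```shell
#!/bin/sh
# is_fqdn NAME: succeed only if NAME looks like a fully qualified domain name.
# Hypothetical helper for illustration; it is NOT part of the installer.
is_fqdn() {
    host="$1"
    case "$host" in
        "" | *[!A-Za-z0-9.-]* | .* | *. | *..*)
            return 1 ;;   # empty, illegal characters, or misplaced dots
        *[!0-9.]*)
            ;;            # contains a letter or hyphen, so not an IPv4 literal
        *)
            return 1 ;;   # digits and dots only: an IP address
    esac
    case "$host" in
        *.*) return 0 ;;  # has a domain part
        *)   return 1 ;;  # bare hostname without a domain name
    esac
}
```

For example, `is_fqdn "$(hostname -f)" || echo "not an FQDN"` can be run on the Ranger server before starting the installation.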

2.1 Prerequisites

2.1.1 Create EC2 Instances as Ranger and OpenLDAP Server

First, we need to prepare 2 EC2 instances, one as the Ranger server, the other as the OpenLDAP server. When creating the instances, please select the Amazon Linux 2 image and make sure the instances and the cluster to be created can reach each other over the network.

As a best practice, it’s recommended to add the Ranger server to the ElasticMapReduce-master security group: Ranger is very close to the EMR cluster and can be regarded as a non-EMR-built-in master service. For OpenLDAP, we have to make sure its port 389 is reachable from the Ranger server and from all nodes of the EMR cluster to be created; to keep it simple, you can also add the OpenLDAP server to the ElasticMapReduce-master security group.

2.1.2 Download Installer

After the EC2 instances are ready, pick the Ranger server, log in via SSH, and run the following commands to download the installer package:

sudo yum -y install git
git clone https://github.com/bluishglc/ranger-emr-cli-installer.git

2.1.3 Upload SSH Key File

As mentioned before, the installer runs on the local host (the Ranger server). To perform remote installation actions on OpenLDAP or the EMR cluster, an SSH private key is required, so we should upload it to the Ranger server and make a note of the file path; it will be the value of the variable SSH_KEY.
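
One thing worth checking after the upload: the ssh client refuses a private key file that other users can read. The sketch below verifies that the file exists and is accessible by its owner only. `check_ssh_key` is a hypothetical helper for illustration and is not part of the installer.

```shell
#!/bin/sh
# check_ssh_key FILE: succeed if FILE exists and only its owner can access it
# (mode 600 or 400). Hypothetical helper, NOT part of the installer.
check_ssh_key() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "key file not found: $key" >&2
        return 1
    fi
    # The first 10 characters of `ls -ld` are the mode string, e.g. "-rw-------".
    mode=$(ls -ld "$key" | cut -c1-10)
    case "$mode" in
        -r[w-]-------)
            return 0 ;;
        *)
            echo "permissions too open ($mode), try: chmod 600 $key" >&2
            return 1 ;;
    esac
}
```

Typical usage would be `check_ssh_key /home/ec2-user/key.pem` right after the upload, followed by `chmod 600` if the check complains.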

2.1.4 Export Environment-Specific Variables

During installation, the following environment-specific arguments are passed more than once. It’s recommended to export them first, so that all command lines can refer to these variables instead of literals.

export REGION='TO_BE_REPLACED'
export ACCESS_KEY_ID='TO_BE_REPLACED'
export SECRET_ACCESS_KEY='TO_BE_REPLACED'
export SSH_KEY='TO_BE_REPLACED'
export OPENLDAP_HOST='TO_BE_REPLACED'

The variables above have the following meaning:

  • REGION: AWS region, e.g., cn-north-1, us-east-1 and so on.
  • ACCESS_KEY_ID: AWS access key id of your IAM account. Be sure your account has enough privileges; admin permissions are preferable.
  • SECRET_ACCESS_KEY: AWS secret access key of your IAM account.
  • SSH_KEY: path on the local host of the SSH private key file you just uploaded.
  • OPENLDAP_HOST: FQDN of the OpenLDAP server.

Please replace the values of the variables above carefully according to your environment, and remember to use an FQDN as the hostname, e.g., for OPENLDAP_HOST. The following is an example:

export REGION='cn-north-1'
export ACCESS_KEY_ID='<change-to-your-aws-access-key-id>'
export SECRET_ACCESS_KEY='<change-to-your-aws-secret-access-key>'
export SSH_KEY='/home/ec2-user/key.pem'
export OPENLDAP_HOST='ip-10-0-14-0.cn-north-1.compute.internal'
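
Because every later command line refers to these variables, a forgotten or still-placeholder value is easy to miss. The following is a small sketch of a pre-flight check; `require_vars` is a hypothetical helper (not shipped with the installer) that reports each variable that is empty or still set to TO_BE_REPLACED.

```shell
#!/bin/sh
# require_vars NAME...: fail if any named variable is unset, empty,
# or still holds the placeholder value. Hypothetical helper for illustration.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ] || [ "$value" = 'TO_BE_REPLACED' ]; then
            echo "variable $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `require_vars REGION ACCESS_KEY_ID SECRET_ACCESS_KEY SSH_KEY OPENLDAP_HOST` before the installation lists every variable that still needs a value.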

2.2 All-In-One Installation

2.2.1 Quick Start

Now, let’s start an all-in-one installation, execute this command line:

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'emr-native' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'emr-native-emrfs,emr-native-spark,emr-native-hive'

For the parameter specification of the above command line, please refer to the appendix. If everything goes well, the command line executes steps 2.1 to 2.7 of the workflow diagram; this may take 10 minutes or more depending on the bandwidth of your network. It then suspends and prompts the user to create an EMR cluster with the following 2 artifacts:

Ⓐ An ec2 instance profile named EMR_EC2_RangerRole
Ⓑ An emr security configuration named Ranger@<YOUR-RANGER-HOST-FQDN>

They have just been created by the command line in steps 2.2 & 2.4, and you can find them in the EMR web console when creating the cluster. The following is a snapshot of the command line at this moment:

Next, we switch to the EMR web console to create a cluster. Be sure to select the EC2 instance profile and the security configuration prompted in the command line console. For the Kerberos KDC, please also fill in and make a note of the “realm” and the “KDC admin password”; they will be used in the command line soon. The following is a snapshot of the EMR web console at this moment:


Once the creation of the EMR cluster starts, 5 cluster-related items become known:

① Cluster id: get it from the summary tab of the web console.
② Kerberos realm: entered by you in the “Authentication and encryption” section, see the snapshot above. Note that for region us-east-1 the default realm is EC2.INTERNAL; for other regions the default realm is COMPUTE.INTERNAL.
③ Kerberos KDC admin password: entered by you in the “Authentication and encryption” section, see the snapshot above.
④ Kerberos KDC host: get it from the hardware tab of the web console; usually it is the master node.
⑤ Whether to let Hue integrate with LDAP or not. If yes, after the cluster is ready the installer updates the EMR configuration with Hue-specific settings; be careful, this action overwrites the existing EMR configuration.
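
The region-dependent default realm from item ② is easy to get wrong when scripting. As an illustration only, the rule can be written as a tiny function; `default_kerberos_realm` is a hypothetical name, and the function merely encodes the rule stated above (it does not query EMR).

```shell
#!/bin/sh
# default_kerberos_realm REGION: print the default realm for a region,
# following the rule stated in item 2 above. Hypothetical helper.
default_kerberos_realm() {
    case "$1" in
        us-east-1) echo 'EC2.INTERNAL' ;;
        *)         echo 'COMPUTE.INTERNAL' ;;
    esac
}
```

If you entered a custom realm in the web console, use that value instead of the default.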

Now we go back to the command line terminal and enter “y” at the prompt "Have you created the cluster? [y/n]: " (you don’t need to wait for the cluster to become completely ready). The command line then asks you to enter the items above one by one, because they are required for the next phase of the installation. Confirm by entering “y” again and the installation resumes. If the given EMR cluster is not ready yet, the command line keeps monitoring it until it goes into “WAITING” status. The following is a snapshot of the command line at this moment:

When the cluster is ready (status is “WAITING”), the command line continues with steps 2.9 to 2.13 of the workflow, and finally ends with an “ALL DONE!!” message.
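
The monitoring behaviour described above can be approximated by a small polling loop. The sketch below is not the installer’s own code: `wait_for_state` and `emr_cluster_state` are hypothetical helper names, and the interval and attempt count are left to the caller. The only external dependency is the AWS CLI command `aws emr describe-cluster`.

```shell
#!/bin/sh
# wait_for_state TARGET INTERVAL MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints TARGET; gives up after MAX_ATTEMPTS polls.
# Hypothetical helper for illustration, NOT the installer's implementation.
wait_for_state() {
    target="$1"; interval="$2"; attempts="$3"
    shift 3
    while [ "$attempts" -gt 0 ]; do
        state=$("$@") || state='UNKNOWN'
        echo "current state: $state" >&2
        if [ "$state" = "$target" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}

# emr_cluster_state CLUSTER_ID: print the cluster state via the AWS CLI.
emr_cluster_state() {
    aws emr describe-cluster --cluster-id "$1" \
        --query 'Cluster.Status.State' --output text
}
```

A call such as `wait_for_state WAITING 30 60 emr_cluster_state '<your-cluster-id>'` would poll every 30 seconds for up to 30 minutes. Note that this sketch simply times out if the cluster ends in a failure state; a more careful version would stop early in that case.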

2.2.2 Customization

Now that the all-in-one installation is done, let us look at customization. In general, this installer follows the principle of “Convention over Configuration”: most parameters are preset with default values. An equivalent version of the above command line with the full parameter list is as follows:

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'emr-native' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'emr-native-emrfs,emr-native-spark,emr-native-hive' \
    --java-home '/usr/lib/jvm/java' \
    --skip-install-mysql 'false' \
    --skip-migrate-kerberos-db 'false' \
    --skip-install-solr 'false' \
    --skip-install-openldap 'false' \
    --skip-configure-hue 'false' \
    --ranger-host "$(hostname -f)" \
    --ranger-version '2.1.0' \
    --mysql-host "$(hostname -f)" \
    --mysql-root-password 'Admin1234!' \
    --mysql-ranger-db-user-password 'Admin1234!' \
    --solr-host "$(hostname -f)" \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --restart-interval 30

The full-parameter version gives us a complete view of all custom options. In the following scenarios, you may want to change some option values:

  1. If you want to change the default organization name dc=example,dc=com or the default password Admin1234!, run the full-parameter version and replace them with your own values.

  2. If you need to integrate with external facilities, e.g., a centralized OpenLDAP or an existing MySQL or Solr, add the corresponding --skip-xxx-xxx options and set them to true.

  3. If you have other pre-defined bind DNs for Hue, Ranger and SSSD, add the corresponding --xxx-bind-dn and --xxx-bind-password options to set them. Note that the bind DNs for Hue, Ranger and SSSD are created automatically when installing OpenLDAP, but they are FIXED to the naming pattern cn=hue|ranger|sssd,ou=services,<your-base-dn>, not the value given by the --xxx-bind-dn option. So if you assign another DN with the --xxx-bind-dn option, you MUST create this DN yourself in advance. The installer does NOT create the DN given by the --xxx-bind-dn option because a DN is actually a tree path: to create it, we must create all nodes on the path, and it is not cost-effective to implement such a small but complicated function.

  4. By default, the all-in-one installation migrates the cluster’s Kerberos database to OpenLDAP for better account management. If you run an external Kerberos KDC, make sure you really need to migrate the external KDC’s database to OpenLDAP; if not, add --skip-migrate-kerberos-db 'true' to the command line to skip it.
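
To make the fixed naming pattern from item 3 concrete, the sketch below derives the bind DN that is created automatically for a given service and base DN. `default_bind_dn` is a hypothetical helper that only mirrors the pattern quoted above; it does not talk to OpenLDAP.

```shell
#!/bin/sh
# default_bind_dn SERVICE BASE_DN: print the bind DN following the fixed pattern
# cn=hue|ranger|sssd,ou=services,<your-base-dn>. Hypothetical helper.
default_bind_dn() {
    service="$1"; base_dn="$2"
    case "$service" in
        hue | ranger | sssd)
            echo "cn=${service},ou=services,${base_dn}" ;;
        *)
            echo "unknown service: $service" >&2
            return 1 ;;
    esac
}
```

For example, with a base DN of dc=mycorp,dc=org the automatically created Ranger bind DN would be cn=ranger,ou=services,dc=mycorp,dc=org, so the matching --ranger-bind-dn value needs no manual preparation.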

2.3 Step-By-Step Installation

As an alternative, you can choose the step-by-step installation instead of the all-in-one installation. We give the command line of each step below; for comments on each parameter, please refer to the appendix.
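
When resuming after a failure, it is useful to know which steps are still outstanding. The sketch below lists the setup.sh sub-commands of this section in order and prints the ones from a given step onward. `steps_from` and the `STEPS` variable are hypothetical and only illustrate the ordering; each step still has to be run with its own parameters, and the manual “Create EMR Cluster” step sits between install-ranger-plugins and migrate-kerberos-db.

```shell
#!/bin/sh
# Sub-commands of setup.sh in the order used by this section.
STEPS='init-ec2 create-iam-roles create-ranger-secrets
create-emr-security-configuration install-openldap install-ranger
install-ranger-plugins migrate-kerberos-db enable-sasl-gssapi
install-sssd configure-hue add-example-users'

# steps_from STEP: print STEP and every later step, one per line.
# Fails if STEP is not a known step. Hypothetical helper for illustration.
steps_from() {
    found=0
    for step in $STEPS; do
        if [ "$step" = "$1" ]; then
            found=1
        fi
        if [ "$found" -eq 1 ]; then
            echo "$step"
        fi
    done
    [ "$found" -eq 1 ]
}
```

For instance, `steps_from install-sssd` prints install-sssd, configure-hue and add-example-users, which are the steps left to run after a failure in 2.3.11.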

2.3.1 Init EC2

This step finishes some fundamental jobs, e.g., installing the AWS CLI, the JDK, and so on.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh init-ec2 \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY"

2.3.2 Create IAM Roles

This step creates the 3 IAM roles required for EMR.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh create-iam-roles \
    --region "$REGION"

2.3.3 Create Ranger Secrets

This step creates the SSL/TLS related keys, certificates and keystores for Ranger, because EMR-native Ranger requires SSL/TLS connections to the server. These artifacts are uploaded to AWS Secrets Manager and referenced by the EMR security configuration.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh create-ranger-secrets \
    --region "$REGION"

2.3.4 Create EMR Security Configuration

This step creates an EMR security configuration. The configuration includes Kerberos and Ranger related information; when creating the cluster, EMR reads it, fetches the corresponding resources, e.g., the secrets, and also interacts with the Ranger server whose address is given in the security configuration.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh create-emr-security-configuration \
    --region "$REGION" \
    --solution 'emr-native' \
    --auth-provider 'openldap'

2.3.5 Install OpenLDAP

This step installs OpenLDAP on the given OpenLDAP host. As mentioned above, although this action is performed on the OpenLDAP server, you DON’T need to log in to the OpenLDAP server; just run the command line on the local host (the Ranger server).

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-openldap \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'emr-native' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!'

2.3.6 Install Ranger

This step installs all server-side components of Ranger, including MySQL, Solr, Ranger Admin and Ranger UserSync.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --solution 'emr-native' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson'

2.3.7 Install Ranger Plugins

This step installs the EMRFS, Spark and Hive plugins on the Ranger server side. The other half of the job, installing these plugins on the agent side (actually they are the EMR Secret Agent, the EMR Record Server and so on), is done automatically by EMR when creating the cluster.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger-plugins \
    --region "$REGION" \
    --solution 'emr-native' \
    --auth-provider 'openldap' \
    --ranger-plugins 'emr-native-emrfs,emr-native-spark,emr-native-hive'

2.3.8 Create EMR Cluster

For the step-by-step installation, there is no interactive process for creating the EMR cluster, so feel free to create the cluster in the EMR web console. We have to wait until the cluster is completely ready (in “WAITING” status), then export the following environment-specific variables:

export EMR_CLUSTER_ID='TO_BE_REPLACED'
export KERBEROS_REALM='TO_BE_REPLACED'
export KERBEROS_KDC_HOST='TO_BE_REPLACED'

The following is an example:

export EMR_CLUSTER_ID='j-8SRQM6X4ZVT8'
export KERBEROS_REALM='COMPUTE.INTERNAL'
export KERBEROS_KDC_HOST='ip-10-0-3-104.cn-north-1.compute.internal'

2.3.9 Migrate Kerberos DB

The default Kerberos database is file-based and stored on the KDC. This step migrates all principals’ data to OpenLDAP. Please pay extra attention to this step: if you run an external KDC that is NOT dedicated to your EMR cluster, you may skip this step unless you are sure you need to migrate the external KDC to your OpenLDAP.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh migrate-kerberos-db \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --kerberos-realm "$KERBEROS_REALM" \
    --kerberos-kdc-host "$KERBEROS_KDC_HOST" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!'

2.3.10 Enable SASL/GSSAPI

This step enables SASL/GSSAPI, a key action of the OpenLDAP and Kerberos integration. It performs remote actions on OpenLDAP, the Kerberos KDC and each node of the EMR cluster; as before, you just need to run it on the local host.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh enable-sasl-gssapi \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --kerberos-realm "$KERBEROS_REALM" \
    --kerberos-kdc-host "$KERBEROS_KDC_HOST" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"

2.3.11 Install SSSD

This step installs and configures SSSD on each node of the EMR cluster. As with installing OpenLDAP, we stay on the local host to run the command line, and it is performed on the remote nodes via SSH.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-sssd \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"

2.3.12 Configure Hue

This step updates the Hue configuration of EMR. As highlighted in the all-in-one installation, if you have other customized EMR configuration, please skip this step; you can still manually merge the JSON file for the Hue configuration generated by the command line into your own JSON.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh configure-hue \
    --region "$REGION" \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --openldap-user-object-class 'inetOrgPerson' \
    --emr-cluster-id "$EMR_CLUSTER_ID"

2.3.13 Create Example Users

This step creates 2 example users to facilitate the following verification.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh add-example-users \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'emr-native' \
    --auth-provider 'openldap' \
    --kerberos-kdc-host "$KERBEROS_KDC_HOST" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --example-users 'example-user-1,example-user-2'

3. Verification

After installation & integration are completed, it’s time to check whether Ranger works. The verification jobs are divided into 3 parts, covering Hive, EMRFS (S3) and Spark. First, let us log in to OpenLDAP via a client, e.g., LdapAdmin or Apache Directory Studio, then check all DNs; they should look as follows:

Next, open the Ranger web console at https://<YOUR-RANGER-HOST>:6182; the default admin account/password is admin/admin. After logging in, open the “Users/Groups/Roles” page first and check whether the example users on OpenLDAP have been synchronized to Ranger, as follows:

3.1 Hive Access Control Verification

Usually there is a set of pre-defined policies for the Hive plugin after installation. To eliminate interference and keep the verification simple, let’s REMOVE them first:


Any policy change on the Ranger web console is synced to the agent side (the EMR cluster nodes) within 30 seconds. We can run the following commands on the master node to check whether the local policy file has been updated:

# run on master node of emr cluster
# print the file's timestamps every 3 seconds, 10 times in total
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat /etc/hive/ranger_policy_cache/hiveServer2_hive.json
    sleep 3
done
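
Instead of reading the timestamps by eye, the wait can also be automated. The following is a minimal sketch under stated assumptions: `wait_for_update` is a hypothetical helper, and the marker file is any file you `touch` just before changing the policy in Ranger. It polls until the policy cache file is newer than the marker. Because the original loop needs sudo to read the cache file, run the function from a root shell (e.g. `sudo -i`); `sudo` cannot call a shell function directly.

```shell
#!/bin/sh
# wait_for_update FILE MARKER INTERVAL MAX_ATTEMPTS
# Succeeds once FILE has a newer modification time than MARKER,
# checking every INTERVAL seconds for at most MAX_ATTEMPTS polls.
# Hypothetical helper for illustration.
wait_for_update() {
    file="$1"; marker="$2"; interval="$3"; attempts="$4"
    while [ "$attempts" -gt 0 ]; do
        # find prints the file name only if it is newer than the marker
        if [ -n "$(find "$file" -newer "$marker" 2>/dev/null)" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

A possible workflow: open a root shell, run `touch /tmp/policy.marker`, change the policy in the Ranger web console, then call `wait_for_update /etc/hive/ranger_policy_cache/hiveServer2_hive.json /tmp/policy.marker 3 10`.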

Once the local policy file is up to date, the removal of all policies takes effect. Then log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open the Hive editor, and enter the following SQL to create a test table (remember to replace “ranger-test” with your own bucket name):

-- run in hue hive editor
create table ranger_test (
  id bigint
)
row format delimited
stored as textfile location 's3://ranger-test/';

Then run it, and an error occurs:


It shows that example-user-1 is blocked by database-related permissions, which proves the Hive plugin is working. Then we go back to Ranger and add a Hive policy named “all - database, table, column” as follows:


It grants example-user-1 all privileges on all databases, tables and columns. Check the policy file again on the master node with the previous command line; once it is updated, go back to Hue and re-run the SQL. This time we get a different error:


As shown, the SQL is blocked when reading “s3://ranger-test”. In fact, example-user-1 has no permission to access any URL, including “s3://”. We need to grant URL-related permissions to this user, so go back to Ranger again and add a Hive policy named “all - url” as follows:


It grants example-user-1 all privileges on any URL, including “s3://”. Check the policy file again, switch to Hue, and run the SQL a third time; it now succeeds, as follows:


Finally, to prepare for the following EMRFS / Spark verification, we insert some example data into the table, and also double check that example-user-1 has full read & write permissions on the table:

insert into ranger_test(id) values(1);
insert into ranger_test(id) values(2);
insert into ranger_test(id) values(3);
select * from ranger_test;

The execution result is:


By now, the Hive access control verification has passed.

3.2 EMRFS (S3) Access Control Verification

Log in to Hue with the account “example-user-1”, open the Scala editor, and enter the following Spark code:

// run in scala editor of hue
spark.read.csv("s3://ranger-test/").show;

This line of code tries to read files on S3, but it runs into the following error:

It shows that example-user-1 has no permission on the S3 bucket “ranger-test”. This proves the EMRFS plugin is working: it successfully blocked unauthorized S3 access. Let’s log in to Ranger and add an EMRFS policy named “all - ranger-test” as follows:

It grants example-user-1 all privileges on the “ranger-test” bucket. Similar to checking the Hive policy file, we can run the following command to check whether the EMRFS policy file has been updated:

# run on master node of emr cluster
# print the file's timestamps every 3 seconds, 10 times in total
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat /emr/secretagent/ranger_policy_cache/emrS3RangerPlugin_emrfs.json
    sleep 3
done

After the file is updated, go back to Hue and re-run the previous Spark code; it succeeds, as follows:


By now, the EMRFS access control verification has passed.

3.3 Spark Access Control Verification

Log in to Hue with the account “example-user-1”, open the Scala editor, and enter the following Spark code:

// run in scala editor of hue
spark.sql("select * from ranger_test").show

This line of code tries to query the ranger_test table via Spark SQL, but it runs into the following error:

It shows that the current user has no permission on the default database. This proves the Spark plugin is working: it successfully blocked unauthorized database/table access.

Let’s log in to Ranger and add a Spark policy named “all - database, table, column” as follows:

It grants example-user-1 all privileges on all databases/tables/columns. Similar to checking the Hive policy file, we can run the following command to check whether the Spark policy file has been updated:

# run on master node of emr cluster
# print the file's timestamps every 3 seconds, 10 times in total
for i in {1..10}; do
    printf "\n%100s\n\n" | tr ' ' '='
    sudo stat /etc/emr-record-server/ranger_policy_cache/emrSparkRangerPlugin_spark.json
    sleep 3
done

After the file is updated, go back to Hue and re-run the previous Spark code; it succeeds, as follows:


By now, the Spark access control verification has passed.

4. FAQ

4.1 How to Integrate an External KDC?

Keep everything as usual, but when creating the EMR cluster, do NOT select the security configuration generated by the CLI. Instead, create another one manually and copy all values from the generated security configuration except the “Authentication” part. For “Authentication”, select “External KDC” and fill in your own values. When entering the Kerberos KDC host in the command line console or exporting KERBEROS_KDC_HOST, also use your external KDC host name. Last, decide whether you still need to migrate the external Kerberos database to OpenLDAP; if not, skip it with --skip-migrate-kerberos-db true.

4.2 Can I Rerun the All-In-One Installation Command Line?

Yes! And you do NOT need to take any cleanup actions.

5. Appendix

The following is the parameter specification:


Parameter: Comment

  • --region: the AWS region.
  • --access-key-id: the AWS access key id of your IAM account.
  • --secret-access-key: the AWS secret access key of your IAM account.
  • --ssh-key: the SSH private key file path.
  • --solution: the solution name, accepted values ‘open-source’ or ‘emr-native’.
  • --auth-provider: the authentication provider, accepted values ‘ad’ or ‘openldap’.
  • --openldap-host: the FQDN of the OpenLDAP host.
  • --openldap-base-dn: the base DN of OpenLDAP, for example: ‘dc=example,dc=com’, change it according to your env.
  • --openldap-root-cn: the cn of the root account, for example: ‘admin’, change it according to your env.
  • --openldap-root-password: the password of the root account, for example: ‘Admin1234!’, change it according to your env.
  • --ranger-bind-dn: the bind DN for Ranger, for example: ‘cn=ranger,ou=services,dc=example,dc=com’, this should be an existing DN on Windows AD / OpenLDAP, change it according to your env.
  • --ranger-bind-password: the password of the Ranger bind DN, for example: ‘Admin1234!’, change it according to your env.
  • --openldap-user-dn-pattern: the DN pattern for Ranger to search users on OpenLDAP, for example: ‘uid={0},ou=users,dc=example,dc=com’, change it according to your env.
  • --openldap-group-search-filter: the filter for Ranger to search groups on OpenLDAP, for example: ‘(member=uid={0},ou=users,dc=example,dc=com)’, change it according to your env.
  • --openldap-user-object-class: the user object class for Ranger to search users, for example: ‘inetOrgPerson’, change it according to your env.
  • --hue-bind-dn: the bind DN for Hue, for example: ‘cn=hue,ou=services,dc=example,dc=com’, this should be an existing DN on Windows AD / OpenLDAP, change it according to your env.
  • --hue-bind-password: the password of the Hue bind DN, for example: ‘Admin1234!’, change it according to your env.
  • --example-users: the example users to be created on OpenLDAP & Kerberos so as to demo Ranger’s features, this parameter is optional, if omitted, no example users will be created.
  • --sssd-bind-dn