Secure HDFS Client Initialization

Source: https://community.hortonworks.com/articles/56702/a-secure-hdfs-client-example.html


Short Description:

Explaining the creation of a secure HDFS client in Java

Article

It takes about three lines of Java code to write a simple HDFS client that can then be used to upload, read, or list files. Here is an example:


 
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://one.hdp:8020");
    FileSystem fs = FileSystem.get(conf);

This file system API gives the developer a generic interface to any supported file system, depending on the protocol being used, in this case hdfs. This is enough to alter data on the Hadoop Distributed Filesystem, for example to list all the files under the root folder:


 
    FileStatus[] fsStatus = fs.listStatus(new Path("/"));
    for (int i = 0; i < fsStatus.length; i++)
        System.out.println(fsStatus[i].getPath().toString());

For a secured environment this is not enough, because you would need to consider these further aspects:

  1. A secure protocol
  2. Authentication with Kerberos
  3. Impersonation (proxy user), if designed as a service

What we discuss here for a sample HDFS client can, with some variation, also be applied to other Hadoop clients.

A Secure HDFS Protocol

One way to secure the communication between clients and Hadoop services in general is to use SSL encryption for all RPC calls, but this has a severe impact on overall cluster performance. To avoid this and still ensure secure communication, it can be enough to encrypt only the HTTP endpoints. In that case swebhdfs (SSL + WebHDFS) can be used as the protocol. Example:


 
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "swebhdfs://one.hdp:50470");
    FileSystem fs = FileSystem.get(conf);
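
Note that with swebhdfs the client JVM must also trust the SSL certificate presented by the NameNode. One common way, sketched here with a hypothetical truststore path and password, is to point the JVM at a truststore containing that certificate (or its issuing CA):

    // Assumption: all.jks is an example truststore that already contains the
    // NameNode certificate or its CA; path and password are illustrative.
    System.setProperty("javax.net.ssl.trustStore", "/etc/security/clientKeys/all.jks");
    System.setProperty("javax.net.ssl.trustStorePassword", "changeit");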

Authentication with Kerberos

A secure client would need to use Kerberos, which is the only authentication method currently supported by Hadoop. Kerberos does require very thoughtful configuration, but rewards its users with an almost completely transparent authentication implementation that simply works.

Kerberos authentication in Java is provided by the Java Authentication and Authorization Service (JAAS), a pluggable authentication framework similar to PAM that supports multiple authentication methods. In this case the authentication method being used is the GSS-API for Kerberos.
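
The JAAS and Kerberos configuration files are usually selected via JVM flags, as shown in the execution examples below. As a minimal alternative sketch (not used in the original flow), the same standard properties can be set programmatically before the first FileSystem call:

    // Standard JVM/JAAS properties; the paths match the files created in this article.
    System.setProperty("java.security.auth.login.config", "/home/hdfs-user/jaas.conf");
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
    System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");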

For JAAS, a proper GSS configuration is needed, in addition to being in possession of proper credentials. Suitable credentials can be created with MIT Kerberos like this:


 
    (as root)
    $ kadmin.local -q "addprinc -pw hadoop hdfs-user"
    $ kadmin.local -q "xst -k /home/hdfs-user/hdfs-user.keytab hdfs-user@MYCORP.NET"
    (Creating a keytab will make the existing password invalid. To change your password back to hadoop, run as root:)
    $ kadmin.local -q "cpw -pw hadoop hdfs-user"

The xst command creates a so-called keytab - basically an encrypted password of the user - that can be used for passwordless authentication, for example by automated services. We will make use of that here as well. The last line is only needed because exporting the keytab invalidates the user's existing password.
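
As a short aside (a sketch, not part of the original flow): Hadoop's UserGroupInformation class, introduced further below, can also consume such a keytab directly, which is convenient for automated services:

    // Sketch: passwordless login straight from the keytab created above.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(
            "hdfs-user@MYCORP.NET", "/home/hdfs-user/hdfs-user.keytab");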

Additionally we create a JAAS configuration that we can use for authentication:


 
    com.sun.security.jgss.krb5.initiate {
        com.sun.security.auth.module.Krb5LoginModule required
        doNotPrompt=true
        principal="hdfs-user@MYCORP.NET"
        useKeyTab=true
        keyTab="/home/hdfs-user/hdfs-user.keytab"
        storeKey=true;
    };

We now have multiple ways to authenticate, and I will start with what is probably the simplest approach in terms of required code changes:

1. Authentication with Keytab

Authenticating web-based access to HDFS with a keytab requires almost no code changes, apart from the use of the (s)webhdfs protocol and the changed authentication method:


 
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "webhdfs://one.hdp:50070");
    conf.set("hadoop.security.authentication", "kerberos");

    FileSystem fs = FileSystem.get(conf);
    FileStatus[] fsStatus = fs.listStatus(new Path("/"));
    for (int i = 0; i < fsStatus.length; i++)
        System.out.println(fsStatus[i].getPath().toString());

The above is enough if executed in a JAAS context. Creating the secure context can be done by using the above JAAS configuration and keytab:


 
    java -Djava.security.auth.login.config=/home/hdfs-user/jaas.conf \
         -Djava.security.krb5.conf=/etc/krb5.conf \
         -Djavax.security.auth.useSubjectCredsOnly=false \
         -cp "./hdfs-sample-1.0-SNAPSHOT.jar:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-client/*" \
         hdfs.sample.HdfsMain

    webhdfs://one.hdp:50070/app-logs
    webhdfs://one.hdp:50070/apps
    webhdfs://one.hdp:50070/ats
    webhdfs://one.hdp:50070/hdp
    webhdfs://one.hdp:50070/mapred
    webhdfs://one.hdp:50070/mr-history
    webhdfs://one.hdp:50070/tmp
    webhdfs://one.hdp:50070/user

2. Using UserGroupInformation

For authentication in Hadoop there exists a wrapper class around a JAAS Subject that provides methods for user login. Without a specific setup, the UserGroupInformation wrapper uses the system security context; in the case of Kerberos this exists in the ticket cache (klist shows the current security context of a user). This is demonstrated under "With Existing Security Context" below. Alternatively, a custom security context can be used for login, either with a keytab file or with username and password credentials. Both approaches are also demonstrated here, under "Providing Credentials from Login" and "Via Keytab".
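
As a quick illustration (a sketch, not part of the original article), the wrapper can be inspected after a successful login to verify which principal and authentication method were picked up:

    // Assumes UserGroupInformation.setConfiguration(conf) was already called
    // with hadoop.security.authentication set to kerberos, as in the examples below.
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    System.out.println(ugi.getUserName());              // e.g. hdfs-user@MYCORP.NET
    System.out.println(ugi.getAuthenticationMethod());  // e.g. KERBEROS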

With Existing Security Context

First we would need to authenticate and make sure we have a proper security context:


 
    $ kinit
    Password for hdfs-user@MYCORP.NET:
    $ klist
    Ticket cache: FILE:/tmp/krb5cc_1013
    Default principal: hdfs-user@MYCORP.NET

    Valid starting       Expires              Service principal
    02/14/2016 14:54:32  02/15/2016 14:54:32  krbtgt/MYCORP.NET@MYCORP.NET

With this, the following HDFS client implementation can be used in a secured environment:


 
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://one.hdp:8020");
    conf.set("hadoop.security.authentication", "kerberos");

    UserGroupInformation.setConfiguration(conf);
    // Subject is taken from current user context
    UserGroupInformation.loginUserFromSubject(null);

    FileSystem fs = FileSystem.get(conf);
    FileStatus[] fsStatus = fs.listStatus(new Path("/"));

    // note: '<' rather than '<=', to stay inside the array bounds
    for (int i = 0; i < fsStatus.length; i++)
        System.out.println(fsStatus[i].getPath().toString());

Since the JAAS context is created at run time from the existing ticket cache, the client can be executed without any JAAS-related JVM flags:


 
    java -cp "./hdfs-sample-1.0-SNAPSHOT.jar:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-client/*" \
         hdfs.sample.HdfsMain

    hdfs://one.hdp:8020/app-logs
    hdfs://one.hdp:8020/apps
    hdfs://one.hdp:8020/ats
    hdfs://one.hdp:8020/hdp
    hdfs://one.hdp:8020/mapred
    hdfs://one.hdp:8020/mr-history
    hdfs://one.hdp:8020/tmp
    hdfs://one.hdp:8020/user

Providing Credentials from Login

Providing login credentials at execution time requires the creation of a javax.security.auth.Subject with username and password. This means we have to use the JAAS login API to do a kinit programmatically, like this:


 
    private static String username = "hdfs-user";
    private static char[] password = "hadoop".toCharArray();

    public static LoginContext kinit() throws LoginException {
        LoginContext lc = new LoginContext(HdfsMain.class.getSimpleName(),
            new CallbackHandler() {
                public void handle(Callback[] callbacks)
                        throws IOException, UnsupportedCallbackException {
                    for (Callback c : callbacks) {
                        if (c instanceof NameCallback)
                            ((NameCallback) c).setName(username);
                        if (c instanceof PasswordCallback)
                            ((PasswordCallback) c).setPassword(password);
                    }
                }
            });
        lc.login();
        return lc;
    }

We still have to configure the JAAS login module referenced by the name provided in the above implementation. That name is HdfsMain.class.getSimpleName(), so our module configuration should look like this:


 
    HdfsMain {
        com.sun.security.auth.module.Krb5LoginModule required client=TRUE;
    };

Having this in place we can now login with username and password:


 
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://one.hdp:8020");
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);

    LoginContext lc = kinit();
    UserGroupInformation.loginUserFromSubject(lc.getSubject());

    FileSystem fs = FileSystem.get(conf);
    FileStatus[] fsStatus = fs.listStatus(new Path("/"));
    for (int i = 0; i < fsStatus.length; i++)
        System.out.println(fsStatus[i].getPath().toString());
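
That leaves the third aspect from the list at the beginning, impersonation (proxy user), which matters when the client is designed as a service acting on behalf of end users. A minimal sketch using Hadoop's createProxyUser and doAs API could look like this ("alice" is an illustrative end-user name, and the cluster-side hadoop.proxyuser.hdfs-user.* settings are assumed to permit the impersonation):

    // Sketch: the logged-in service user acts on behalf of end user "alice".
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
            "alice", UserGroupInformation.getLoginUser());
    FileStatus[] proxyStatus = proxyUgi.doAs(
            new PrivilegedExceptionAction<FileStatus[]>() {
                public FileStatus[] run() throws Exception {
                    // executed with alice's identity on the cluster side
                    return FileSystem.get(conf).listStatus(new Path("/user/alice"));
                }
            });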
