HDFS Command Implementation Analysis


HDFS Command Overview

HDFS commands come in two flavors, the hadoop command and the hdfs command, and functionally they also fall into two groups: HDFS file operation commands and HDFS administration commands.

Both are shell commands. The only real executables are hadoop and hdfs; the so-called ls/mv/cp/cat/mkdir... and dfs/setQuota/fsck... commands do not exist on their own, they are all passed as arguments to hadoop and hdfs.

For the concrete implementation, see bin/hadoop and bin/hdfs. Other commands in the Hadoop family, such as yarn, work the same way. The official documentation describes the FS shell as follows:

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others. The FS shell is invoked by: bin/hadoop fs <args>

All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the Local FS the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost).

Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout. If HDFS is being used, hdfs dfs is a synonym. Relative paths can be used. For HDFS, the current working directory is the HDFS home directory /user/<username> that often has to be created manually. The HDFS home directory can also be implicitly accessed, e.g., when using the HDFS trash folder, the .Trash directory in the home directory.

HDFS Command Implementation Mechanism:
The user enters hadoop/hdfs plus arguments; after the shell script parses them, it launches a Java program via java or jsvc, which retrieves HDFS file information or passes operation instructions in, thereby managing the file system.
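For the hadoop fs case, the class the script launches is org.apache.hadoop.fs.FsShell. The following is a simplified sketch of its main() entry point, paraphrased from the Hadoop 3.x source (details such as the shell instance factory and version differences are omitted); it shows that main() does little more than hand the arguments to ToolRunner:

//org/apache/hadoop/fs/FsShell.java (simplified sketch)
public static void main(String[] argv) throws Exception {
  // build an FsShell instance and give it a fresh Configuration
  FsShell shell = new FsShell();
  Configuration conf = new Configuration();
  conf.setQuietMode(false);
  shell.setConf(conf);
  int res;
  try {
    // ToolRunner strips the generic options and then calls shell.run()
    res = ToolRunner.run(shell, argv);
  } finally {
    shell.close();
  }
  System.exit(res);
}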

For the hadoop fs and hdfs dfs/dfsadmin commands, the hadoop-3.1.1 implementation is as follows (hadoop fs and hdfs dfs both launch org.apache.hadoop.fs.FsShell, while hdfs dfsadmin launches org.apache.hadoop.hdfs.tools.DFSAdmin):

The relevant shell scripts are bin/hadoop and bin/hdfs.

The relevant Java classes are org/apache/hadoop/fs/FsShell.java, org/apache/hadoop/fs/shell/CommandFactory.java, org/apache/hadoop/util/ToolRunner.java, org/apache/hadoop/fs/shell/FsCommand.java, org/apache/hadoop/util/GenericOptionsParser.java, and org/apache/hadoop/hdfs/tools/DFSAdmin.java.

Among them, DFSAdmin.java extends the FsShell class.

FsShell: the core of the whole hdfs command; it registers the various operation implementation classes (via reflection) and dispatches and executes the commands.
FsCommand: registers the implementations of the individual operation commands.
CommandFactory: the command registration mechanism itself, implemented mainly with the factory pattern plus reflection.
GenericOptionsParser.java: parses the generic options in the command input.
ToolRunner.java: an intermediate layer that drives execution; it strips the generic options and then delegates the actual command execution to FsShell.run() (see the sketch below).
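To make this division of labor concrete, here is a simplified sketch of ToolRunner.run(), paraphrased from the Hadoop source (newer versions add extra bookkeeping such as caller context): it runs GenericOptionsParser over the arguments, pushes the Configuration back into the Tool (here FsShell or DFSAdmin), and then calls the Tool's run() with the remaining, non-generic arguments.

//org/apache/hadoop/util/ToolRunner.java (simplified sketch)
public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception {
  if (conf == null) {
    conf = new Configuration();
  }
  // strip the generic options (-D, -fs, -conf, ...) from the argument list
  GenericOptionsParser parser = new GenericOptionsParser(conf, args);
  // hand the configuration back so the Tool can configure itself
  tool.setConf(conf);
  // run the Tool (FsShell/DFSAdmin) with the remaining arguments, e.g. "-ls /"
  String[] toolArgs = parser.getRemainingArgs();
  return tool.run(toolArgs);
}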

1. Command Registration

 //org/apache/hadoop/fs/FsShell.java
protected void init() throws IOException {
    getConf().setQuietMode(true);
    UserGroupInformation.setConfiguration(getConf());
    if (commandFactory == null) {
      commandFactory = new CommandFactory(getConf());
      commandFactory.addObject(new Help(), "-help");
      commandFactory.addObject(new Usage(), "-usage");
      registerCommands(commandFactory);
    }
  }

  protected void registerCommands(CommandFactory factory) {
    // TODO: DFSAdmin subclasses FsShell so need to protect the command
    // registration.  This class should morph into a base class for
    // commands, and then this method can be abstract
    if (this.getClass().equals(FsShell.class)) {
      factory.registerCommands(FsCommand.class);
    }
  }

//org/apache/hadoop/fs/shell/CommandFactory.java
/**
   * Invokes "static void registerCommands(CommandFactory)" on the given class.
   * This method abstracts the contract between the factory and the command
   * class.  Do not assume that directly invoking registerCommands on the
   * given class will have the same effect.
   * @param registrarClass class to allow an opportunity to register
   */
  public void registerCommands(Class<?> registrarClass) {
    try {
      registrarClass.getMethod(
          "registerCommands", CommandFactory.class
      ).invoke(null, this);
    } catch (Exception e) {
      throw new RuntimeException(StringUtils.stringifyException(e));
    }
  }

 //org/apache/hadoop/fs/shell/FsCommand.java
 /**
   * Register the command classes used by the fs subcommand
   * @param factory where to register the class
   */
  public static void registerCommands(CommandFactory factory) {
    factory.registerCommands(AclCommands.class);
    factory.registerCommands(CopyCommands.class);
    factory.registerCommands(Count.class);
    factory.registerCommands(Delete.class);
    factory.registerCommands(Display.class);
    factory.registerCommands(Find.class);
    factory.registerCommands(FsShellPermissions.class);
    factory.registerCommands(FsUsage.class);
    factory.registerCommands(Ls.class);
    factory.registerCommands(Mkdir.class);
    factory.registerCommands(MoveCommands.class);
    factory.registerCommands(SetReplication.class);
    factory.registerCommands(Stat.class);
    factory.registerCommands(Tail.class);
    factory.registerCommands(Head.class);
    factory.registerCommands(Test.class);
    factory.registerCommands(Touch.class);
    factory.registerCommands(Truncate.class);
    factory.registerCommands(SnapshotCommands.class);
    factory.registerCommands(XAttrCommands.class);
  }
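Each class registered above in turn exposes its own static registerCommands(CommandFactory) hook, in which it binds its command names to the implementing class via CommandFactory.addClass(); the factory later instantiates the class by reflection when that name is used. As an illustration, the following is a simplified sketch of how the Ls command registers itself in Hadoop 3.x (Lsr is the deprecated recursive variant defined in the same source file):

//org/apache/hadoop/fs/shell/Ls.java (simplified sketch)
public static void registerCommands(CommandFactory factory) {
  // bind the "-ls" name to the Ls implementation class
  factory.addClass(Ls.class, "-ls");
  // "-lsr" maps to the deprecated recursive variant
  factory.addClass(Lsr.class, "-lsr");
}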

2. Command Parsing

//TODO

3. Command Execution

//TODO

4. Formatted Output

//TODO
