HDFS Deployment Experience


Table of Contents

  • 1. Overview

  • 2. Important configuration parameters and choices

  • 3. Deployment record: parameter changes

    • 3.1. local machine, NameNode

    • 3.2. local machine, DataNode

    • 3.3. 192.168.1.101, DataNode

  • 4. Starting the HDFS cluster

  • 5. Verifying the startup

    • 5.1. Creating a file via the hdfs shell

    • 5.2. Problems and fixes

  • 6. Stopping the HDFS cluster

  • 7. Conclusions


1 Overview

  • Download the hadoop distribution
    There are three packages:

    1. hadoop-x.y.z-site.tar.gz

    2. hadoop-x.y.z-src.tar.gz

    3. hadoop-x.y.z.tar.gz

  • hadoop consists of several components; each component has its own daemons, and every daemon runs as an independent java process. A daemon's startup options are configured through environment variables (see the sketch after this list)

    • MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS

    • ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS

    • NodeManager daemon: YARN_NODEMANAGER_OPTS

    • WebAppProxy daemon: YARN_PROXYSERVER_OPTS

    • NameNode daemon: HDFS_NAMENODE_OPTS

    • DataNode daemon: HDFS_DATANODE_OPTS

    • Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS

    • HDFS
      configured in etc/hadoop/hadoop-env.sh

    • YARN
      configured in etc/hadoop/yarn-env.sh

    • MapReduce
      configured in etc/hadoop/mapred-env.sh
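
    As a concrete illustration, here is a minimal sketch of the HDFS entries
    in etc/hadoop/hadoop-env.sh; the heap sizes are assumptions for
    illustration only, not recommendations:

      # etc/hadoop/hadoop-env.sh -- per-daemon JVM options for the HDFS daemons
      # (heap sizes below are assumed values; size them to your hardware)
      export HDFS_NAMENODE_OPTS="-Xmx4g"
      export HDFS_DATANODE_OPTS="-Xmx1g"
      export HDFS_SECONDARYNAMENODE_OPTS="-Xmx4g"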

  • hadoop-wide settings, configured in a system file (~/.bashrc); a sketch follows this list

    • HADOOP_HOME: home directory of the hadoop distribution; at a minimum this one must be set

    • HADOOP_PID_DIR

    • HADOOP_LOG_DIR

    • HADOOP_HEAPSIZE_MAX
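
For example, a minimal ~/.bashrc sketch covering all four variables; the
paths repeat those used in section 3.1, and the heap value is an assumption
for illustration only:

    # ~/.bashrc -- hadoop-wide settings (paths and heap value illustrative)
    export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
    export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
    export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
    export HADOOP_HEAPSIZE_MAX="2g"   # assumed value; size to your hardware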

2 Important configuration parameters and choices

  • Parameters every node must configure

    • fs.defaultFS
      the URI of the HDFS NameNode

    • io.file.buffer.size

    • etc/hadoop/core-site.xml
      sample defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml

  • NameNode-only configuration (dfs.blocksize and dfs.namenode.handler.count are sketched after this list)

    • dfs.namenode.name.dir

    • dfs.hosts / dfs.hosts.exclude

    • dfs.blocksize

    • dfs.namenode.handler.count

    • etc/hadoop/hdfs-site.xml
      sample defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

  • DataNode-only configuration

    • dfs.datanode.data.dir

    • etc/hadoop/hdfs-site.xml
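
Neither dfs.blocksize nor dfs.namenode.handler.count appears in the
deployment record below; as a sketch, this is how they would look in
etc/hadoop/hdfs-site.xml, shown here with the stock Hadoop 3.2 defaults:

    <!-- sketch: values are the shipped defaults, for reference only -->
    <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- block size for new files: 128 MB -->
    </property>
    <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value> <!-- number of NameNode RPC server threads -->
    </property>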

3 Deployment record: parameter changes

3.1 local machine, NameNode

  • System environment variables

    export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
    export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
    export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
  • etc/hadoop/core-site.xml

    • fs.defaultFS

      <property>
      <name>fs.defaultFS</name>
      <value>hdfs://195.90.3.212:9988/</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
      </property>
    • io.file.buffer.size

      <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
      <description>The size of buffer for use in sequence files.
      The size of this buffer should probably be a multiple of hardware
      page size (4096 on Intel x86), and it determines how much data is
      buffered during read and write operations.</description>
      </property>
  • etc/hadoop/hdfs-site.xml

    • dfs.namenode.name.dir

      <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
      <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage). If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
      </property>

3.2 local machine, DataNode

  • etc/hadoop/hdfs-site.xml

    • dfs.datanode.data.dir

      <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
      <description>Determines where on the local filesystem an DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices. The directories should be tagged
      with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
      storage policies. The default storage type will be DISK if the directory does
      not have a storage type tagged explicitly. Directories that do not exist will
      be created if local filesystem permission allows.
      </description>
      </property>

3.3 192.168.1.101, DataNode

  • System environment variables

    export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
    export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
    export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"
  • etc/hadoop/core-site.xml

    • fs.defaultFS

      <property>
      <name>fs.defaultFS</name>
      <value>hdfs://195.90.3.212:9988/</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
      </property>
    • io.file.buffer.size

      <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
      <description>The size of buffer for use in sequence files.
      The size of this buffer should probably be a multiple of hardware
      page size (4096 on Intel x86), and it determines how much data is
      buffered during read and write operations.</description>
      </property>
  • etc/hadoop/hdfs-site.xml

    • dfs.datanode.data.dir

      <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
      <description>Determines where on the local filesystem an DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices. The directories should be tagged
      with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
      storage policies. The default storage type will be DISK if the directory does
      not have a storage type tagged explicitly. Directories that do not exist will
      be created if local filesystem permission allows.
      </description>
      </property>

4 Starting the HDFS cluster

  • The first time the HDFS cluster is started, it must be formatted

    # run on the NameNode machine
    $ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
  • Start the NameNode

    # run on the NameNode machine
    $ $HADOOP_HOME/bin/hdfs --daemon start namenode
  • Start the DataNode

    # run on every DataNode machine
    $ $HADOOP_HOME/bin/hdfs --daemon start datanode
  • Optional one-command startup (see the sketch below)

    # prerequisites, both required: 1) etc/hadoop/workers is configured correctly; 2) passwordless ssh from the NameNode machine to every DataNode machine has been set up
    $ $HADOOP_HOME/sbin/start-dfs.sh
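
A sketch of those two prerequisites, using this cluster's addresses; the
remote user name and key path are taken from the directory layout above and
are otherwise assumptions:

    # 1) etc/hadoop/workers -- one DataNode host per line
    $ cat $HADOOP_HOME/etc/hadoop/workers
    195.90.3.212
    192.168.1.101

    # 2) passwordless ssh from the NameNode machine to each worker
    $ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    $ ssh-copy-id mhb@192.168.1.101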

5 Verifying the startup

  • View the NameNode web UI: http://ip:port (default port: 9870)

  • View the DataNode web UI: http://ip:port (default port: 9864); a shell-based check is also sketched below
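
Besides the web UIs, DataNode registration can be confirmed from the shell;
hdfs dfsadmin -report is the standard command for this:

    # should list two live DataNodes once both have registered
    $ $HADOOP_HOME/bin/hdfs dfsadmin -report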

5.1 Creating a file via the hdfs shell

Copy a large file from the local filesystem into hdfs, and watch how the sizes of the namenode and datanode data directories change.

  • Before copying the file

    • dfs.namenode.name.dir on the local machine (NameNode)

      [j@j dfs_namenode_name_dir]$ pwd
      /home/jng/installed/hadoop/dfs_namenode_name_dir
      [j@j dfs_namenode_name_dir]$ du -hs
      2.1M .
      [j@j dfs_namenode_name_dir]$
    • dfs.datanode.data.dir on the local machine (DataNode)

      [j@j dfs_datanode_data_dir]$ pwd
      /home/jng/installed/hadoop/dfs_datanode_data_dir
      [j@j dfs_datanode_data_dir]$ du -hs
      44K .
      [j@j dfs_datanode_data_dir]$
    • dfs.datanode.data.dir on 192.168.1.101 (DataNode)

      m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
      /home/mhb/installed/hadoop/dfs_datanode_data_dir
      m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
      44K .
      m@m:~/installed/hadoop/dfs_datanode_data_dir$
  • Copy the file (note that -moveFromLocal actually moves it, deleting the local source; -copyFromLocal would keep it)

    # performed on the NameNode
    [j@j hadoop-3.2.0]$ pwd
    /home/jng/installed/hadoop/hadoop-3.2.0
    [j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
    -rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
    [j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
  • After copying the file (a cross-check follows this list)

    • dfs.namenode.name.dir on the local machine (NameNode)

      [j@j dfs_namenode_name_dir]$ pwd
      /home/jng/installed/hadoop/dfs_namenode_name_dir
      [j@j dfs_namenode_name_dir]$ du -hs
      2.1M .
      [j@j dfs_namenode_name_dir]$
    • dfs.datanode.data.dir on the local machine (DataNode)

      [j@j dfs_datanode_data_dir]$ pwd
      /home/jng/installed/hadoop/dfs_datanode_data_dir
      [j@j dfs_datanode_data_dir]$ du -hs
      333M .
      [j@j dfs_datanode_data_dir]$
    • dfs.datanode.data.dir on 192.168.1.101 (DataNode)

      m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
      /home/mhb/installed/hadoop/dfs_datanode_data_dir
      m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
      333M .
      m@m:~/installed/hadoop/dfs_datanode_data_dir$
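
Both DataNode data directories grew by roughly the size of the 330M file,
consistent with each of the two nodes holding one replica. Where the blocks
actually landed can be cross-checked with hdfs fsck (the path is where
-moveFromLocal placed the file above):

    # prints each block of the file and the DataNodes holding its replicas
    $ $HADOOP_HOME/bin/hdfs fsck /tmp/hadoop-3.2.0.tar.gz -files -blocks -locations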

5.2 Problems and fixes

  • The namenode log (viewable from the NameNode web UI) may contain WARN messages like: "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
    ref: https://blog.csdn.net/qqpy789/article/details/78189335
    Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false

    <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
    <description>
    If true (the default), then the namenode requires that a connecting
    datanode's address must be resolved to a hostname. If necessary, a reverse
    DNS lookup is performed. All attempts to register a datanode from an
    unresolvable address are rejected.

    It is recommended that this setting be left on to prevent accidental
    registration of datanodes listed by hostname in the excludes file during a
    DNS outage. Only set this to false in environments where there is no
    infrastructure to support reverse DNS lookup.
    </description>
    </property>
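
    As the description above notes, disabling this check is a workaround for
    environments without reverse DNS; the cleaner alternative is to give the
    NameNode machine a resolvable name for each DataNode, e.g. via /etc/hosts
    (the hostname below is a made-up placeholder):

    # /etc/hosts on the NameNode machine
    192.168.1.101   datanode01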

6 Stopping the HDFS cluster

  • Stop the NameNode

    # run on the NameNode machine
    $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
  • Stop the DataNode

    # run on every DataNode machine
    $ $HADOOP_HOME/bin/hdfs --daemon stop datanode
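  • Optional one-command shutdown

    # counterpart to start-dfs.sh; the same workers/ssh prerequisites apply
    $ $HADOOP_HOME/sbin/stop-dfs.sh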

7 Conclusions

  1. HDFS can exist and run independently of YARN
    That is, HDFS runs fine even when YARN is not started, at least as far as the HDFS shell is concerned

  2. The NameNode machine of an HDFS cluster can also run a DataNode at the same time

Author: 祺嘉爸

Created: 2019-03-02 08:03:30

Edited using: Emacs

Exported using: customized-html-backend

