HDFS Deployment Experience
Table of Contents
1. Overview
2. Important Configuration Parameters and Where They Go
3. Deployment Practice: Record of Configuration Changes
3.1. local machine, NameNode
3.2. local machine, DataNode
3.3. 192.168.1.101, DataNode
4. Starting the HDFS Cluster
5. Verifying the Startup
5.1. Creating a File via the HDFS Shell
5.2. Problems and Solutions
6. Shutting Down the HDFS Cluster
7. Conclusions
1 Overview
Download a Hadoop distribution. There are three packages:
hadoop-x.y.z-site.tar.gz (the rendered documentation)
hadoop-x.y.z-src.tar.gz (the sources)
hadoop-x.y.z.tar.gz (the binary distribution, the one deployed below)
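As a sketch (assuming the ~/installed/hadoop prefix this deployment uses later), unpacking the binary tarball looks like:

$ mkdir -p ~/installed/hadoop
$ tar -xzf hadoop-3.2.0.tar.gz -C ~/installed/hadoop/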
Hadoop consists of several components; each component runs its own daemons, and each daemon is an independent Java process. Startup parameters for a daemon are configured via environment variables:
MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS
ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
NodeManager daemon: YARN_NODEMANAGER_OPTS
WebAppProxy daemon: YARN_PROXYSERVER_OPTS
NameNode daemon: HDFS_NAMENODE_OPTS
DataNode daemon: HDFS_DATANODE_OPTS
Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS
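For example (a sketch not taken from the original article; the heap sizes are illustrative), the HDFS daemon variables are typically exported in etc/hadoop/hadoop-env.sh:

# etc/hadoop/hadoop-env.sh -- illustrative per-daemon JVM options
export HDFS_NAMENODE_OPTS="-Xmx2g"
export HDFS_DATANODE_OPTS="-Xmx1g"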
HDFS is configured in etc/hadoop/hadoop-env.sh, YARN in etc/hadoop/yarn-env.sh, and MapReduce in etc/hadoop/mapred-env.sh. Global Hadoop settings are configured in a system file (~/.bashrc):
HADOOP_HOME: home directory of the Hadoop distribution; at a minimum this must be set
HADOOP_PID_DIR
HADOOP_LOG_DIR
HADOOP_HEAPSIZE_MAX
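A minimal ~/.bashrc sketch (paths are illustrative; the concrete values used in this deployment appear in section 3, which does not set HADOOP_HEAPSIZE_MAX):

export HADOOP_HOME="$HOME/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="$HOME/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="$HOME/installed/hadoop/hadoop_log_dir"
export HADOOP_HEAPSIZE_MAX=2g   # illustrative; caps the default daemon heap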
2 Important Configuration Parameters and Where They Go
Configured on all nodes (etc/hadoop/core-site.xml):
fs.defaultFS: the URI of the HDFS NameNode
io.file.buffer.size
Sample defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml

NameNode configuration (etc/hadoop/hdfs-site.xml):
dfs.namenode.name.dir
dfs.hosts / dfs.hosts.exclude
dfs.blocksize
dfs.namenode.handler.count
Sample defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

DataNode configuration (etc/hadoop/hdfs-site.xml):
dfs.datanode.data.dir

dfs.blocksize and dfs.namenode.handler.count are not touched again in this write-up; a sketch of their entries follows.
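The values below echo the examples in the Hadoop cluster-setup documentation, not anything configured in this deployment:

<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
  <!-- 256MB HDFS block size for large files; the default is 128MB -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
  <!-- more NameNode server threads to serve RPCs from many DataNodes -->
</property>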
3 Deployment Practice: Record of Configuration Changes
3.1 local machine, NameNode
System environment variables
export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"etc/hadoop/core-size.xml
fs.defaultFS
<property>
<name>fs.defaultFS</name>
<value>hdfs://195.90.3.212:9988/</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

io.file.buffer.size
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
<description>The size of buffer for use in sequence files.
The size of this buffer should probably be a multiple of hardware
page size (4096 on Intel x86), and it determines how much data is
buffered during read and write operations.</description>
</property>

etc/hadoop/hdfs-site.xml
dfs.namenode.name.dir
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
3.2 local machine, DataNode
etc/hadoop/hdfs-site.xml
dfs.datanode.data.dir
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices. The directories should be tagged
with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
storage policies. The default storage type will be DISK if the directory does
not have a storage type tagged explicitly. Directories that do not exist will
be created if local filesystem permission allows.
</description>
</property>
3.3 192.168.1.101, DataNode
System environment variables
export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"etc/hadoop/core-site.html
fs.defaultFS
<property>
<name>fs.defaultFS</name>
<value>hdfs://195.90.3.212:9988/</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

io.file.buffer.size
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
<description>The size of buffer for use in sequence files.
The size of this buffer should probably be a multiple of hardware
page size (4096 on Intel x86), and it determines how much data is
buffered during read and write operations.</description>
</property>

etc/hadoop/hdfs-site.xml
dfs.datanode.data.dir
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices. The directories should be tagged
with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
storage policies. The default storage type will be DISK if the directory does
not have a storage type tagged explicitly. Directories that do not exist will
be created if local filesystem permission allows.
</description>
</property>
4 Starting the HDFS Cluster
The HDFS cluster must be formatted before its first start.
# run on the NameNode machine (formatting initializes dfs.namenode.name.dir, so it belongs there)
$ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>

Start the NameNode
# run on the NameNode machine
$ $HADOOP_HOME/bin/hdfs --daemon start namenode

Start the DataNode
# run on each DataNode machine
$ $HADOOP_HOME/bin/hdfs --daemon start datanode
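A quick sanity check not shown in the original log: jps on each machine should list the expected daemons (PIDs below are made up):

$ jps
21392 NameNode
21481 DataNode
21754 Jps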
Optional: start everything with one command
# Prerequisites, both required: 1) etc/hadoop/workers is configured correctly; 2) passwordless ssh from the NameNode machine to the DataNode machines is set up
$ $HADOOP_HOME/sbin/start-dfs.sh
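The original article does not show its workers file; for this two-node layout, etc/hadoop/workers would plausibly contain one DataNode host per line:

localhost
192.168.1.101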
5 Verifying the Startup
Check the NameNode web UI: http://ip:port (default port: 9870)
Check the DataNode web UI: http://ip:port (default port: 9864)
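Beyond the web UIs, the same information can be pulled from the shell (a check added here, not part of the original article):

# prints cluster capacity, remaining space, and per-DataNode status
$ $HADOOP_HOME/bin/hdfs dfsadmin -report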
5.1 Creating a File via the HDFS Shell
Copy a large file from the local filesystem into HDFS and watch how the data directories of the NameNode and the DataNodes grow.

Before the copy
dfs.namenode.name.dir on the local machine (NameNode)
[j@j dfs_namenode_name_dir]$ pwd
/home/jng/installed/hadoop/dfs_namenode_name_dir
[j@j dfs_namenode_name_dir]$ du -hs
2.1M .
[j@j dfs_namenode_name_dir]$

dfs.datanode.data.dir on the local machine (DataNode)
[j@j dfs_datanode_data_dir]$ pwd
/home/jng/installed/hadoop/dfs_datanode_data_dir
[j@j dfs_datanode_data_dir]$ du -hs
44K .
[j@j dfs_datanode_data_dir]$

dfs.datanode.data.dir on 192.168.1.101 (DataNode)
m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
/home/mhb/installed/hadoop/dfs_datanode_data_dir
m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
44K .
m@m:~/installed/hadoop/dfs_datanode_data_dir$

Copying the file (strictly, moveFromLocal moves it, deleting the local source)
# run on the NameNode
[j@j hadoop-3.2.0]$ pwd
/home/jng/installed/hadoop/hadoop-3.2.0
[j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
-rw-r--r-- 1 jng jng 330M 2月 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
[j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/

After the copy
dfs.namenode.name.dir on the local machine (NameNode)
[j@j dfs_namenode_name_dir]$ pwd
/home/jng/installed/hadoop/dfs_namenode_name_dir
[j@j dfs_namenode_name_dir]$ du -hs
2.1M .
[j@j dfs_namenode_name_dir]$

dfs.datanode.data.dir on the local machine (DataNode)
[j@j dfs_datanode_data_dir]$ pwd
/home/jng/installed/hadoop/dfs_datanode_data_dir
[j@j dfs_datanode_data_dir]$ du -hs
333M .
[j@j dfs_datanode_data_dir]$

dfs.datanode.data.dir on 192.168.1.101 (DataNode)
m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
/home/mhb/installed/hadoop/dfs_datanode_data_dir
m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
333M .
m@m:~/installed/hadoop/dfs_datanode_data_dir$
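Both DataNodes end up holding a full 333M copy because the default replication factor (3) exceeds the two available DataNodes, so every block is replicated to both. To confirm the file actually landed in HDFS (a step the original log omits):

$ $HADOOP_HOME/bin/hdfs dfs -ls -h /tmp
# should list hadoop-3.2.0.tar.gz at roughly 330M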
5.2 Problems and Solutions
In the namenode log (reachable from the NameNode web UI) you may find a WARN of the form: "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
ref: https://blog.csdn.net/qqpy789/article/details/78189335
Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false:
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
<description>
If true (the default), then the namenode requires that a connecting
datanode's address must be resolved to a hostname. If necessary, a reverse
DNS lookup is performed. All attempts to register a datanode from an
unresolvable address are rejected.
It is recommended that this setting be left on to prevent accidental
registration of datanodes listed by hostname in the excludes file during a
DNS outage. Only set this to false in environments where there is no
infrastructure to support reverse DNS lookup.
</description>
</property>
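Note that the description above recommends leaving the check on; an alternative fix (not tried in this deployment; the hostname is hypothetical) is to make the DataNode's IP reverse-resolvable, e.g. via /etc/hosts on the NameNode:

# /etc/hosts on the NameNode -- hostname is hypothetical
192.168.1.101   datanode101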
6 Shutting Down the HDFS Cluster
Stop the NameNode
# run on the NameNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop namenode

Stop the DataNode
# run on each DataNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop datanode
7 Conclusions
HDFS can exist and run independently of YARN; that is, HDFS works without YARN being started, at least as exercised through the HDFS shell here.
The NameNode machine can also run a DataNode at the same time.
Author: 祺嘉爸
Created: 2019-03-02 08:03:30
Edited using: Emacs
Exported using: customized-html-backend