Hadoop 2.x: Setting Up a Pseudo-Distributed Environment
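1. Set the hostname, static IP/DNS, the hosts mapping, and the Windows hosts mapping (convenient for SSH access and for later IP changes)
Set the hostname:
    vi /etc/sysconfig/network    # takes effect after a reboot
    (Temporary change: hostname xxx; open a new terminal to see the result. Note: if you are about to install Hadoop, the hostname must not contain "_".)
Set a static IP/DNS:
    vi /etc/sysconfig/network-scripts/ifcfg-eth0
    (Example: set BOOTPROTO=static; IPADDR=192.168.0.111; GATEWAY=192.168.0.1; DNS1=192.168.0.1, then restart networking: service network restart)
Set the hosts mapping:
    vi /etc/hosts    (format: IP hostname)
Set the Windows hosts mapping:
    edit the Windows hosts file and add a line of the form [IP hostname]
Disable the firewall:
    chkconfig iptables off    # keeps the firewall off after reboots
    service iptables stop     # stops it immediately (service iptables start turns it back on)
Disable SELinux:
    vi /etc/sysconfig/selinux and set SELINUX=disabled    # takes effect after a reboot
    (SELinux is a Linux security subsystem that tightens file access control. Relax it temporarily: setenforce 0; turn enforcement back on: setenforce 1)
Check whether the system ships with OpenJDK and, if so, remove it, so that it does not shadow the JDK installed later:
    java -version            # is any Java present?
    rpm -qa | grep "java"    # list the installed Java packages
    rpm -e "package found above"    or    yum -y remove "package found above"
Prepare the tarballs:
    hadoop-2.5.0.tar.gz
    hadoop-2.5.0-src.tar.gz (optional; used when building from source)
    native-2.5.0.tar.gz (optional; precompiled Hadoop native libraries that can be dropped in directly)
    protobuf-2.5.0.tar.gz (optional; required when compiling the source)
    jdk-7u67-linux-x64.tar.gz (this guide targets JDK 1.7 or later for Hadoop 2.x)
    apache-maven-3.0.5-bin.tar.gz (Maven)
    repository.tar.gz (optional; a Maven repository used when compiling the Hadoop source; without it the build spends much longer downloading dependencies)
    eclipse-jee-kepler-SR1-linux-gtk-x86_64.tar.gz (for Linux; used to write MapReduce programs and test them locally)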
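The hostname rule above (no underscore) is easy to check before going further. The following is a minimal sketch; `is_valid_hadoop_hostname` is a hypothetical helper name, not a Hadoop or CentOS tool, and it simply accepts names made of letters, digits, hyphens and dots.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): succeeds only when the name
# consists of letters, digits, hyphens and dots, so "_" is rejected.
is_valid_hadoop_hostname() {
    case "$1" in
        ""|*[!A-Za-z0-9.-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Demo with the two hostnames that appear in this guide.
for name in centos66-bigdata-hadoop.com linux_66_64.liuwl; do
    if is_valid_hadoop_hostname "$name"; then
        echo "$name: ok"
    else
        echo "$name: rename before installing Hadoop"
    fi
done
```

On a real machine the same check could be run as `is_valid_hadoop_hostname "$(hostname)"`.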
2. Add the user, create the folders, and upload the prepared files to files
    # su - liuwl
    $ cd opt/
    $ ls
    data  files  localsrc  modules  software  workspace
    ---------------------------------------------------------------
    Upload all the tarballs needed for Hadoop 2.x (bring your own) with a file-transfer tool.
    There are plenty to choose from: FileZilla, FlashFXP, Xftp, vmware-tools, Notepad++...
    Folder permissions can get in the way, so check them.
3. Create the user liuwl and grant it sudo rights with visudo
    # visudo
    ...
    liuwl ALL=(root) NOPASSWD:ALL
    # su - liuwl
    $ sudo -l
    ...
    User liuwl may run the following commands on this host:
        (root) NOPASSWD: ALL
4. Create the directory layout
    # su - liuwl
    $ cd opt/
    $ ls
    data  files  localsrc  modules  software  workspace
    # The folder names are arbitrary; just make sure you know what each one holds
5. Install jdk-7u67-linux-x64 (match the JDK version and architecture to your system; mine is 64-bit CentOS 6.6)
    $ sudo vi /etc/profile
    ...
    #JAVA_HOME
    export JAVA_HOME=/opt/modules/jdk1.7.0_67
    export PATH=$PATH:$JAVA_HOME/bin
    $ source /etc/profile
    $ echo $JAVA_HOME
    /opt/modules/jdk1.7.0_67
    $ java -version
    java version "1.7.0_67"
    Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
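To confirm that the JDK picked up from PATH meets the 1.7+ target of this guide, the first line of `java -version` can be parsed. This is a rough sketch; `java_version_ok` is a hypothetical helper, and it assumes a version string of the form `"1.7.0_67"` or the newer `"11.0.2"` style.

```shell
#!/bin/sh
# Hypothetical helper: java_version_ok VERSION_LINE [MIN]
# Succeeds when the Java release in VERSION_LINE is at least MIN (default 7).
java_version_ok() {
    version_line=$1            # e.g. java version "1.7.0_67"
    min_release=${2:-7}
    # Pull out the text between the double quotes after the word "version".
    version=$(printf '%s\n' "$version_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        "")  return 1 ;;                                     # no version found
        1.*) release=${version#1.}; release=${release%%.*} ;; # 1.7.0_67 -> 7
        *)   release=${version%%.*} ;;                        # 11.0.2   -> 11
    esac
    [ "$release" -ge "$min_release" ]
}

# Usage on a real machine (java -version writes to stderr):
#   java_version_ok "$(java -version 2>&1 | head -n 1)" && echo "JDK is recent enough"
```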
6. Extract hadoop-2.5.0.tar.gz and delete the bundled docs (the doc directory is large and rarely needed; copy it elsewhere first if you want to browse it)
# If you are curious, you can read the docs in the terminal with lynx; install it as root first: yum -y install lynx
# Then run lynx xxx.html; to quit, press q and then y
    $ cd /home/liuwl/opt/files
    $ tar -zxf hadoop-2.5.0.tar.gz -C ../modules/
    $ sudo rm -rf ../modules/hadoop-2.5.0/share/doc/
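Part 2. Hadoop pseudo-distributed setup (the main topic)
★ Configuration directory: /home/liuwl/opt/modules/hadoop-2.5.0/etc/hadoop
PS: I edit these files with Notepad++ (through the NppFTP plugin; install it if you do not have it)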
1. Set the JDK (JAVA_HOME) in the *-env.sh files
    hadoop-env.sh    export JAVA_HOME=/opt/modules/jdk1.7.0_67
    mapred-env.sh    export JAVA_HOME=/opt/modules/jdk1.7.0_67
    yarn-env.sh      export JAVA_HOME=/opt/modules/jdk1.7.0_67
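The same edit to all three files can be scripted. The sketch below is one possible approach, not the author's method: `set_java_home` is a hypothetical helper that drops any existing uncommented `export JAVA_HOME=` line and appends a fresh one, leaving the rest of each file untouched.

```shell
#!/bin/sh
# Hypothetical helper: set_java_home CONF_DIR JAVA_HOME
# Rewrites the JAVA_HOME export in hadoop-env.sh, mapred-env.sh and yarn-env.sh.
set_java_home() {
    conf_dir=$1
    java_home=$2
    for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
        file="$conf_dir/$f"
        [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
        # Keep every line except existing uncommented assignments
        # (grep exits 1 when nothing is left, which is fine here).
        grep -v '^export JAVA_HOME=' "$file" > "$file.tmp" || true
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$file.tmp"
        # Copy back instead of mv so the original file mode is preserved.
        cat "$file.tmp" > "$file"
        rm -f "$file.tmp"
    done
}

# Usage with the paths from this guide:
#   set_java_home /home/liuwl/opt/modules/hadoop-2.5.0/etc/hadoop /opt/modules/jdk1.7.0_67
```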
2. Edit Hadoop's site configuration files
1> HDFS >>
- NameNode >>
core-site.xml >>
    <!-- Address of the NameNode host -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos66-bigdata-hadoop.com:8020</value>
    </property>
    <!-- Base directory for Hadoop's working data; formatting HDFS populates it -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/liuwl/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
    <!-- User that the web UI acts as; changed from the default dr.who to liuwl (your choice) -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>liuwl</value>
    </property>
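- DataNode >>
slaves >>
    centos66-bigdata-hadoop.com
    (in a pseudo-distributed setup the only worker is this same host, and the name must not contain "_")
hdfs-site.xml >>
    <!-- Number of replicas kept for each block -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Turn off HDFS permission checks so the temporary directories created by running jars can be accessed -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>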
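The property blocks above are fragments: in the actual file they must sit inside a single `<configuration>` root element. As an illustration, the sketch below generates a complete site file from `name=value` pairs. `write_site_file` is a hypothetical helper, and it does no XML escaping, so it only suits simple values like the ones used here.

```shell
#!/bin/sh
# Hypothetical helper: write_site_file FILE NAME=VALUE...
# Writes a Hadoop site file with one <property> per NAME=VALUE argument.
# Values are written verbatim (no XML escaping), which is an assumption
# that holds for plain hostnames, numbers and booleans.
write_site_file() {
    out=$1
    shift
    {
        printf '<?xml version="1.0"?>\n<configuration>\n'
        for pair in "$@"; do
            printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' \
                "${pair%%=*}" "${pair#*=}"
        done
        printf '</configuration>\n'
    } > "$out"
}

# Usage for the hdfs-site.xml shown above:
#   write_site_file hdfs-site.xml dfs.replication=1 dfs.permissions.enabled=false
```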
2> Format HDFS >>
    $ bin/hdfs namenode -format
    $ ls data/tmp/
    dfs
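Formatting is meant to be done once; running it again on a NameNode that already holds metadata wipes that metadata. A small guard can make the step safe to repeat. This is a sketch under one assumption: the NameNode directory is left at its default location, `dfs/name` under `hadoop.tmp.dir`. `needs_format` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: needs_format TMP_DIR
# Succeeds when TMP_DIR (the hadoop.tmp.dir value) has no NameNode
# metadata directory yet. Assumes the default layout TMP_DIR/dfs/name.
needs_format() {
    [ ! -d "$1/dfs/name" ]
}

# Usage from the hadoop-2.5.0 directory:
#   if needs_format data/tmp; then
#       bin/hdfs namenode -format
#   else
#       echo "NameNode already formatted, skipping"
#   fi
```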
3> Configure YARN (including SecondaryNameNode and JobHistoryServer) >>
yarn-site.xml >>
    <!-- Host on which the ResourceManager runs -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>centos66-bigdata-hadoop.com</value>
    </property>
    <!-- Auxiliary shuffle service that lets the NodeManager run MapReduce programs -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- How long aggregated logs are kept, in seconds -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>108600</value>
    </property>
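Because the retention value is given in seconds, it helps to convert before choosing one: the 108600 used here works out to 30 hours and 10 minutes. The two helpers below are hypothetical names for plain shell arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers for picking a yarn.log-aggregation.retain-seconds value.
days_to_seconds()  { echo $(( $1 * 24 * 60 * 60 )); }
seconds_to_hours() { echo $(( $1 / 3600 )); }   # whole hours, remainder dropped

days_to_seconds 7         # prints 604800
seconds_to_hours 108600   # prints 30
```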
4> Configure MapReduce
mapred-site.xml >> (in Hadoop 2.5.0, create this file by copying mapred-site.xml.template)
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Host and port of the JobHistoryServer -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>centos66-bigdata-hadoop.com:10020</value>
    </property>
    <!-- Web address of the JobHistoryServer -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>centos66-bigdata-hadoop.com:19888</value>
    </property>
5> Start each daemon individually
    $ sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/hadoop-liuwl-namenode-centos66-bigdata-hadoop.com.out
    $ sbin/hadoop-daemon.sh start datanode
    starting datanode, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/hadoop-liuwl-datanode-centos66-bigdata-hadoop.com.out
    $ sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/yarn-liuwl-resourcemanager-centos66-bigdata-hadoop.com.out
    $ sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/yarn-liuwl-nodemanager-centos66-bigdata-hadoop.com.out
    $ sbin/mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/mapred-liuwl-historyserver-centos66-bigdata-hadoop.com.out
    $ sbin/hadoop-daemon.sh start secondarynamenode
    starting secondarynamenode, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/hadoop-liuwl-secondarynamenode-centos66-bigdata-hadoop.com.out
    $ jps
    10772 NameNode
    11179 NodeManager
    10853 DataNode
    10938 ResourceManager
    11382 SecondaryNameNode
    11302 JobHistoryServer
    11420 Jps
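Reading the `jps` listing by eye gets tedious when a daemon silently fails to start. A small filter can report which of the six expected processes are absent. This is a sketch; `missing_daemons` is a hypothetical helper that reads `jps` output on standard input and matches the process name exactly, so NameNode is not confused with SecondaryNameNode.

```shell
#!/bin/sh
# Hypothetical helper: reads `jps` output on stdin and prints the name of
# every expected daemon that does not appear in it (one per line).
missing_daemons() {
    jps_output=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
        # Compare the second column exactly against the daemon name.
        printf '%s\n' "$jps_output" \
            | awk -v name="$d" '$2 == name { found = 1 } END { exit !found }' \
            || echo "$d"
    done
}

# Usage on the running machine:
#   jps | missing_daemons
# No output means all six daemons are up.
```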
3. Test the HDFS file system
    $ bin/hdfs dfs -mkdir -p /user/liuwl/tmp
    16/09/14 07:51:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    $ vi ../../data/wordcount.input
    $ bin/hdfs dfs -mkdir -p /user/liuwl/tmp/input
    16/09/14 07:54:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    $ bin/hdfs dfs -put ../../data/wordcount.input /user/liuwl/tmp/input
    16/09/14 07:54:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    $ bin/hdfs dfs -cat /user/liuwl/tmp/input/wordcount.input
    16/09/14 07:55:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    hadoop mapreduce yarn historyserver hadoop mapreduce yarn namenode datanode datanode
    $ bin/hdfs dfs -get /user/liuwl/tmp/input/wordcount.input /opt/modules/wc.input
    16/09/14 07:56:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    get: /opt/modules/wc.input._COPYING_ (Permission denied)
    $ bin/hdfs dfs -get /user/liuwl/tmp/input/wordcount.input ~/opt/data/wc.input
    16/09/14 07:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    $ cat ../../data/wc.input
    hadoop mapreduce yarn historyserver hadoop mapreduce yarn namenode datanode datanode
4. Run a jar with MapReduce
    $ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/liuwl/tmp/input /user/liuwl/tmp/output
    16/09/14 07:59:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/09/14 07:59:55 INFO client.RMProxy: Connecting to ResourceManager at centos66-bigdata-hadoop.com/192.168.0.110:8032
    16/09/14 07:59:57 INFO input.FileInputFormat: Total input paths to process : 1
    16/09/14 07:59:57 INFO mapreduce.JobSubmitter: number of splits:1
    16/09/14 07:59:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473864360962_0001
    16/09/14 07:59:59 INFO impl.YarnClientImpl: Submitted application application_1473864360962_0001
    16/09/14 08:00:00 INFO mapreduce.Job: The url to track the job: http://centos66-bigdata-hadoop.com:8088/proxy/application_1473864360962_0001/
    16/09/14 08:00:00 INFO mapreduce.Job: Running job: job_1473864360962_0001
    16/09/14 08:00:30 INFO mapreduce.Job: Job job_1473864360962_0001 running in uber mode : false
    16/09/14 08:00:30 INFO mapreduce.Job: map 0% reduce 0%
    16/09/14 08:01:19 INFO mapreduce.Job: map 100% reduce 0%
    16/09/14 08:01:47 INFO mapreduce.Job: map 100% reduce 100%
    16/09/14 08:01:49 INFO mapreduce.Job: Job job_1473864360962_0001 completed successfully
    16/09/14 08:01:54 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=96
            FILE: Number of bytes written=194473
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=226
            HDFS: Number of bytes written=66
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=48483
            Total time spent by all reduces in occupied slots (ms)=21661
            Total time spent by all map tasks (ms)=48483
            Total time spent by all reduce tasks (ms)=21661
            Total vcore-seconds taken by all map tasks=48483
            Total vcore-seconds taken by all reduce tasks=21661
            Total megabyte-seconds taken by all map tasks=49646592
            Total megabyte-seconds taken by all reduce tasks=22180864
        Map-Reduce Framework
            Map input records=5
            Map output records=10
            Map output bytes=125
            Map output materialized bytes=96
            Input split bytes=141
            Combine input records=10
            Combine output records=6
            Reduce input groups=6
            Reduce shuffle bytes=96
            Reduce input records=6
            Reduce output records=6
            Spilled Records=12
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=293
            CPU time spent (ms)=2970
            Physical memory (bytes) snapshot=313458688
            Virtual memory (bytes) snapshot=1680084992
            Total committed heap usage (bytes)=136450048
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=85
        File Output Format Counters
            Bytes Written=66
    $ bin/hdfs dfs -ls /user/liuwl/tmp/output
    16/09/14 08:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 2 items
    -rw-r--r--   1 liuwl supergroup          0 2016-09-14 08:01 /user/liuwl/tmp/output/_SUCCESS
    -rw-r--r--   1 liuwl supergroup         66 2016-09-14 08:01 /user/liuwl/tmp/output/part-r-00000
    $ bin/hdfs dfs -text /user/liuwl/tmp/output/part*
    16/09/14 08:02:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    datanode 2
    hadoop 2
    historyserver 1
    mapreduce 2
    namenode 1
    yarn 2
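The job's result can be cross-checked against the local copy of the input with ordinary Unix tools. The pipeline below is a sketch of that check, not part of Hadoop; `local_wordcount` is a hypothetical helper that splits on whitespace, sorts in byte order and prints `word<TAB>count`, which is the layout the wordcount example writes to part-r-00000.

```shell
#!/bin/sh
# Hypothetical helper: reads text on stdin, prints "word<TAB>count" per
# distinct word, sorted in byte order.
local_wordcount() {
    tr -s ' \t' '\n\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Demo with the ten words of wordcount.input (line breaks do not affect the counts).
echo "hadoop mapreduce yarn historyserver hadoop mapreduce yarn namenode datanode datanode" \
    | local_wordcount
```

For the demo input this prints the same six word/count pairs as the `hdfs dfs -text` output above. On the real file the check would be `local_wordcount < ~/opt/data/wc.input`.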
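5. A brief overview of how Hadoop's four core components work
1> Hadoop Common: the shared classes, methods, and utilities used by the other Hadoop modules.
2> Hadoop Distributed File System (HDFS): Hadoop's distributed file system.
    Architecture: master/slave, with a clear division of labor. The NameNode keeps the metadata (the namespace and which DataNodes hold each block); the DataNodes store the actual data.
    Reliability: block replication (the number of replicas is configurable, bad blocks are automatically replaced from a nearby copy, and replicas are verified periodically). The SecondaryNameNode periodically merges the edit log into the fsimage file.
    Scalability: any number of machines can be added to the machines already in the cluster.
    How it works: to write a file, the client contacts the NameNode, which holds the information about the DataNodes and all existing files and allocates blocks for the client to write to. To read a file, the client asks the NameNode, which locates the file's blocks quickly from its metadata and returns the closest replicas for the client to read.
3> Hadoop YARN: Hadoop's unified resource management and job scheduling framework.
    Architecture: master/slave (ResourceManager and NodeManager).
    In my own view, YARN resembles the Spring framework in Java EE in that it acts as a container.
    Workflow: the client submits a job. The ApplicationsManager inside the ResourceManager has a NodeManager launch an ApplicationMaster for the job, which manages the job and reports back on it. The ApplicationMaster tells the ResourceManager what resources (CPU, memory, and so on) the job needs in order to run, and the ResourceManager's scheduler grants it containers. The job's tasks run inside those containers, and other jobs cannot take the resources in them, which gives good isolation. When the tasks finish they report to the ApplicationMaster, which informs the ResourceManager of the outcome, writes the job history files, and releases the resources.
4> Hadoop MapReduce: a framework for running computation tasks. Each map task starts its own Java virtual machine, and when MapReduce runs on YARN each task reports its status to the ApplicationMaster over RPC.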