Big Data: From Getting Started to XX
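Installing Hadoop in Local (Standalone) Mode was purely a warm-up. With that done, the next exercise is installing Pseudo-Distributed Mode. Pseudo-distributed mode simulates the full functionality of Hadoop on a single physical machine, including SSH access, HDFS formatting, MapReduce execution, and YARN resource management. This installation continues from the standalone one, and some steps depend on how the standalone installation was set up.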
1. First, check whether SSH is installed on RedHat 6.4 (run as root).

    # rpm -qa | grep ssh
    openssh-askpass-5.3p1-81.el6.x86_64
    trilead-ssh2-213-6.2.el6.noarch
    openssh-clients-5.3p1-81.el6.x86_64
    ksshaskpass-0.5.1-4.1.el6.x86_64
    openssh-server-5.3p1-81.el6.x86_64
    libssh2-1.2.2-7.el6_2.3.x86_64
    openssh-5.3p1-81.el6.x86_64
2. Check whether rsync is installed.

    # rpm -qa | grep rsync
    rsync-3.0.6-9.el6.x86_64
3. Run the following command as the hadoop user to test whether ssh can log in without a password.

    $ ssh localhost
    The authenticity of host 'localhost (::1)' can't be established.
    RSA key fingerprint is 05:9e:ac:46:24:aa:c1:45:be:f6:55:83:10:6d:45:6d.
    Are you sure you want to continue connecting (yes/no)?

Note: if you are asked for a password on every login, no public/private key pair has been configured yet.
4. Configure ssh: generate a public/private key pair.

    $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    Generating public/private dsa key pair.
    Your identification has been saved in /home/hadoop/.ssh/id_dsa.
    Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
    The key fingerprint is:
    d4:fc:32:6f:5c:d6:5a:47:89:8a:9d:79:d1:b5:51:14
    The key's randomart image is:
    +--[ DSA 1024]----+
    (image body omitted)
    +-----------------+

Run the following command to append the public key to the authorized keys file.

    $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Run the following command, from inside ~/.ssh, to change the mode of the authorized keys file.

    $ chmod 644 authorized_keys

A note on this step: the official documentation is written against Ubuntu and, as I read it, asks for chmod 0660 ~/.ssh/authorized_keys. On RedHat 6.4 that mode produced an error for me, while chmod 644 authorized_keys worked. A likely cause is sshd's StrictModes check, which refuses an authorized_keys file that is writable by group or others. Mode 600 is stricter than 644 and also passes that check.
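The permission handling in step 4 can be sketched as two small shell helpers. This is only an illustration under stated assumptions: fix_ssh_perms and can_ssh_without_password are hypothetical names, not part of OpenSSH or Hadoop, and the sketch uses mode 600 rather than the 644 shown above because it is the stricter mode that sshd also accepts.

```shell
#!/bin/sh
# Hypothetical helper: make the .ssh directory and key file private enough
# for sshd's StrictModes check. Defaults to ~/.ssh when no argument is given.
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    [ -d "$dir" ] || return 1
    chmod 700 "$dir"
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}

# Hypothetical helper: succeed only if key-based login works with no prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${1:-localhost}" true
}
```

Usage would be fix_ssh_perms followed by can_ssh_without_password localhost. Run ssh localhost by hand once beforehand and accept the host key, because BatchMode cannot answer the yes/no question shown in step 3.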
5. Set JAVA_HOME in the configuration file.

    $ vi hadoop-2.7.2/etc/hadoop/hadoop-env.sh
    # set to the root of your Java installation
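The listing in step 5 shows only the comment line from hadoop-env.sh. The setting itself is a single export placed under that comment. The path /usr/java/latest below is an assumption for illustration, so substitute the root directory of your own JDK.

```shell
# set to the root of your Java installation
# (assumed path, adjust to where the JDK actually lives on your machine)
export JAVA_HOME=/usr/java/latest
```

Whatever path you choose, "$JAVA_HOME/bin/java" should exist, since the Hadoop scripts start the JVM from there.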
6. Configure core-site.xml.

    $ vi hadoop-2.7.2/etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
7. Configure hdfs-site.xml.

    $ vi hadoop-2.7.2/etc/hadoop/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
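One way to confirm that Hadoop actually picked up the two values set in steps 6 and 7 is to ask it for the effective configuration. The sketch below wraps the hdfs getconf -confKey command. show_conf is a hypothetical helper name, and it assumes the hdfs command is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: print "key = value" for each configuration key,
# using the value Hadoop resolves from its configuration files.
show_conf() {
    for key in "$@"; do
        printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}
```

Calling show_conf fs.defaultFS dfs.replication should report hdfs://localhost:9000 and 1 if the edited files are the ones Hadoop is reading.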
8. Format the NameNode (run from the hadoop-2.7.2 directory).

    $ bin/hdfs namenode -format
    16/03/12 19:21:50 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = localhost/127.0.0.1
    STARTUP_MSG: args = [-format]
    STARTUP_MSG: version = 2.7.2
    ...
9. Start HDFS.

    $ start-dfs.sh
    16/03/12 20:04:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-localhost.localdomain.out
    localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-localhost.localdomain.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
    16/03/12 20:04:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
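The startup log in step 9 only says that the daemons were launched, not that they stayed up. A quick check is to list the running Java processes with jps, which ships with the JDK. The sketch below is a minimal example: missing_daemons is a hypothetical helper name, and the daemon names are the ones that appear in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print every expected daemon that jps does not list.
# jps prints one "<pid> <name>" pair per line; only the name is compared,
# and it must match exactly so SecondaryNameNode is not taken for NameNode.
missing_daemons() {
    running=$(jps | awk '{print $2}')
    for name in "$@"; do
        echo "$running" | grep -qx "$name" || echo "$name"
    done
}
```

After start-dfs.sh, missing_daemons NameNode DataNode SecondaryNameNode printing nothing means all three are running. Any name it prints points to the matching log file under the logs directory.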
10. Confirm that the HDFS web page is reachable.

    http://localhost:50070/
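On a machine without a desktop browser, the same check can be made from the command line. The sketch below assumes curl is installed. web_ui_up is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the URL answers with a non-error HTTP status.
# -s silences progress output, -f turns HTTP errors (>= 400) into a failure,
# -L follows redirects, and --max-time caps the wait at 5 seconds.
web_ui_up() {
    curl -sfL -o /dev/null --max-time 5 "$1"
}
```

For example: web_ui_up http://localhost:50070/ && echo "NameNode web UI is reachable".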
11. Import local files into HDFS and test the MapReduce example program.

    $ hdfs dfs -mkdir /user
    16/03/12 20:46:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Run the next two commands from the hadoop-2.7.2 directory. The put command copies ./etc/hadoop to /user/hadoop on HDFS. Because the job runs as the user hadoop, the relative output path resolves to /user/hadoop/output.

    $ hdfs dfs -put ./etc/hadoop/ /user
    $ hadoop jar ~/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep /user/hadoop output 'de[a-z.]+'

Note: the following command views the output directly on HDFS.

    $ hdfs dfs -cat /user/hadoop/output/*

Note: the following command copies the output from HDFS to a local folder.

    $ hdfs dfs -get /user/hadoop/output output

Note: the contents of the local folder look like this.

    160 description
    128 der
    63 der.
    31 default
    ...
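Running the example from step 11 a second time fails if the output directory still exists on HDFS, because MapReduce refuses to overwrite an existing output directory. The sketch below wraps the same commands with a cleanup step in front. run_grep_example is a hypothetical wrapper name, and the three arguments are whatever paths you used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the grep example used in this step.
#   $1  path to hadoop-mapreduce-examples-2.7.2.jar
#   $2  HDFS input directory
#   $3  HDFS output directory (removed first if it already exists)
run_grep_example() {
    jar="$1"
    input="$2"
    output="$3"
    # -f keeps rm quiet when the directory does not exist yet
    hdfs dfs -rm -r -f "$output" &&
    hadoop jar "$jar" grep "$input" "$output" 'de[a-z.]+' &&
    # the glob is quoted so that HDFS, not the local shell, expands it
    hdfs dfs -cat "$output/*"
}
```

With the paths from this step the call would be run_grep_example ~/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar /user/hadoop output.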
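A note on the commands above: if creating the /user directory fails, the NameNode may still be in safe mode. In that case, run the following command first. It is shown as originally run; in Hadoop 2.x, hdfs dfsadmin -safemode leave is the preferred, non-deprecated form.

    $ hadoop dfsadmin -safemode leave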
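Rather than forcing the NameNode out of safe mode unconditionally, it is possible to check the state first. The sketch below relies on hdfs dfsadmin -safemode get, which reports whether safe mode is ON or OFF. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the NameNode reports that safe mode is on.
safemode_is_on() {
    hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON'
}

# Hypothetical helper: leave safe mode only when it is actually on.
leave_safemode_if_needed() {
    if safemode_is_on; then
        hdfs dfsadmin -safemode leave
    fi
}
```

On a practice machine like this one, forcing the exit is harmless. The NameNode normally leaves safe mode on its own shortly after startup, so waiting a few seconds and retrying the mkdir is often enough.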
12. Stop HDFS.

    $ stop-dfs.sh
    16/03/12 20:09:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Stopping namenodes on [localhost]
    localhost: stopping namenode
    localhost: stopping datanode
    Stopping secondary namenodes [0.0.0.0]
    0.0.0.0: stopping secondarynamenode
    16/03/12 20:09:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
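13. Enable YARN on the single node: configure mapred-site.xml.

The hadoop 2.7.2 release does not ship a mapred-site.xml file, so copy one from the template. The template lives in etc/hadoop, so run the commands from the hadoop-2.7.2 directory.

    $ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    $ vi etc/hadoop/mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>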
14. Configure yarn-site.xml.

    $ vi etc/hadoop/yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
15. Start YARN.

    $ start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
    localhost: starting nodemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
16. Open the YARN web page.

    http://localhost:8088/
17. Stop YARN.

    $ stop-yarn.sh
    stopping yarn daemons
    stopping resourcemanager
    localhost: stopping nodemanager
    no proxyserver to stop
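That completes the pseudo-distributed Hadoop installation. The whole process basically follows the official Hadoop documentation. If you run into other problems along the way, most of them come from the operating system, for example system software installation or network configuration.
This article comes from the "沈进群" blog. Please do not repost.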