Consolidate, Then Set Out Again — Configuring a Standalone/Pseudo-Distributed Hadoop System on Ubuntu Kylin 15.04
Posted by 精心出精品
1. Preparation
First, be clear about the goal: configuring Hadoop on Ubuntu Kylin 15.04. I used the Ubuntu half of a dual-boot machine, not a virtual machine. There are plenty of guides online for Ubuntu 14.x, 16.x and so on, but 15.x is conspicuously missing. I later learned that 15.x was replaced by the next release soon after it came out, so it may have had its share of problems; even so, I find this version comfortable to use. After installing Ubuntu Kylin 15.04 and getting the network configured, I ran sudo apt-get update to refresh the package sources and immediately hit serious trouble — the details are in my earlier post 《Ubuntu版本更替所引发的"血案"》. After some struggle, and just before giving up and installing 16.x, I found a fix: a real bit of technical consolidation. From then on, installing Hadoop was smooth sailing.
With the OS problem solved, the remaining ingredients are a text editor (vim or gedit), SSH with openssh-server (Ubuntu ships openssh-client by default, but reinstalling it does no harm), a Java JRE and JDK, and Hadoop itself. With these in hand, everything else can be done from the shell:
1. vim or gedit as a text editor;
2. ssh, openssh-server, openssh-client;
3. a JRE and JDK — here openjdk-7-jre and openjdk-7-jdk;
4. Hadoop 2.x.y;
5. Ubuntu Kylin 15.04.
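For convenience, everything on this list except Hadoop itself can be pulled in with a single apt command — a minimal sketch, assuming your package sources are already working (section 3 below deals with the case where they are not):
sudo apt-get install -y vim ssh openssh-server openssh-client openjdk-7-jre openjdk-7-jdk    # one-shot install of the prerequisites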
2. Creating a hadoop user
This step keeps the environment clean. Whether the user really must be named hadoop I have not verified, but as beginners we may as well follow the common convention. The steps below add a user named hadoop with bash as its login shell, set its password, and grant it sudo rights; afterwards, log out of the current account and log back in as the new user.
sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop
sudo adduser hadoop sudo
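Before switching over, a couple of quick checks can confirm the account is set up as intended — nothing Hadoop-specific, just standard user-management commands:
id hadoop        # confirm the user exists and note its uid/gid
groups hadoop    # 'sudo' should appear among the groups
su - hadoop      # or log out and back in as hadoop for the rest of the setup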
3. Updating apt and installing some tools
3.1. Log in with the newly created user, open a shell, and run the following commands to refresh the package sources:
sudo apt-get clean
sudo apt-get update
sudo apt-get upgrade
If this fails partway — apt cannot reach the sources, or the network errors out — troubleshoot in this order: first ping a public address to confirm basic connectivity; then check DNS and /etc/hosts to rule out name-resolution problems; finally ping the mirror by its IP. If all of that works, the likely cause is that this release's package sources are no longer maintained and have been moved out of the main archive. For that case, see my earlier post 《Ubuntu版本更替所引发的"血案"》, which should resolve it.
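A possible diagnostic sequence for the failure above (archive.ubuntu.com is only an example host; substitute whatever mirror your /etc/apt/sources.list actually points at):
ping -c 3 8.8.8.8                 # raw connectivity, no DNS involved
cat /etc/resolv.conf /etc/hosts   # inspect the resolver configuration
nslookup archive.ubuntu.com       # does name resolution work at all?
ping -c 3 archive.ubuntu.com      # is the mirror reachable by name?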
3.2. Next, install vim and the SSH tools:
sudo apt-get install vim
sudo apt-get install ssh
sudo apt-get install openssh-server
sudo apt-get install openssh-client
Once installed, test whether you can log in to localhost — sshing into your own machine verifies that the SSH protocol works end to end. If it fails, start the service, then use ps piped through grep to check for sshd; if it shows up, the daemon started successfully. A successful login to localhost prints a banner; ignore any notices about available upgrades.
ssh localhost
sudo /etc/init.d/ssh start
ps -e | grep ssh
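Ubuntu 15.04 was also the first release to boot with systemd, so the same service can presumably be managed the systemd way as well:
sudo systemctl status ssh    # systemd-style status check
sudo systemctl start ssh     # equivalent to the init.d invocation above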
Next, generate a key pair and append the public key to the list of authorized keys, so that it is trusted and subsequent ssh logins no longer prompt for a password.
cd ~/.ssh/
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
35:1f:b0:20:dc:03:0d:52:00:9b:34:51:7f:95:60:b6 hadoop@zyr-Aspire-V5-551G
cat ./id_rsa.pub >> ./authorized_keys
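If ssh localhost still prompts for a password after this, permissions are a common culprit: OpenSSH silently ignores an authorized_keys file that is too open. Tightening them to the usual expectations rarely hurts:
chmod 700 ~/.ssh                    # the directory must not be group/world writable
chmod 600 ~/.ssh/authorized_keys    # the key list should be readable by its owner only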
hadoop@zyr-Aspire-V5-551G:~$ ssh localhost
Welcome to Ubuntu 15.04 (GNU/Linux 3.19.0-15-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

15 packages can be updated.
9 updates are security updates.

Your Ubuntu release is not supported anymore.
For upgrade information, please visit:
http://www.ubuntu.com/releaseendoflife

New release '16.04.4 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Sat Mar 3 10:32:05 2018 from localhost
3.3. Installing the Java environment
Here we install OpenJDK (both the JRE and the JDK). It is the open-source implementation of the Java platform, and installing it through the package manager is the easiest and most convenient route.
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Next we need to find where the package installed its files:
hadoop@zyr-Aspire-V5-551G:~$ dpkg -L openjdk-7-jdk | grep '/bin/javac'
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac
As the output shows, the installation path is /usr/lib/jvm/java-7-openjdk-amd64. Here we pair Hadoop 2.9.0 with Java 1.7.x, a combination I have tested myself; the official site notes that Hadoop releases beyond a certain level (2.7) require Java 1.7 or newer. Now set the environment variable: add export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/ at the top of ~/.bashrc, save and exit, then reload it with source ~/.bashrc. The commands below confirm that Java is installed, that the variable points to the right place, and that the system is actually using the Java we configured. With that, the Java environment is done.
vim ~/.bashrc
hadoop@zyr-Aspire-V5-551G:~$ cat ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples
……
source ~/.bashrc
hadoop@zyr-Aspire-V5-551G:~$ echo $JAVA_HOME
/usr/lib/jvm/java-7-openjdk-amd64/
hadoop@zyr-Aspire-V5-551G:~$ java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.15.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
hadoop@zyr-Aspire-V5-551G:~$ $JAVA_HOME/bin/java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.15.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
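As a cross-check, the installation path found via dpkg above can also be derived from the javac symlink chain — a small convenience trick using standard coreutils:
readlink -f $(which javac)                          # e.g. /usr/lib/jvm/java-7-openjdk-amd64/bin/javac
readlink -f $(which javac) | sed 's:/bin/javac::'   # strip the suffix to recover JAVA_HOME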
4. Installing Hadoop
4.1. Downloading Hadoop
All Hadoop releases can be downloaded from the official site or from mirrors such as http://mirror.bit.edu.cn/apache/hadoop/common/ or http://mirrors.cnnic.cn/apache/hadoop/common/. In general, take the latest stable release: the hadoop-2.x.y.tar.gz file under "stable". This binary package is ready to use — simply extract it and move it to the right directory. The package whose name contains "src" is the Hadoop source code, which must be compiled before use; it is worth keeping for later study of Hadoop's architecture, since Hadoop is written in Java and reads quite easily. To protect the download's integrity and authenticity, prefer a source that publishes hash checksums, although in my experience the mirrors above are reliable. Download the file in a browser and remember where it is saved, since the path is needed in the next step. I used 2.9.0, the second-newest release at the time.
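If you prefer the command line to a browser, something like the following fetches the tarball and lets you compare its digest against the checksum published alongside the release (the exact mirror path is an assumption, following the standard Apache layout; adjust it to the mirror and version you chose):
cd ~/Downloads
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz    # assumed mirror path
sha256sum hadoop-2.9.0.tar.gz    # compare by eye with the checksum on the download page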
4.2. Installing Hadoop
After downloading, extract the archive into /usr/local. Any directory would do, but this one is conventional and self-explanatory. Then enter the directory, use mv to rename the extracted folder — dropping the version number — to hadoop, and chown it recursively so that the hadoop user owns it. Finally, run ll to check the resulting layout under /usr/local.
hadoop@zyr-Aspire-V5-551G:~$ sudo tar -zxf ~/Downloads/hadoop-2.9.0.tar.gz -C /usr/local
[sudo] password for hadoop:
hadoop@zyr-Aspire-V5-551G:~$ cd /usr/local/
hadoop@zyr-Aspire-V5-551G:/usr/local$ sudo mv ./hadoop-2.9.0/ ./hadoop
hadoop@zyr-Aspire-V5-551G:/usr/local$ sudo chown -R hadoop ./hadoop
hadoop@zyr-Aspire-V5-551G:/usr/local$ ll
total 44
drwxr-xr-x 11 root root 4096 3月 3 11:07 ./
drwxr-xr-x 10 root root 4096 4月 23 2015 ../
drwxr-xr-x 2 root root 4096 4月 23 2015 bin/
drwxr-xr-x 2 root root 4096 4月 23 2015 etc/
drwxr-xr-x 2 root root 4096 4月 23 2015 games/
drwxr-xr-x 9 hadoop zyr 4096 11月 14 07:28 hadoop/
drwxr-xr-x 2 root root 4096 4月 23 2015 include/
drwxr-xr-x 4 root root 4096 4月 23 2015 lib/
lrwxrwxrwx 1 root root 9 3月 2 20:16 man -> share/man/
drwxr-xr-x 2 root root 4096 4月 23 2015 sbin/
drwxr-xr-x 8 root root 4096 4月 23 2015 share/
drwxr-xr-x 2 root root 4096 4月 23 2015 src/
Extracting the archive is the entire installation — worth remembering, and wonderfully convenient. Now verify the result with ./bin/hadoop version; the output below is what success looks like. With that, Hadoop is installed. None of this is complicated, but in building something up from nothing, every small step deserves care.
hadoop@zyr-Aspire-V5-551G:/usr/local$ cd hadoop/
hadoop@zyr-Aspire-V5-551G:/usr/local/hadoop$ ./bin/hadoop version
Hadoop 2.9.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50
Compiled by arsuresh on 2017-11-13T23:15Z
Compiled with protoc 2.5.0
From source with checksum 0a76a9a32a5257331741f8d5932f183
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.9.0.jar
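Optionally — nothing later in this walkthrough depends on it — you can append Hadoop's bin and sbin directories to PATH in ~/.bashrc, so the commands work from any directory without the ./bin/ prefix:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
After another source ~/.bashrc, plain hadoop version should behave the same as the invocation above.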
5. Testing Hadoop in standalone mode
At this point we have only completed a single-machine Hadoop installation, but the same steps recur in a distributed deployment, so they reward practice. A setup like this is nowhere near a cluster, yet it is a decisive first step: for some research work it is already enough to start developing MapReduce programs, and if the job is not too complex a single machine can run it. Happily, the Hadoop distribution bundles example programs — WordCount, Grep (regular-expression matching), and others — that we can use to exercise the installation. Before getting too excited, though, note that these runs do not touch HDFS at all; they use the operating system's own local filesystem. Still, it is a milestone.
First switch into the Hadoop directory and create an input folder (the name carries no special meaning), then copy some files into it as a data source — here, Hadoop's own configuration files — and run one of the bundled examples to confirm the installation works:
cd /usr/local/hadoop
mkdir ./input
cp ./etc/hadoop/*.xml ./input
hadoop@zyr-Aspire-V5-551G:/usr/local/hadoop$ ls ./etc/hadoop/
capacity-scheduler.xml httpfs-env.sh mapred-env.sh
configuration.xsl httpfs-log4j.properties mapred-queues.xml.template
container-executor.cfg httpfs-signature.secret mapred-site.xml.template
core-site.xml httpfs-site.xml slaves
hadoop-env.cmd kms-acls.xml ssl-client.xml.example
hadoop-env.sh kms-env.sh ssl-server.xml.example
hadoop-metrics2.properties kms-log4j.properties yarn-env.cmd
hadoop-metrics.properties kms-site.xml yarn-env.sh
hadoop-policy.xml log4j.properties yarn-site.xml
hdfs-site.xml mapred-env.cmd
Test with the commands below. Running ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar with no arguments lists the available example programs; we then use its grep program to count, across all the input files, the words matching the regular expression 'dfs[a-z.]+'.
hadoop@zyr-Aspire-V5-551G:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
For example:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount ./input ./output
The actual MapReduce command:
hadoop@zyr-Aspire-V5-551G:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
The result was gratifying. The full output follows; it is long, so feel free to skim it.
hadoop@zyr-Aspire-V5-551G:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
18/03/03 11:20:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/03/03 11:20:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/03/03 11:20:28 INFO input.FileInputFormat: Total input files to process : 8
18/03/03 11:20:28 INFO mapreduce.JobSubmitter: number of splits:8
18/03/03 11:20:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local325822439_0001
18/03/03 11:20:31 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/03/03 11:20:31 INFO mapreduce.Job: Running job: job_local325822439_0001
18/03/03 11:20:31 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/03/03 11:20:31 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:31 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:31 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/03/03 11:20:31 INFO mapred.LocalJobRunner: Waiting for map tasks
18/03/03 11:20:31 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000000_0
18/03/03 11:20:31 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:31 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:31 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:31 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/hadoop-policy.xml:0+10206
18/03/03 11:20:31 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:31 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:31 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:31 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:31 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:31 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:31 INFO mapred.LocalJobRunner:
18/03/03 11:20:31 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:31 INFO mapred.MapTask: Spilling map output
18/03/03 11:20:31 INFO mapred.MapTask: bufstart = 0; bufend = 17; bufvoid = 104857600
18/03/03 11:20:31 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214396(104857584); length = 1/6553600
18/03/03 11:20:32 INFO mapred.MapTask: Finished spill 0
18/03/03 11:20:32 INFO mapred.Task: Task:attempt_local325822439_0001_m_000000_0 is done. And is in the process of committing
18/03/03 11:20:32 INFO mapred.LocalJobRunner: map
18/03/03 11:20:32 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000000_0' done.
18/03/03 11:20:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000000_0
18/03/03 11:20:32 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000001_0
18/03/03 11:20:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:32 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:32 INFO mapreduce.Job: Job job_local325822439_0001 running in uber mode : false
18/03/03 11:20:32 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:32 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/capacity-scheduler.xml:0+7861
18/03/03 11:20:32 INFO mapreduce.Job: map 100% reduce 0%
18/03/03 11:20:32 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:32 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:32 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:32 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:32 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:32 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:32 INFO mapred.LocalJobRunner:
18/03/03 11:20:32 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:32 INFO mapred.Task: Task:attempt_local325822439_0001_m_000001_0 is done. And is in the process of committing
18/03/03 11:20:32 INFO mapred.LocalJobRunner: map
18/03/03 11:20:32 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000001_0' done.
18/03/03 11:20:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000001_0
18/03/03 11:20:32 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000002_0
18/03/03 11:20:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:32 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:32 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:32 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/kms-site.xml:0+5939
18/03/03 11:20:32 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:32 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:32 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:32 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:32 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:32 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:32 INFO mapred.LocalJobRunner:
18/03/03 11:20:32 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:33 INFO mapred.Task: Task:attempt_local325822439_0001_m_000002_0 is done. And is in the process of committing
18/03/03 11:20:33 INFO mapred.LocalJobRunner: map
18/03/03 11:20:33 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000002_0' done.
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000002_0
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000003_0
18/03/03 11:20:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:33 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:33 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/kms-acls.xml:0+3518
18/03/03 11:20:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:33 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:33 INFO mapred.LocalJobRunner:
18/03/03 11:20:33 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:33 INFO mapreduce.Job: map 38% reduce 0%
18/03/03 11:20:33 INFO mapred.Task: Task:attempt_local325822439_0001_m_000003_0 is done. And is in the process of committing
18/03/03 11:20:33 INFO mapred.LocalJobRunner: map
18/03/03 11:20:33 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000003_0' done.
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000003_0
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000004_0
18/03/03 11:20:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:33 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:33 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/hdfs-site.xml:0+775
18/03/03 11:20:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:33 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:33 INFO mapred.LocalJobRunner:
18/03/03 11:20:33 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:33 INFO mapred.Task: Task:attempt_local325822439_0001_m_000004_0 is done. And is in the process of committing
18/03/03 11:20:33 INFO mapred.LocalJobRunner: map
18/03/03 11:20:33 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000004_0' done.
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000004_0
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000005_0
18/03/03 11:20:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:33 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:33 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/core-site.xml:0+774
18/03/03 11:20:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:33 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:33 INFO mapred.LocalJobRunner:
18/03/03 11:20:33 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:33 INFO mapred.Task: Task:attempt_local325822439_0001_m_000005_0 is done. And is in the process of committing
18/03/03 11:20:33 INFO mapred.LocalJobRunner: map
18/03/03 11:20:33 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000005_0' done.
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000005_0
18/03/03 11:20:33 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000006_0
18/03/03 11:20:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:33 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:33 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/yarn-site.xml:0+690
18/03/03 11:20:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:33 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:33 INFO mapred.LocalJobRunner:
18/03/03 11:20:33 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:34 INFO mapred.Task: Task:attempt_local325822439_0001_m_000006_0 is done. And is in the process of committing
18/03/03 11:20:34 INFO mapred.LocalJobRunner: map
18/03/03 11:20:34 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000006_0' done.
18/03/03 11:20:34 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000006_0
18/03/03 11:20:34 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_m_000007_0
18/03/03 11:20:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:34 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:34 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:34 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/input/httpfs-site.xml:0+620
18/03/03 11:20:34 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/03/03 11:20:34 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/03/03 11:20:34 INFO mapred.MapTask: soft limit at 83886080
18/03/03 11:20:34 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/03/03 11:20:34 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/03/03 11:20:34 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/03/03 11:20:34 INFO mapred.LocalJobRunner:
18/03/03 11:20:34 INFO mapred.MapTask: Starting flush of map output
18/03/03 11:20:34 INFO mapred.Task: Task:attempt_local325822439_0001_m_000007_0 is done. And is in the process of committing
18/03/03 11:20:34 INFO mapred.LocalJobRunner: map
18/03/03 11:20:34 INFO mapred.Task: Task 'attempt_local325822439_0001_m_000007_0' done.
18/03/03 11:20:34 INFO mapred.LocalJobRunner: Finishing task: attempt_local325822439_0001_m_000007_0
18/03/03 11:20:34 INFO mapred.LocalJobRunner: map task executor complete.
18/03/03 11:20:34 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/03/03 11:20:34 INFO mapred.LocalJobRunner: Starting task: attempt_local325822439_0001_r_000000_0
18/03/03 11:20:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/03/03 11:20:34 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/03/03 11:20:34 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/03/03 11:20:34 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@362850fb
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=369937600, maxSingleShuffleLimit=92484400, mergeThreshold=244158832, iosortFactor=10, memToMemMergeOutputsThreshold=10
18/03/03 11:20:34 INFO reduce.EventFetcher: attempt_local325822439_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/03/03 11:20:34 INFO mapreduce.Job: map 100% reduce 0%
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000003_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000003_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->2
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000000_0 decomp: 21 len: 25 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 21 bytes from map-output for attempt_local325822439_0001_m_000000_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 21, inMemoryMapOutputs.size() -> 2, commitMemory -> 2, usedMemory ->23
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000006_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000006_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 3, commitMemory -> 23, usedMemory ->25
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000005_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000005_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 4, commitMemory -> 25, usedMemory ->27
18/03/03 11:20:34 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:208)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000001_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000001_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 5, commitMemory -> 27, usedMemory ->29
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000004_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000004_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 6, commitMemory -> 29, usedMemory ->31
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000007_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000007_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 7, commitMemory -> 31, usedMemory ->33
18/03/03 11:20:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local325822439_0001_m_000002_0 decomp: 2 len: 6 to MEMORY
18/03/03 11:20:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local325822439_0001_m_000002_0
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 8, commitMemory -> 33, usedMemory ->35
18/03/03 11:20:34 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/03/03 11:20:34 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: finalMerge called with 8 in-memory map-outputs and 0 on-disk map-outputs
18/03/03 11:20:34 INFO mapred.Merger: Merging 8 sorted segments
18/03/03 11:20:34 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10 bytes
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: Merged 8 segments, 35 bytes to disk to satisfy reduce memory limit
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: Merging 1 files, 25 bytes from disk
18/03/03 11:20:34 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/03/03 11:20:34 INFO mapred.Merger: Merging 1 sorted segments
18/03/03 11:20:34 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10 bytes
18/03/03 11:20:34 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/03/03 11:20:34 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/03/03 11:20:34 INFO mapred.Task: Task:attempt_local325822439_0001_r_000000_0 is done. And is in the process of committing
18/03/03 11:20:34 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/03/03 11:20:34 INFO mapred.Task: Task attempt_local325822439_0001_r_000000_0 is allowed to commit now
18/03/03 11:20:34 INFO output.FileOutputCommitter: Saved output of task 'attempt_local325822439_0001_r_000000_0' to file:/usr/local/hadoop/grep-temp-876870354/_temporary/0/task_local325822439_0001_r_000000
18/03/03 11:20:34 INFO mapred.LocalJobRunner: reduce > reduce
18/03/03 11:20:34 INFO mapred.Task: Task 'attempt_lo…
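The job writes its results into ./output. To inspect them — and to clean up, since Hadoop refuses to start a job whose output directory already exists — the usual pattern is:
cat ./output/*    # each matched word with its count, e.g. "1 dfsadmin" on the default config files
rm -r ./output    # required before re-running; Hadoop will not overwrite an existing output dir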