Submitting MR Jobs from Eclipse on Windows to a Hadoop Cluster
Posted by Syn良子
Author: Syn良子. Source: http://www.cnblogs.com/cssdongl. Feel free to repost, but please credit the source.
Previously, MapReduce projects written in Eclipse were packaged into a jar and uploaded to the Hadoop test cluster to run directly; when something went wrong, I checked the logs and modified the code to fix it. I finally took the time to configure Eclipse on Windows to submit MR jobs to the cluster remotely, which makes debugging much easier. This post records some of the problems I ran into and how I solved them.
Environment: Windows 7 64-bit, Eclipse Mars, Maven 3.3.9, Hadoop 2.6.0-CDH5.4.0.
1. Set up the MapReduce Maven project
Create a new Maven project and copy the CDH cluster's configuration files (mainly core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml) into src/main/java. Since the project needs to connect to a CDH cluster, the main parts of pom.xml are configured as follows:
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <exclusions>
      <exclusion>
        <artifactId>kfs</artifactId>
        <groupId>net.sf.kosmosfs</groupId>
      </exclusion>
    </exclusions>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs-nfs</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-api</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-applications-distributedshell</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-applications-unmanaged-am-launcher</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-hadoop2-compat</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-app</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-hs</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-hs-plugins</artifactId>
    <version>2.6.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-protocol</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-prefix-tree</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-hadoop-compat</artifactId>
    <version>1.0.0-cdh5.4.0</version>
    <type>jar</type>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <version>1.7</version>
    <scope>system</scope>
    <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
  </dependency>
  <dependency>
    <groupId>org.apache.maven.surefire</groupId>
    <artifactId>surefire-booter</artifactId>
    <version>2.12.4</version>
  </dependency>
</dependencies>
If your CDH is a different version, refer to the official CDH Maven Artifacts documentation and adjust the dependencies accordingly (mainly the version numbers). If you are on vanilla Apache Hadoop, remove the Cloudera repository above and configure the corresponding Apache dependencies instead.
Once the pom is configured, save it and wait for the relevant jars to finish downloading.
2. Configure Eclipse to submit MR jobs to the cluster
Nothing is simpler than WordCount, so here is the code first.
package org.ldong.test;

import java.io.File;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WordCount1 extends Configured implements Tool {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.0");
        ToolRunner.run(new WordCount1(), args);
    }

    public int run(String[] args) throws Exception {
        String input = "hdfs://littleNameservice/test/input";
        String output = "hdfs://littleNameservice/test/output";

        Configuration conf = new YarnConfiguration();
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");
        conf.addResource("mapred-site.xml");
        conf.addResource("yarn-site.xml");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount1.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setNumReduceTasks(1);

        FileInputFormat.addInputPath(job, new Path(input));

        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(new Path(output))) {
            fs.delete(new Path(output), true);
        }

        String classDirToPackage = "D:\\workspace\\performance-statistics-mvn\\target\\classes";
        File jarFile = EJob.createTempJar(classDirToPackage);
        ClassLoader classLoader = EJob.getClassLoader();
        Thread.currentThread().setContextClassLoader(classLoader);
        ((JobConf) job.getConfiguration()).setJar(jarFile.toString());

        FileOutputFormat.setOutputPath(job, new Path(output));

        return job.waitForCompletion(true) ? 0 : 1;
    }
}
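The code above relies on a small helper class EJob (createTempJar / getClassLoader) that jars up the compiled classes directory so the client has a job jar to ship to the cluster; the original code listing does not include it, so below is a rough sketch of what such a helper might look like. Only the class name and the two method signatures come from the calls above; the body is my own assumption.

package org.ldong.test;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch of an EJob-style helper: packages the compiled classes directory
// into a temporary jar and exposes a class loader that can see that jar.
public class EJob {

    private static final List<URL> classPath = new ArrayList<URL>();

    // Package everything under classDir into a temporary jar and return it.
    public static File createTempJar(String classDir) throws IOException {
        File root = new File(classDir);
        if (!root.exists()) {
            return null;
        }
        File jarFile = File.createTempFile("EJob-", ".jar",
                new File(System.getProperty("java.io.tmpdir")));
        jarFile.deleteOnExit();
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        JarOutputStream out = new JarOutputStream(new FileOutputStream(jarFile), manifest);
        try {
            addFiles(out, root, "");
        } finally {
            out.close();
        }
        classPath.add(jarFile.toURI().toURL());
        return jarFile;
    }

    // Class loader that also sees the temporary jar; used as the thread context loader.
    public static ClassLoader getClassLoader() {
        return new URLClassLoader(classPath.toArray(new URL[0]),
                Thread.currentThread().getContextClassLoader());
    }

    // Recursively add files under dir to the jar, preserving relative paths.
    private static void addFiles(JarOutputStream out, File dir, String prefix) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File f : children) {
            if (f.isDirectory()) {
                addFiles(out, f, prefix + f.getName() + "/");
            } else {
                out.putNextEntry(new JarEntry(prefix + f.getName()));
                FileInputStream in = new FileInputStream(f);
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
                in.close();
                out.closeEntry();
            }
        }
    }
}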
OK. The main thing here is to get the Configuration part of the code right (the HDFS paths and the config files loaded via addResource) so that it matches your own cluster. Now run the program directly as a Java application; barring surprises, it will fail with the following error:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
This error is very common when running MR on Windows: Hadoop cannot find some of the native binaries it needs on Windows. As a perfectionist I can't just look the other way, so here is the fix:
1. Download hadoop-2.6.0.tar.gz from the official Apache site and unpack it locally on Windows, then set the HADOOP_HOME environment variable to the unpacked Hadoop root directory and add its bin folder to the Path;
2. Download and unpack the archive linked at the end of this post (it contains winutils.exe and the related native libraries), and copy all of its files into the bin directory of the Hadoop installation from step 1.
3. If it does not take effect right away, add a temporary line of code such as the System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.0") call in the code above; once the environment variable takes effect after a restart, this line can be removed (see the small sanity-check sketch below).
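A tiny sanity check you can drop into main() of the WordCount1 class above (java.io.File is already imported there). The path "C:\\hadoop-2.6.0" is just the example location used in this post; adjust it to wherever you unpacked Hadoop.

// Make sure hadoop.home.dir points at the unpacked Hadoop root and that
// bin\winutils.exe actually exists there before trying to submit the job.
System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.0");
File winutils = new File(System.getProperty("hadoop.home.dir"), "bin/winutils.exe");
if (!winutils.isFile()) {
    throw new IllegalStateException("winutils.exe not found at " + winutils);
}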
Run it again and that error is gone, but another one appears: the job fails with the "no job control" error.
This exception is actually a serious bug, present before Hadoop 2.4, that affects submitting MR jobs from a Windows client to a Hadoop cluster; it is related to how the classpath is built on different operating systems, and it was fixed in Hadoop 2.4. See the release notes:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/releasenotes.html
In particular MAPREDUCE-4052: "Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster."
If you are on a Hadoop version earlier than 2.4, refer to the links below for workarounds (MAPREDUCE-4052 and MAPREDUCE-5655 are in fact duplicates of each other):
https://issues.apache.org/jira/browse/MAPREDUCE-4052
https://issues.apache.org/jira/browse/MAPREDUCE-5655
https://issues.apache.org/jira/browse/YARN-1298
http://www.aboutyun.com/thread-8498-1-1.html
http://blog.csdn.net/fansy1990/article/details/22896249
My environment, however, is Hadoop 2.6.0-CDH5.4.0, so the fix is not nearly as involved: simply edit the mapred-site.xml in the project and set the following property (add it if it is missing):
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
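If you would rather not touch the XML, the same switch can be flipped from the client code; a minimal sketch, added inside run() before Job.getInstance(conf, ...) in the WordCount1 code above:

// Equivalent to the mapred-site.xml property above: tells the framework the job
// is submitted from a platform (Windows) different from the cluster's (Linux).
conf.setBoolean("mapreduce.app-submission.cross-platform", true);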
Keep going: the "no job control" error above disappears, but yet another error shows up.
It never ends. This must still be a configuration problem; after digging through more material, the answer is again to modify the project's configuration files directly.
For mapred-site.xml, add or modify the following property; note that it must stay consistent with your cluster:
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value>
</property>
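As with the cross-platform flag, this can also be set from the client code instead of mapred-site.xml; a sketch reusing the exact value from the XML above (adjust it to match your cluster):

// Same value as in mapred-site.xml above; the $-variables are expanded on the
// cluster side, so they have to match how the NodeManagers are configured.
conf.set("mapreduce.application.classpath",
        "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH");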