How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7


Now, down to business. The environment used:

No.  Item                   Description
1    Eclipse                Juno Service Release 4.2
2    Operating system       Windows 7
3    Hadoop Eclipse plugin  hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster         single-node pseudo-distributed CentOS 6.5 in a Linux VM
5    Test program           Hello World

Several problems came up along the way. The first exception:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Solution:

Hard-code the local hadoop path into the return value of the checkHadoopHome() method of the org.apache.hadoop.util.Shell class. I changed it as follows:

private static String checkHadoopHome() {

    // first check the Dflag hadoop.home.dir with JVM scope
    //System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");

    // fall back to the system/user-global env variable
    if (home == null) {
      home = System.getenv("HADOOP_HOME");
    }

    try {
      // couldn't find either setting for hadoop's home directory
      if (home == null) {
        throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
      }

      if (home.startsWith("\"") && home.endsWith("\"")) {
        home = home.substring(1, home.length() - 1);
      }

      // check that the home setting is actually a directory that exists
      File homedir = new File(home);
      if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
        throw new IOException("Hadoop home directory " + homedir
          + " does not exist, is not a directory, or is not an absolute path.");
      }

      home = homedir.getCanonicalPath();

    } catch (IOException ioe) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Failed to detect a valid hadoop home directory", ioe);
      }
      home = null;
    }
    // hard-code the local hadoop path
    home = "D:\\hadoop-2.2.0";
    return home;
}
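Incidentally, if you would rather not patch the Hadoop source at all, the same effect can be achieved by setting the hadoop.home.dir system property at startup, since checkHadoopHome() consults that property first (as the code above shows) and Shell resolves it in a static initializer. A minimal sketch, assuming the same local path:

// Alternative to patching Shell: set hadoop.home.dir at JVM startup.
// This must run before the first Hadoop class (e.g. Shell) is loaded,
// because Shell caches the result when the class initializes.
public class HadoopHomeFix {
    public static void main(String[] args) throws Exception {
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.2.0"); // your local hadoop directory
        // ... set up and submit the job as usual
    }
}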

The second exception: Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries. The Windows executable cannot be found. Download the bin package from https://github.com/srccodes/hadoop-common-2.2.0-bin and overwrite the bin directory under your local hadoop root directory with it.

The third exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)

This exception usually means the HDFS path is written incorrectly. Solution: copy core-site.xml and hdfs-site.xml from the cluster and put them in the src root directory of your Eclipse project.
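If you only need to reach HDFS from a standalone client (rather than submit a job), an alternative to copying the XML files is to point the Configuration at the NameNode yourself before calling FileSystem.get(). A minimal sketch, assuming the address from the stack trace above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // without this (or a copied core-site.xml), the client defaults to file:///
        // and hdfs:// paths trigger the "Wrong FS" exception above
        conf.set("fs.defaultFS", "hdfs://192.168.130.54:19000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/user/hmail/output/part-00000")));
    }
}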

The fourth exception:

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

This exception is usually caused by a misconfigured HADOOP_HOME environment variable. To be explicit: to debug Hadoop 2.2 successfully from Eclipse on Windows, you need to add the following environment variables on your machine:

(1) Under system variables, create a new variable HADOOP_HOME with the value D:\hadoop-2.2.0, i.e. your local hadoop directory.

(2) Append %HADOOP_HOME%\bin to the system Path variable.
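Note that Eclipse must be restarted after changing environment variables, or it will keep seeing the old values. A small sanity-check sketch that verifies the JVM sees HADOOP_HOME and that winutils.exe is really under its bin directory:

import java.io.File;

public class EnvCheck {
    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + home);
        if (home != null) {
            // the Windows binaries (winutils.exe, hadoop.dll) must sit in %HADOOP_HOME%\bin
            File winutils = new File(home, "bin\\winutils.exe");
            System.out.println("winutils.exe present: " + winutils.exists());
        }
    }
}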

The problems above are the ones I hit while testing. With each treated accordingly, Eclipse can finally debug MR programs successfully. My Hello World source code is as follows:

package com.qin.wordcount;

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/***
 *
 * Hadoop 2.2.0 test:
 * a WordCount example
 *
 * @author qindongliang
 *
 * hadoop discussion QQ group: 376932160
 *
 * */
public class MyWordCount {

    /**
     * Mapper
     * **/
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line has the form "word#count"
            String values[] = value.toString().split("#");
            //System.out.println(values[0] + "========" + values[1]);
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /**
     * Reducer
     * **/
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text> {

        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            // sum the counts for each key
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }

    /**
     * Change 1:
     * (1) hard-code the path in checkHadoopHome() in the Shell source
     * (2) line 974, inside FileUtils
     * **/
    public static void main(String[] args) throws Exception {

        // String path1 = System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);

        JobConf conf = new JobConf(MyWordCount.class);
        //Configuration conf = new Configuration();
        //conf.set("mapred.job.tracker", "192.168.75.130:9001");
        // conf.setJar("tt.jar");
        // note: the conf setup must come first, for initialization,
        // otherwise an error is reported

        /** the Job **/
        Job job = new Job(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);
        System.out.println("mode: " + conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // set to 3

        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String path = "hdfs://192.168.46.28:9000/qin/output";
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            fs.delete(p, true);
            System.out.println("output path exists, deleted!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The console prints the following log:
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
mode: local
output path exists, deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.name is deprecated. Instead, use mapreduce.job.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
File System Counters
FILE: Number of bytes read=372
FILE: Number of bytes written=382174
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=76
HDFS: Number of bytes written=27
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=4
Map output records=4
Map output bytes=44
Map output materialized bytes=58
Input split bytes=109
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=0
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=532938752
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=27

The test input data:

中国#1
美国#2
英国#3
中国#2

The output is as follows (the reducer sums the counts per key, e.g. 中国 = 1 + 2 = 3):

中国 3
美国 2
英国 3

At this point, remote debugging of Hadoop from Eclipse works successfully.

Connecting to and using a Linux Hadoop cluster from Eclipse on Windows

Preparation

First, add the following entry to the Windows hosts file:

10.61.6.164 master     # the master node of the hadoop cluster



1. Install the Hadoop plugin in Eclipse

Download hadoop-eclipse-plugin-1.1.2.jar and copy it into the plugins folder under your Eclipse installation directory, then start Eclipse. You should then see the entry below under File / New / Other in Eclipse, which confirms the plugin was installed successfully:

[screenshot]

2. Open Map/Reduce Locations via Window / Show View / Other

[screenshot]

Opening it brings up the view shown below. Click the purple elephant icon on the right:

[screenshot]

A dialog like the one below will pop up:

[screenshot]

Then configure your connection information:

Location name: any name you like

Host: the IP address of your Hadoop cluster's master node

Port: must be set as in the screenshot above; of course, if you changed the default ports when configuring the cluster, use your modified port numbers here.


Once configured, you will see the following in the directory pane on the left:

[screenshot]

3. Configure the program's run arguments (your project must be a MapReduce project, with all of Hadoop's jar files already added to it)

First create an in folder under your project and copy the data file into it, then export your project as a jar file. After that, add the following line to your project's main function:

conf.set("mapred.jar", "E://FreqItemSet.jar"); // the key "mapred.jar" must not be changed


Right-click your project and choose Run As / Run Configurations.

[screenshot]

Click Arguments and enter the contents shown in the screenshot above:

Lee        the file's storage path on HDFS <dfs_path>
in/data    the input file (local path) <input>
           the itemset size k
1          the support threshold <spt_dg>
out        the output file (local path) <output>

Click OK, and your project can now connect to and use your Hadoop cluster.
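For reference, a minimal sketch of how a driver's main might consume the five arguments in the order given above; the class and variable names here are placeholders, not the actual FreqItemSet code:

public class FreqItemSetDriver {
    public static void main(String[] args) throws Exception {
        // argument order as entered in Run Configurations above
        String dfsPath = args[0];                         // storage path on HDFS <dfs_path>
        String input = args[1];                           // input file (local path), e.g. in/data
        int k = Integer.parseInt(args[2]);                // itemset size k
        int supportThreshold = Integer.parseInt(args[3]); // support threshold <spt_dg>
        String output = args[4];                          // output file (local path)
        // ... build the Configuration and Job here, including:
        // conf.set("mapred.jar", "E://FreqItemSet.jar");
    }
}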
