Strange output in Hadoop mapreduce

Posted 2023-03-16

Asked: 2012-09-28 15:35:49

Question:

Here is a sample from my input file:

1,name1,name2 
2,name3,name4 
3,name5,name6

This is my map method:

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {

    String line = value.toString();
    StringTokenizer tk = new StringTokenizer( line, ",");
    String keyValue = tk.nextToken();
    String s1Value = tk.nextToken();
    String s2Value = tk.nextToken();
    String valueString = s1Value+","+s2Value;
    output.collect( new Text(keyValue), new Text(valueString) );
}
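
To see what this map step emits for the sample input, the tokenizing can be run outside Hadoop. The following is a minimal sketch, not part of the original post: the class MapParsingSketch and the helper toKeyValue are hypothetical names, and the trim() call is an added assumption to drop the trailing spaces visible in the sample lines.

```java
import java.util.StringTokenizer;

class MapParsingSketch {

    // Hypothetical helper: returns {key, value} for one input line,
    // using the same StringTokenizer calls as map() in the question.
    static String[] toKeyValue(String line) {
        // trim() is an addition; the original map() does not trim the line.
        StringTokenizer tk = new StringTokenizer(line.trim(), ",");
        String keyValue = tk.nextToken();
        String s1Value = tk.nextToken();
        String s2Value = tk.nextToken();
        return new String[] { keyValue, s1Value + "," + s2Value };
    }

    public static void main(String[] args) {
        String[] sample = { "1,name1,name2 ", "2,name3,name4 ", "3,name5,name6" };
        for (String line : sample) {
            String[] kv = toKeyValue(line);
            // Key and value separated by a tab.
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Each input line yields exactly one pair whose value still contains one comma, which is what the reduce function relies on when it tokenizes the value again.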

This is my reduce function:

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {

        String item="";
        Text tmp= new Text();
        while ( values.hasNext() ) {
            tmp = values.next();
        }
        item = tmp.toString();

        StringTokenizer tk = new StringTokenizer( item, ",");

        String s1="";
        String s2="";
        boolean entered = false;
        try {
            while ( tk.hasMoreTokens() && !entered ) {
                s1 = tk.nextToken();
                s2 = tk.nextToken();
                entered = true;
            }
        } catch (Exception e ) {
            System.out.println("PROBLEM:"+item);
        }
        double result = compare(s1,s2);
        String result2 = s1+" & "+s2+"="+result;
        output.collect( key, new Text(result2) );
    }
}

So I expect the output to be (for example):

name1 & name2=1.0

But what I get is:

name1 & name2=1.0  &  =0.0

It looks like two empty strings are always being compared as well! Why are there always empty strings?

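Comments:

What does the counter dump say about the number of mapper output records and the number of reduce method calls?

It says both are 13, because I have 13 lines in my input file and I expect each line to be processed separately.

Answer 1: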

The problem is probably in the code of "compare(s1,s2)"; please paste the code of the compare function.

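Comments:

The "compare" code is like any code that compares two strings: it returns, for example, the edit distance, or a ratio between 0 and 1 indicating how similar the two strings are.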
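
The job configuration is not shown in the question, so the following is only a guess: the observed output has the shape one would get if the reduce logic ran twice on the same record, for instance if the Reduce class were also registered as the combiner (in the old API, via JobConf.setCombinerClass). The sketch below simulates that in plain Java without Hadoop. The class CombinerEffectSketch, the helper reduceLogic, and the placeholder compare are hypothetical; the real compare function is not shown in the post.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.StringTokenizer;

class CombinerEffectSketch {

    // Placeholder only: the asker's compare() is not shown.
    // Assumed here: 0.0 if either string is empty, otherwise 1.0.
    static double compare(String a, String b) {
        return (a.isEmpty() || b.isEmpty()) ? 0.0 : 1.0;
    }

    // Same steps as the body of reduce() in the question, minus the Hadoop types.
    static String reduceLogic(Iterator<String> values) {
        String item = "";
        while (values.hasNext()) {
            item = values.next(); // only the last value is kept
        }
        StringTokenizer tk = new StringTokenizer(item, ",");
        String s1 = "";
        String s2 = "";
        try {
            if (tk.hasMoreTokens()) {
                s1 = tk.nextToken();
                s2 = tk.nextToken(); // throws NoSuchElementException if item has no comma
            }
        } catch (Exception e) {
            System.out.println("PROBLEM:" + item);
        }
        return s1 + " & " + s2 + "=" + compare(s1, s2);
    }

    public static void main(String[] args) {
        // First pass: the map output value, as a combiner would see it.
        String firstPass = reduceLogic(Arrays.asList("name1,name2").iterator());
        // Second pass: the reducer receives the already formatted string.
        String secondPass = reduceLogic(Arrays.asList(firstPass).iterator());
        System.out.println(firstPass);
        System.out.println(secondPass);
    }
}
```

On the second pass the value no longer contains a comma, so s1 becomes the whole first-pass string, the second nextToken() throws, and s2 stays empty. That would also make the "PROBLEM:" line appear in the task logs. If this guess applies, removing the combiner registration, or using a combiner whose output has the same format as the map output, should give the expected result.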
