Modifying Hadoop's bundled WordCount code to output a specified word and its count

Posted 2023-02-28


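As the title says: starting from WordCount, I want to output only one chosen word. For example, if I ask for "hello", the output should be just "hello 2". My teacher's hint was to leave the in.txt argument unchanged and replace the out.txt argument with the word to look up, and said that System.out.println() would be enough. I have no idea where to start, so any help is appreciated.
Here is the original code: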
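import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // The post exceeded the length limit, so the map-stage code (TokenizerMapper) was removed here.

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        // Sum all counts emitted for one word and write (word, total).
        public void reduce(Text key, Iterable<IntWritable> values, Context context
                ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}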

I can award extra points for a good answer.

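First you need to specify which word you want. You can pass it on the command line and store it in the Configuration. After that, the map task and the reduce task each need only one change: when you have the key (the word is the key), read the specified word back from the Configuration. If the key matches, emit it with context.write; if it does not match, emit nothing. The output then contains only the data for the specified word.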
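The approach described above can be sketched in plain Java, without any Hadoop classes, so that it runs standalone. This is only an illustration of the filtering logic, not the actual job: the names `TargetWordCount`, `mapStage` and `reduceStage` are made up for the example, and the input lines are sample data. In the real job the word would be stored with `conf.set(...)` in `main` and read back with `context.getConfiguration().get(...)` in the mapper or reducer, under a property name of your choosing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation of a filtered word count (hypothetical names, no Hadoop).
final class TargetWordCount {

    // "Map" stage: split each line on whitespace with StringTokenizer
    // and emit only the tokens that equal the target word.
    static List<String> mapStage(List<String> lines, String target) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                String word = tokens.nextToken();
                if (word.equals(target)) {
                    emitted.add(word);
                }
            }
        }
        return emitted;
    }

    // "Reduce" stage: add 1 to the running total for each emitted key.
    static Map<String, Integer> reduceStage(List<String> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : emitted) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input standing in for in.txt (assumption for the demo).
        List<String> lines = Arrays.asList("hello world", "goodbye world hello");
        // The target word comes from the command line, defaulting to "hello".
        String target = args.length > 0 ? args[0] : "hello";
        reduceStage(mapStage(lines, target))
            .forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

With the sample lines in `main`, running it without arguments prints `hello 2`. Filtering in the map stage keeps unwanted words from ever reaching the reducer; filtering only in the reduce stage gives the same final output but moves more data between the stages.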

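Follow-up question: Could you describe the concrete changes? My coding is weak. The word is just "hello". In the reduce stage I tested it with if (key.equals("hello")), but nothing is output; it looks as if that branch never runs.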

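Follow-up answer:

Checking the key in the reduce stage is the right place, but key is a Text, and Text.equals returns true only for another Text object, so key.equals("hello") is always false. Compare key.toString().equals("hello") instead. Also make sure the input really contains hello as a token, with no stray spaces. What exactly do you mean by "no output"? You can print every key with System.out.println() and then look for the printed lines in the logs of the job run.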
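One likely cause of the empty output described in the follow-up is that a Hadoop `Text` key never equals a `java.lang.String` under `equals()`. The sketch below shows the effect in plain Java. `TextLike` is a hypothetical stand-in written for this example, not the real `org.apache.hadoop.io.Text`; it copies only the one relevant behavior, namely that `equals()` accepts only objects of its own type.

```java
// Hypothetical stand-in for Hadoop's Text, for illustration only.
// Like Text, its equals() matches only another instance of the same class.
final class TextLike {
    private final String value;

    TextLike(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof TextLike && value.equals(((TextLike) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}

final class KeyComparisonDemo {

    // Compares the key object itself with a String: the types differ, so this is false.
    static boolean wrongCheck(TextLike key) {
        return key.equals("hello");
    }

    // Converts the key to a String first, then compares String with String.
    static boolean rightCheck(TextLike key) {
        return key.toString().equals("hello");
    }

    public static void main(String[] args) {
        TextLike key = new TextLike("hello");
        System.out.println(wrongCheck(key)); // prints false
        System.out.println(rightCheck(key)); // prints true
    }
}
```

The same `rightCheck` comparison also returns false for a key such as "hello " with a trailing space, which is why the answer above warns about spaces.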

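Reference answer A

#include <stdio.h>

/* Count the words in s that contain more than n letters. */
int CountWord(const char *s, int n)
{
    int c = 0, cw = 0;   /* c: letters in the current word; cw: matching words */
    for (; *s; s++) {
        if (('a' <= *s && *s <= 'z') || ('A' <= *s && *s <= 'Z')) {
            c++;
        } else {
            if (c > n)
                cw++;
            c = 0;
        }
    }
    if (c > n)           /* last word, when the string ends with a letter */
        cw++;
    return cw;
}

int main(void)
{
    char s[100];
    int n;
    printf("Enter a string: ");
    if (fgets(s, sizeof s, stdin) == NULL)   /* fgets replaces the unsafe gets */
        return 1;
    printf("Enter n: ");
    if (scanf("%d", &n) != 1)
        return 1;
    printf("Number of words with more than %d letters: %d\n", n, CountWord(s, n));
    return 0;
}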
