hadoop-mapreduce: counting words
Posted by dyh004
Write the map program
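package com.cvicse.ump.hadoop.mapreduce.map;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every word in each input line.
public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across calls to avoid allocating new objects for every word.
    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split on runs of whitespace so repeated spaces or tabs do not yield empty "words".
        String[] words = value.toString().trim().split("\\s+");
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // a blank line still splits into one empty string
            }
            outKey.set(word);
            context.write(outKey, ONE);
        }
    }
}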
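To see how the choice of delimiter affects what the map step emits, here is a small plain-Java sketch with no Hadoop dependency. It compares splitting on a single space with splitting on runs of whitespace. The class name TokenizeSketch and the helper tokenize are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {

    // Hypothetical helper: splits a line on runs of whitespace and drops empty tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "hello  hadoop hello"; // note the two spaces after the first word

        // Splitting on a single space keeps an empty string between the two spaces.
        System.out.println(line.split(" ").length); // prints 4

        // Splitting on runs of whitespace yields only the real words.
        System.out.println(tokenize(line)); // prints [hello, hadoop, hello]
    }
}
```

With a single-space delimiter, the empty string would be counted as a "word" of its own in the final result, so it is worth deciding up front how the input should be tokenized.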
Write the reduce program
package com.cvicse.ump.hadoop.mapreduce.reduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts emitted for each word and writes (word, total).
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // A primitive int avoids boxing and unboxing on every addition.
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
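Between map and reduce, the framework groups all values that share a key, so each reduce call sees one word together with all of its 1s. The following plain-Java sketch imitates that grouping and summing in memory. It is only an illustration under simplified assumptions (a single process, a TreeMap standing in for the shuffle), not how Hadoop implements it, and the names ReduceSketch, shuffle and reduce are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {

    // Stand-in for the shuffle: groups the (word, 1) pairs from the map step by word.
    // A TreeMap keeps the keys sorted, similar to the sorted keys a reducer receives.
    static Map<String, List<Integer>> shuffle(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Stand-in for the reduce step: sums all values collected for one key.
    static int reduce(List<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "hadoop", "hello");
        // Prints one "word<TAB>count" line per distinct word, in key order.
        shuffle(words).forEach((word, ones) ->
                System.out.println(word + "\t" + reduce(ones)));
    }
}
```

For the three sample words this prints a line for hadoop with 1 and a line for hello with 2.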
Write the main function
package com.cvicse.ump.hadoop.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.cvicse.ump.hadoop.mapreduce.map.WordCountMap;
import com.cvicse.ump.hadoop.mapreduce.reduce.WordCountReduce;

// Driver: configures and submits the word count job.
// args[0] is the input path, args[1] is the output path.
public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordCount");
        job.setJarByClass(WordCount.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        // Key/value types emitted by the mapper.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Key/value types written by the reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and block until it finishes, printing progress.
        boolean success = job.waitForCompletion(true);
        if (!success) {
            System.out.println("wordcount task failed!");
        } else {
            System.out.println("wordcount task success!");
        }
        // Report the result to the calling shell through the exit code.
        System.exit(success ? 0 : 1);
    }
}
Put wordcount.txt in the /dyh/data/input/ directory on HDFS.
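Run: hadoop jar hdfs.jar com.cvicse.ump.hadoop.mapreduce.WordCount /dyh/data/input/wordcount.txt /dyh/data/output/1
The output directory /dyh/data/output/1 must not exist before the job starts; otherwise the job fails at submission.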
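Because the driver does not set an output format, the job falls back to the default text output, which writes one line per key in the form word, a tab, then the count, into part files under the output directory. If the result needs further processing, lines of that shape can be read back roughly as follows. This is a plain-Java sketch with hypothetical names (OutputSketch, parse), and the sample lines are made up for illustration rather than taken from a real run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputSketch {

    // Parses lines of the form "word<TAB>count" into a map, keeping the input order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the expected shape
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real job output.
        List<String> sample = Arrays.asList("hadoop\t1", "hello\t2");
        System.out.println(parse(sample)); // prints {hadoop=1, hello=2}
    }
}
```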