Big data: modified Hadoop Java code for WordCount
Posted: 2014-10-02 22:46:44

Question: I have to modify the Hadoop wordcount example so that it counts the number of words that start with the prefix "cons", and then sort the results in descending order of frequency. Can anyone tell me how to write the mapper and reducer code for this?

Code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Replace all digits and punctuation with an empty string, then lower-case
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        // Extract the words
        StringTokenizer record = new StringTokenizer(line);
        // Emit each word as the key and one as its value
        while (record.hasMoreTokens()) {
            context.write(new Text(record.nextToken()), new IntWritable(1));
        }
    }
}
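To see what the question's cleaning step does to a line before a "cons" filter is applied, here is a plain-Java sketch of the same logic with no Hadoop dependency. The names `ConsTokenizer`, `normalize` and `consWords` are hypothetical; in a real mapper each returned word would be emitted with `context.write(new Text(word), new IntWritable(1))`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java sketch (no Hadoop) of the question's map() logic plus a "cons" prefix filter.
// Class and method names are hypothetical.
class ConsTokenizer {

    // Same cleaning as the question's mapper: drop ASCII punctuation and digits, lower-case.
    static String normalize(String line) {
        return line.replaceAll("\\p{Punct}|\\d", "").toLowerCase();
    }

    // Returns the tokens that start with "cons", in the order they appear.
    static List<String> consWords(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(normalize(line));
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            if (word.startsWith("cons")) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [consider, constant, consensusdriven, console]
        System.out.println(consWords("Consider the CONSTANT, consensus-driven console 42 times."));
    }
}
```

Because punctuation is replaced with an empty string, "consensus-driven" becomes the single token "consensusdriven"; replacing it with a space instead would keep hyphenated words apart.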
Answer:

To count the number of words that start with "cons", you can discard all other words when emitting from the mapper:
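public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        IntWritable one = new IntWritable(1);
        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            // Every matching word is counted under the single key "cons_count"
            if (word.startsWith("cons")) {
                context.write(new Text("cons_count"), one);
            }
        }
    }
}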
The reducer will now receive only one key, cons_count, and you can sum its values to get the count.
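The single-key idea can be checked without a cluster using the in-memory sketch below, which assumes whitespace-separated input: the "map" step produces a 1 for every word beginning with "cons", and the "reduce" step adds them up. `ConsCountSketch`, `mapPhase` and `reducePhase` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hadoop) of the single-key count; all names are hypothetical.
class ConsCountSketch {

    // "Map": one value of 1 for every whitespace-separated word starting with "cons".
    // In the Hadoop mapper each 1 is written under the key "cons_count".
    static List<Integer> mapPhase(List<String> lines) {
        List<Integer> ones = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (word.startsWith("cons")) {
                    ones.add(1);
                }
            }
        }
        return ones;
    }

    // "Reduce": the only key, "cons_count", receives all the 1s; add them up.
    static int reducePhase(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("construct a console", "no match here", "cons consensus");
        System.out.println(reducePhase(mapPhase(lines))); // prints 4
    }
}
```

In an actual job the summing reducer does not have to be written by hand: Hadoop's `org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer` sums the `IntWritable` values for each key. With only one key, the job writes a single line: `cons_count` followed by the total.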
To sort the words that start with "cons" by frequency, all of them should go to the same reducer, which counts and sorts them. For this, the mapper is:
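import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MyMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            // Send every "cons" word under the same key so that one reducer receives them all
            if (word.startsWith("cons")) {
                context.write(new Text("cons"), new Text(word));
            }
        }
    }
}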
Reducer:
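import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Count how often each "cons" word occurs
        Map<String, Integer> wordCountMap = new HashMap<>();
        for (Text value : values) {
            // toString() copies the text; Hadoop reuses the same Text object between iterations
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                wordCountMap.put(word, wordCountMap.get(word) + 1);
            } else {
                wordCountMap.put(word, 1);
            }
        }
        // Sort the map entries by value (the count), highest first
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCountMap.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        for (Map.Entry<String, Integer> entry : entries) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}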
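The counting and ordering that the reducer performs can be exercised on ordinary Java collections. This sketch (hypothetical class `ConsFrequencySketch`, no Hadoop) counts each word with `Map.merge` and sorts the entries by count, highest first; the alphabetical tie-break is an added assumption so that equal counts come out in a fixed order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Hadoop) of the reducer's count-then-sort logic; names are hypothetical.
class ConsFrequencySketch {

    static List<Map.Entry<String, Integer>> countAndSort(Iterable<String> consWords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : consWords) {
            counts.merge(word, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Descending by count; ties broken alphabetically so the order is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("console", "cons", "console", "construct", "cons", "console");
        for (Map.Entry<String, Integer> e : countAndSort(words)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        // prints console 3, cons 2, construct 1 (one per line, tab-separated)
    }
}
```

The whole map lives in a single reducer's memory, which suits the limited set of distinct "cons" words but would not scale to an arbitrary vocabulary.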
Discussion:
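The second mapper code is exactly what we need: it drops every word except those that start with "cons". Hadoop sorts the intermediate key-value pairs by key, so the output comes out in ascending order. Here we have to write a custom sort comparator to get descending order for the words that start with "cons".
@blackbookstar, by "the whole code", do you mean the sorting? Check this link to see how to do it: ***.com/questions/109383/…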
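The comment relies on Hadoop sorting intermediate keys. One common way to use that, assumed here rather than taken from the thread, is a second job that reads the (word, count) output, swaps it to (count, word), and registers a descending comparator with `Job.setSortComparatorClass`, for example a `WritableComparator` subclass whose `compare` negates the default result (Hadoop's own Grep example uses `LongWritable.DecreasingComparator` this way). The sketch below only simulates that data flow in plain Java: a `TreeMap` built with `Comparator.reverseOrder()` stands in for the shuffle's descending key order. `DescendingSortSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop) of a second "sort by count" job; names are hypothetical.
class DescendingSortSketch {

    static List<String> sortByCountDescending(Map<String, Integer> wordCounts) {
        // Map step: swap each (word, count) pair to (count, word).
        // Sort step: a TreeMap with a reversed comparator keeps counts highest-first,
        // standing in for a descending comparator set via Job.setSortComparatorClass.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Reduce step: emit "word<TAB>count" in key order; words that share a count
        // keep the iteration order of the input map.
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : byCount.entrySet()) {
            for (String word : group.getValue()) {
                output.add(word + "\t" + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("cons", 2);
        counts.put("console", 5);
        counts.put("construct", 1);
        sortByCountDescending(counts).forEach(System.out::println);
        // prints console 5, cons 2, construct 1 (one per line, tab-separated)
    }
}
```

Unlike the single-reducer approach, this keeps the counting job fully parallel and moves only the (much smaller) word/count list through the sorting step.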