MapReduce: how to see how much data each reducer processes

Posted 2023-04-19


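Answer A: Hello. The first method has the Mapper read the text file and use StringTokenizer to split each line into numbers (Hadoop processes text files one line at a time), add them up, and write the per-line sum to the Context as a key/value pair. The Reducer then adds up these intermediate values to get the sum of all numbers in the file.
The second method has the Mapper use StringTokenizer to split out each number and count how many times each number occurs in the file. During the combine phase, the sum contributed by each distinct number is computed, and the Reducer finally adds up those per-number sums to get the sum of all numbers in the file.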
package com.metarnet.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class NumberSum {

    // Split each line into tokens and emit the sum of the numbers on that line.
    public static class SumMapper extends Mapper<Object, Text, Text, LongWritable> {
        private final Text word = new Text("sum");
        private final LongWritable numValue = new LongWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            long sum = 0;
            while (itr.hasMoreTokens()) {
                sum += Long.parseLong(itr.nextToken());
            }
            numValue.set(sum);
            context.write(word, numValue);
        }
    }

    // Add up the per-line sums and write the total.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable result = new LongWritable();
        private final Text k = new Text("sum");

        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(k, result);
        }
    }

    // Expects two arguments: the input path and the output path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: numbersum <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated "new Job(conf, name)" constructor in Hadoop 2.x.
        Job job = Job.getInstance(conf, "number sum");
        job.setJarByClass(NumberSum.class);
        job.setMapperClass(SumMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Print the status before exiting; a statement placed after System.exit never runs.
        boolean ok = job.waitForCompletion(true);
        System.out.println(ok ? "ok" : "failed");
        System.exit(ok ? 0 : 1);
    }
}

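The first method is relatively simple. The second uses a Combiner class. Hadoop may apply the combiner to the intermediate values repeatedly (zero, one, or several times for a given map output), merging the data a little further on each pass.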
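Because the combiner may run any number of times, it has to produce the same final result regardless. The following is a minimal plain-Java sketch of that property, with no Hadoop dependency. The class `CombinerSketch` and its methods are hypothetical names made up for this illustration, and the pairs of `long` values stand in for Hadoop's key/value records. It imitates the second method: each number is emitted under its own key, equal keys are summed in the combine step, and the reduce step adds everything up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class CombinerSketch {

    // "Map" step: emit one {key, value} pair per number, keyed by the number itself.
    static List<long[]> map(String line) {
        List<long[]> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            long n = Long.parseLong(itr.nextToken());
            out.add(new long[] {n, n});
        }
        return out;
    }

    // "Combine" step: add up the values that share a key.
    static List<long[]> combine(List<long[]> pairs) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] p : pairs) {
            sums.merge(p[0], p[1], Long::sum);
        }
        List<long[]> out = new ArrayList<>();
        for (Map.Entry<Long, Long> e : sums.entrySet()) {
            out.add(new long[] {e.getKey(), e.getValue()});
        }
        return out;
    }

    // "Reduce" step: add up all remaining values.
    static long reduce(List<long[]> pairs) {
        long total = 0;
        for (long[] p : pairs) {
            total += p[1];
        }
        return total;
    }

    // Run map, then the given number of combine passes, then reduce.
    static long run(List<String> lines, int combinePasses) {
        List<long[]> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        for (int i = 0; i < combinePasses; i++) {
            pairs = combine(pairs);
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 2 2", "3 2", "");
        for (int passes = 0; passes <= 3; passes++) {
            System.out.println(passes + " combine pass(es): total = " + run(lines, passes));
        }
    }
}
```

The total is the same for every number of passes because addition is associative and commutative. An operation without those properties (an average, for example) could not be reused as a combiner in this way.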
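Both methods show that the core of Map/Reduce design is deciding how to split the input into key/value pairs. A sensible split has a large effect on both the final output and the implementation of the algorithm.

Follow-up: What I need is statistics on the results. There should be built-in counters for this, but I don't know which one to look at.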
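On the follow-up: Hadoop's built-in task counters include "Reduce input records", "Reduce input groups", and "Reduce shuffle bytes" (`REDUCE_INPUT_RECORDS`, `REDUCE_INPUT_GROUPS`, and `REDUCE_SHUFFLE_BYTES` in `org.apache.hadoop.mapreduce.TaskCounter`). The values printed at the end of a job are totals over all reducers; the per-reducer values are shown on each reduce task's counter page in the job web UI. To see why those per-reducer numbers can differ so much, here is a minimal plain-Java sketch that applies the same formula as Hadoop's default `HashPartitioner`. The class `ReducerLoadSketch` and its methods are hypothetical names for this illustration, and `String.hashCode()` stands in for the hash of Hadoop's `Text`, which is computed from bytes, so real partition numbers may differ.

```java
import java.util.Arrays;
import java.util.List;

public class ReducerLoadSketch {

    // Same formula as the default HashPartitioner: non-negative hash modulo reducer count.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many map output records each reducer would receive.
    static long[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        long[] counts = new long[numReduceTasks];
        for (String key : keys) {
            counts[partition(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Every record shares one key, as in the NumberSum job, which always writes "sum".
        List<String> sameKey = Arrays.asList("sum", "sum", "sum", "sum");
        System.out.println("single key: " + Arrays.toString(recordsPerReducer(sameKey, 3)));

        // Distinct keys spread across the reducers.
        List<String> mixedKeys = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println("mixed keys: " + Arrays.toString(recordsPerReducer(mixedKeys, 3)));
    }
}
```

With a single key, every record lands on the same reducer, so raising the number of reducers for the NumberSum job would leave the others with a "Reduce input records" value of zero.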

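HBase: how to query the total number of records in a table

You can use a coprocessor. Failing that, you can write your own MapReduce job: the HBase API provides classes that let MapReduce jobs query HBase and insert data into it. The idea is to read the table's data with MapReduce and do the counting in the reduce phase.
Answer A: You can use Spark to read the table's rowkeys and then count them, which is much faster than MapReduce.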
