Java Big Data Development with Hadoop (22): NLineInputFormat Example

Posted 2021-04-13 跟我一起学大数据

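Overview: The previous section showed that FileInputFormat has many implementation classes. This section walks through a hands-on example using one of them, NLineInputFormat.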

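NLineInputFormat Usage Example

1. Requirements

Count the occurrences of each word, with the number of splits determined by the number of lines in each input file. In this example, every three lines go into one split.

(1) Input data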

hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
xiaoming hive helloworld

(2) Expected output

Number of splits:4
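
As a rough sketch of where the figure 4 comes from: assuming the sample input holds 11 lines (the two-line pair repeated five times plus one extra line), grouping every three lines into one split gives ceil(11 / 3) = 4 splits, and the last split holds only two lines. The class and method names below are hypothetical and use plain Java only; they illustrate the grouping rule and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop.
class SplitCountSketch {

    // Number of splits when every linesPerSplit lines form one split
    // (ceiling division; the last split may hold fewer lines).
    static int countSplits(int totalLines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    // Group the lines into consecutive chunks of at most linesPerSplit lines.
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed sample input: 5 repetitions of the two-line pair plus one extra line.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            lines.add("hadoop ni hao");
            lines.add("xiaoming hive helloworld");
        }
        lines.add("xiaoming hive helloworld");

        System.out.println("Total lines: " + lines.size());
        System.out.println("Number of splits: " + countSplits(lines.size(), 3));
        for (List<String> split : groupLines(lines, 3)) {
            System.out.println(split.size() + " line(s): " + split);
        }
    }
}
```

Because each split is handled by its own map task, the same rule also gives the number of map tasks launched for this input.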

2. Requirements analysis

3. Code

(1) Write the Mapper class

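import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Reused output key and value; every word is written with a count of 1
    private Text k = new Text();
    private LongWritable v = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // 1 Get one line
        String line = value.toString();

        // 2 Split the line on spaces
        String[] splited = line.split(" ");

        // 3 Write out each word in a loop
        for (int i = 0; i < splited.length; i++) {
            k.set(splited[i]);
            context.write(k, v);
        }
    }
}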

(2) Write the Reducer class

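import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NLineReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // Reused output value
    LongWritable v = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        long sum = 0L;

        // 1 Sum the counts for this word
        for (LongWritable value : values) {
            sum += value.get();
        }
        v.set(sum);

        // 2 Write out the word and its total
        context.write(key, v);
    }
}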
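
To see what the Mapper and Reducer above compute without running a cluster, here is a minimal plain-Java sketch of the same logic: split each line on a single space, count 1 per word, and sum the counts per word. `WordCountSketch` and `countWords` are hypothetical names, the `TreeMap` only stands in for the key-sorted grouping that the framework performs between map and reduce, and the 11-line sample input is an assumption about the input file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it mimics the job's logic in memory.
class WordCountSketch {

    static Map<String, Long> countWords(List<String> lines) {
        // TreeMap keeps the words sorted, similar to the sorted reducer input.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split on a single space, as the Mapper does.
            for (String word : line.split(" ")) {
                // "Reduce" step: add this word's 1 to its running sum.
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input: 5 repetitions of the two-line pair plus one extra line.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            lines.add("hadoop ni hao");
            lines.add("xiaoming hive helloworld");
        }
        lines.add("xiaoming hive helloworld");

        // Print one word per line, tab-separated from its count.
        for (Map.Entry<String, Long> entry : countWords(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

The number of splits changes only how the lines are distributed across map tasks; the word totals stay the same however the input is split.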

(3) Write the Driver class

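import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

    public static void main(String[] args)
            throws IOException, URISyntaxException, ClassNotFoundException, InterruptedException {

        // Set the input and output paths to match the actual paths on your own machine
        args = new String[] { "d:/input/inputword", "d:/output1" };

        // 1 Get the job object
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 7 Put three records (lines) in each InputSplit
        NLineInputFormat.setNumLinesPerSplit(job, 3);

        // 8 Use NLineInputFormat to read the input
        job.setInputFormatClass(NLineInputFormat.class);

        // 2 Set the jar location and associate the mapper and reducer
        job.setJarByClass(NLineDriver.class);
        job.setMapperClass(NLineMapper.class);
        job.setReducerClass(NLineReducer.class);

        // 3 Set the map output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 4 Set the final output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 5 Set the input and output data paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 6 Submit the job
        job.waitForCompletion(true);
    }
}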

4. Test

(1) Input data

hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
xiaoming hive helloworld

(2) Number of splits reported in the output

Number of splits:4
