Class of 2017 In-Class Test Paper: Data Cleaning Progress Log
Posted by jinseliunian
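Data cleaning: clean the data as required by the test, then import the cleaned data into a Hive database.
Progress so far: a MapReduce job reads the txt file and splits each line into an array of fields. Connecting to Hive and storing the result there has not succeeded yet.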
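For the Hive step that is still open, one possible route is HiveServer2 over JDBC. The following is only a sketch under several assumptions that are not in the original post: HiveServer2 listening on localhost:10000, the hive-jdbc driver on the classpath, a hypothetical table `cleaned_data` with four placeholder string columns `c1` to `c4`, a user named `hadoop`, and the job's output sitting in a single `part-r-00000` file (the default with one reducer). Without arguments it only prints the HiveQL it would run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch, not the author's code: load the MapReduce output into Hive.
final class HiveLoadSketch {

    // All three values are assumptions made for illustration.
    static final String DEFAULT_URL = "jdbc:hive2://localhost:10000/default";
    static final String TABLE = "cleaned_data";
    static final String PART_FILE = "/user/hadoop/output2/part-r-00000";

    private HiveLoadSketch() {}

    // The job writes space-separated lines, so the table splits on a single space.
    // Column names c1..c4 are placeholders; the post does not name the fields.
    static String createTableSql(String table) {
        return "CREATE TABLE IF NOT EXISTS " + table
                + " (c1 STRING, c2 STRING, c3 STRING, c4 STRING)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '";
    }

    // LOAD DATA INPATH moves the HDFS file into the table's storage directory.
    static String loadSql(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }

    // Runs both statements against HiveServer2. Older driver versions may also
    // need Class.forName("org.apache.hive.jdbc.HiveDriver") before connecting.
    static void run(String url, String user) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement()) {
            stmt.execute(createTableSql(TABLE));
            stmt.execute(loadSql(PART_FILE, TABLE));
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // Dry run: show the statements without touching Hive.
            System.out.println(createTableSql(TABLE));
            System.out.println(loadSql(PART_FILE, TABLE));
            return;
        }
        // First argument is taken as the JDBC URL; "hadoop" is an assumed user name.
        run(args[0], "hadoop");
    }
}
```

Splitting on a single space only works if no field contains a space itself; if the real data does, writing the job's output with a different separator would be the safer choice.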
Code completed so far:
package org.apache.hadoop.examples;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount1 {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance();
        job.setJobName("WordCount1");
        job.setJarByClass(WordCount1.class);
        job.setMapperClass(doMapper.class);
        job.setReducerClass(doReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        Path in = new Path("hdfs://localhost:9000/user/hadoop/input/resault");
        Path out = new Path("hdfs://localhost:9000/user/hadoop/output2");
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Keeps fields 0, 1, 3, 4 and 5 of each comma-separated input line.
    public static class doMapper extends Mapper<Object, Text, Text, NullWritable> {
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] arr = value.toString().split(",");
            if (arr.length < 6) {
                return; // skip short lines instead of failing with ArrayIndexOutOfBoundsException
            }
            word.set(arr[0] + " " + arr[1] + " " + arr[3] + " " + arr[4] + "/" + arr[5]);
            context.write(word, NullWritable.get());
        }
    }

    // Writes each distinct key once, so duplicate lines are dropped.
    public static class doReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }
}
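The mapper's field selection can be tried without a Hadoop cluster. Below is a minimal sketch with the same split and concatenation pulled into a plain helper; the class name `LineCleaner`, the sample rows, and the choice to skip lines with fewer than six fields are assumptions for illustration, not part of the original job.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: the mapper's per-line logic as a plain static method.
final class LineCleaner {

    // The mapper reads indexes 0, 1, 3, 4 and 5, so a usable line needs 6 fields.
    static final int MIN_FIELDS = 6;

    private LineCleaner() {}

    // Returns the cleaned line, or an empty Optional when the line is too short.
    static Optional<String> clean(String line) {
        String[] arr = line.split(",");
        if (arr.length < MIN_FIELDS) {
            return Optional.empty();
        }
        return Optional.of(arr[0] + " " + arr[1] + " " + arr[3] + " " + arr[4] + "/" + arr[5]);
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the shape of the output.
        List<String> sample = List.of("a,b,c,d,e,f", "too,short");
        for (String line : sample) {
            System.out.println(clean(line).orElse("<skipped>"));
        }
    }
}
```

Note that `split(",")` drops trailing empty fields, so a line ending in a comma counts as having fewer fields than it appears to.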