Case 3: MapReduce single-table join, deriving a grandchild-grandparent table from a child-parent table


Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers case 3, a MapReduce single-table join that derives a grandchild-grandparent table from a child-parent table.

1. Sample data. Each line holds a child and one of that child's parents, separated by a space:

Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
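Before turning to the MapReduce version, it helps to know what the job should produce. The plain-Java sketch below computes the same self-join with a nested loop. A row (a, b) combined with a row (b, c) gives the grandchild-grandparent pair (a, c). The class name GrandparentReference is hypothetical. The approach only suits small inputs that fit in memory, so treat it as a reference for checking results, not as a replacement for the job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference implementation: a brute-force, in-memory self-join
// used only to check the MapReduce output on small inputs.
class GrandparentReference {

    // Each row is {child, parent}; returns {grandchild, grandparent} pairs.
    static List<String[]> grandPairs(List<String[]> rows) {
        List<String[]> result = new ArrayList<>();
        for (String[] left : rows) {
            for (String[] right : rows) {
                // the left row's parent is the right row's child
                if (left[1].equals(right[0])) {
                    result.add(new String[] {left[0], right[1]});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack", "Lucy Mary", "Lucy Ben", "Jack Alice",
            "Jack Jesse", "Terry Alice", "Terry Jesse", "Philip Terry", "Philip Alma", "Mark Terry", "Mark Alma"
        };
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.trim().split("\\s+"));
        }
        for (String[] pair : grandPairs(rows)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

On the 14 sample rows this prints 12 pairs. Tom and Jone each get Mary, Ben, Alice and Jesse. Philip and Mark each get Alice and Jesse.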

2. The map code is as follows:

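    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.log4j.Logger;

    // Enclosing job class; the mapper, reducer and main() below are all members of it
    public class ChildParent2 {

        public static class ChildParentMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {

            private static Logger logger = Logger.getLogger(ChildParentMapper.class);

            String childname = new String();
            String parientname = new String();
            String flag = new String(); // left/right table flag

            @Override
            public void map(LongWritable ikey, Text ivalue, OutputCollector<Text, Text> output, Reporter arg3)
                    throws IOException {
                // split the line into child and parent names, tolerating extra spaces or tabs
                String[] str = ivalue.toString().trim().split("\\s+");
                if (str.length < 2) {
                    return; // skip blank or malformed lines
                }
                if (str[0].compareTo("child") != 0) { // skip a header row, if present
                    childname = str[0];   // child's name
                    parientname = str[1]; // parent's name

                    // left table: key = parent, value = flag "1" + child + parent
                    flag = "1";
                    logger.info(parientname + "," + flag + "+" + childname + "+" + parientname);
                    output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));

                    // right table: key = child, value = flag "2" + child + parent
                    flag = "2";
                    logger.info(childname + "," + flag + "+" + childname + "+" + parientname);
                    output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
                }
            }
        }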


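Code walkthrough:

Step 1: define the following three fields:

1. the child's name (childname)
2. the parent's name (parientname)
3. a flag that distinguishes the left table from the right table (flag)

    String childname = new String();
    String parientname = new String();
    String flag = new String(); // left/right table flag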


Step 2: split the line to get the child's name and the parent's name:

    String[] str = ivalue.toString().trim().split("\\s+");
    childname = str[0];   // child's name
    parientname = str[1]; // parent's name
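Splitting the line depends on how Java's String.split treats separators. The small plain-Java check below compares splitting on a single space with trimming and then splitting on any run of whitespace. The class SplitDemo and its two methods are hypothetical and exist only for illustration. A doubled space or a tab breaks the single-space version. A blank line yields a one-element array with either method, so a length check is still needed before reading str[1].

```java
import java.util.Arrays;

// Hypothetical demo: compares two ways of splitting a "child parent" line.
class SplitDemo {

    // Split on exactly one space character.
    static String[] splitOnSpace(String line) {
        return line.split(" ");
    }

    // Trim, then split on any run of whitespace (spaces or tabs).
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] samples = {"Tom Lucy ", "Tom  Lucy", "Tom\tLucy", ""};
        for (String s : samples) {
            System.out.println(Arrays.toString(splitOnSpace(s)) + " vs " + Arrays.toString(splitOnWhitespace(s)));
        }
    }
}
```

A trailing space is harmless in both cases, because String.split drops trailing empty strings. A doubled space, however, produces an empty middle field with split(" ").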


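Step 3: emit two key/value pairs per input line, one tagged as the left table and one as the right table. The same table is used twice. The left copy is keyed by the parent and the right copy by the child. As a result, anyone who is a parent in one record and a child in another receives both records under the same key, and that shared key lets the reducer join them.

First: <parent name, left-table flag + child name + parent name>

    flag = "1";
    output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));

Second: <child name, right-table flag + child name + parent name>

    flag = "2";
    output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));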
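The map step for one line can be modelled without Hadoop. In the sketch below, a hypothetical class MapStepSketch returns the {key, value} pairs the mapper emits for a single line, using the same "flag+child+parent" value layout. Plain strings stand in for Hadoop's Text type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the map step (no Hadoop types).
class MapStepSketch {

    // Returns the {key, value} pairs emitted for one input line.
    static List<String[]> mapLine(String line) {
        List<String[]> out = new ArrayList<>();
        String[] str = line.trim().split("\\s+");
        if (str.length < 2 || str[0].equals("child")) {
            return out; // blank line or header row: emit nothing
        }
        String child = str[0];
        String parent = str[1];
        // left table: keyed by the parent, tagged "1"
        out.add(new String[] {parent, "1+" + child + "+" + parent});
        // right table: keyed by the child, tagged "2"
        out.add(new String[] {child, "2+" + child + "+" + parent});
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : mapLine("Tom Lucy")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

For "Tom Lucy" it returns <Lucy, 1+Tom+Lucy> and <Tom, 2+Tom+Lucy>, which matches two of the lines in the map output listed in step 4.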

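Step 4: the map output. It is shown here already grouped and sorted by key, which is how the reducer receives it after the shuffle. Hadoop does not guarantee the order of values within a key: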


Alice  1+Terry+Alice
Alice  1+Jack+Alice
Alma   1+Mark+Alma
Alma   1+Philip+Alma
Ben    1+Lucy+Ben
Jack   2+Jack+Alice
Jack   1+Tom+Jack
Jack   1+Jone+Jack
Jack   2+Jack+Jesse
Jesse  1+Jack+Jesse
Jesse  1+Terry+Jesse
Jone   2+Jone+Lucy
Jone   2+Jone+Jack
Lucy   1+Tom+Lucy
Lucy   2+Lucy+Ben
Lucy   2+Lucy+Mary
Lucy   1+Jone+Lucy
Mark   2+Mark+Alma
Mark   2+Mark+Terry
Mary   1+Lucy+Mary
Philip 2+Philip+Terry
Philip 2+Philip+Alma
Terry  1+Philip+Terry
Terry  1+Mark+Terry
Terry  2+Terry+Alice
Terry  2+Terry+Jesse
Tom    2+Tom+Lucy
Tom    2+Tom+Jack
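The grouping above can be reproduced on a single machine by mapping every line and collecting the values per key. In the sketch below, the class ShuffleSketch is hypothetical. A TreeMap stands in for the sort that the shuffle performs on keys. It is a model of the result, not of how Hadoop actually moves data between nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine model of map + shuffle (no Hadoop involved).
class ShuffleSketch {

    // Maps every line to its two tagged records and groups the values by key.
    static Map<String, List<String>> mapAndGroup(List<String> lines) {
        // TreeMap iterates keys in sorted order, like the keys arriving at a reducer
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] str = line.trim().split("\\s+");
            if (str.length < 2 || str[0].equals("child")) {
                continue; // skip blank lines and a header row
            }
            String child = str[0];
            String parent = str[1];
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("1+" + child + "+" + parent);
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("2+" + child + "+" + parent);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack", "Lucy Mary", "Lucy Ben", "Jack Alice",
            "Jack Jesse", "Terry Alice", "Terry Jesse", "Philip Terry", "Philip Alma", "Mark Terry", "Mark Alma");
        for (Map.Entry<String, List<String>> e : mapAndGroup(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Apart from the order of values inside each key, the groups match the listing above: 12 keys and 28 records in total.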

3. The reduce code is as follows:

        public static class ChildParentReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

            private static Logger logger = Logger.getLogger(ChildParentReduce.class);

            private int num = 0; // per reduce task: has the header row been written yet?

            @Override
            public void reduce(Text ikey, Iterator<Text> ivalue, OutputCollector<Text, Text> output, Reporter arg3)
                    throws IOException {
                if (num == 0) { // emit the header row once per reduce task
                    output.collect(new Text("grandchild"), new Text("grandparent"));
                    num++;
                }
                int grandchildnum = 0;   // number of grandchildren under this key
                int grandparientnum = 0; // number of grandparents under this key
                // fixed capacity: assumes at most 100 entries of each kind per key
                String[] grandchild = new String[100];
                String[] grandparient = new String[100];
                while (ivalue.hasNext()) {
                    // split "flag+child+parent" into its three fields ("+" must be escaped in a regex)
                    String[] record = ivalue.next().toString().split("\\+");
                    if (record[0].compareTo("1") == 0) {
                        // left-table record: the key is this record's parent, so record[1] is a grandchild
                        grandchild[grandchildnum] = record[1];
                        grandchildnum++;
                    } else if (record[0].compareTo("2") == 0) {
                        // right-table record: the key is this record's child, so record[2] is a grandparent
                        grandparient[grandparientnum] = record[2];
                        grandparientnum++;
                    }
                }
                if (grandchildnum != 0 && grandparientnum != 0) {
                    // Cartesian product: i indexes grandchildren, j indexes grandparents
                    for (int i = 0; i < grandchildnum; i++) {
                        for (int j = 0; j < grandparientnum; j++) {
                            logger.info(grandchild[i] + "," + grandparient[j]);
                            output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
                        }
                    }
                }
            }
        }


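Code walkthrough:

Step 1: if a header row is wanted, write it before anything else. Because num is an instance field, this happens once per reduce task, which means once in total when the job runs a single reducer:

    if (num == 0) { // emit the header row
        output.collect(new Text("grandchild"), new Text("grandparent"));
        num++;
    }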

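Step 2: define four local variables. Two arrays hold the grandchildren and the grandparents found under the current key, and two counters record how many of each there are. The arrays have a fixed capacity of 100, so a key with more than 100 children or parents would overflow them:

    int grandchildnum = 0;   // number of grandchildren
    int grandparientnum = 0; // number of grandparents
    String[] grandchild = new String[100];
    String[] grandparient = new String[100];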

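Step 3: parse the value list produced by the map phase.

First, consider what the reducer receives. Taking Lucy, one of the keys in the map output, as an example, it gets:

<Lucy, 1+Tom+Lucy, 2+Lucy+Ben, 2+Lucy+Mary, 1+Jone+Lucy>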


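Loop over the values, splitting each one on "+". A record tagged 1 comes from the left table. Its child is a child of the key person (here Lucy) and therefore a grandchild:

    // left-table record: record[1] is a grandchild
    if (record[0].compareTo("1") == 0) {
        grandchild[grandchildnum] = record[1]; // store the child's name
        grandchildnum++;
    }

Grandchildren: Tom, Jone

A record tagged 2 comes from the right table. Its parent is a parent of the key person and therefore a grandparent:

    // right-table record: record[2] is a grandparent
    else if (record[0].compareTo("2") == 0) {
        grandparient[grandparientnum] = record[2]; // store the parent's name
        grandparientnum++;
    }

Grandparents: Ben, Mary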


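Taking the Cartesian product of the two lists gives every grandchild-grandparent pair for this key. The outer loop runs over the grandchildren and the inner loop over the grandparents, so each index stays within its own array even when the two counts differ:

    if (grandchildnum != 0 && grandparientnum != 0) {
        // Cartesian product
        for (int i = 0; i < grandchildnum; i++) {
            for (int j = 0; j < grandparientnum; j++) {
                logger.info(grandchild[i] + "," + grandparient[j]);
                output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
            }
        }
    }

Result for the key Lucy:

Tom Ben
Tom Mary
Jone Ben
Jone Mary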
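The reduce logic for a single key can be modelled in plain Java as below. The sketch, in a hypothetical class ReduceStepSketch, uses growable ArrayLists instead of fixed String[100] arrays, so it has no per-key limit. It loops with one index per list, so it also handles a key with, say, one grandchild but three grandparents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of the reduce step for one key.
class ReduceStepSketch {

    // values: the "flag+child+parent" strings grouped under one key;
    // returns {grandchild, grandparent} pairs
    static List<String[]> reduceKey(List<String> values) {
        List<String> grandchildren = new ArrayList<>();
        List<String> grandparents = new ArrayList<>();
        for (String v : values) {
            // "+" is a regex metacharacter, so it is escaped for split
            String[] record = v.split("\\+");
            if (record[0].equals("1")) {
                grandchildren.add(record[1]); // child of the key person
            } else if (record[0].equals("2")) {
                grandparents.add(record[2]);  // parent of the key person
            }
        }
        List<String[]> pairs = new ArrayList<>();
        // Cartesian product; yields nothing if either list is empty
        for (String gc : grandchildren) {
            for (String gp : grandparents) {
                pairs.add(new String[] {gc, gp});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lucy = Arrays.asList("1+Tom+Lucy", "2+Lucy+Ben", "2+Lucy+Mary", "1+Jone+Lucy");
        for (String[] p : reduceKey(lucy)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

For the Lucy values this prints the same four pairs as above. For a key that has only left-table records, such as Alice, it prints nothing.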



Finally, the main method:

        public static void main(String[] args) {
            try {
                String inputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/input";
                // the output directory must not exist yet, otherwise job submission fails
                String outputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/output";
                JobConf con = new JobConf(ChildParent2.class);
                con.setJobName("childparent");
                con.setMapOutputKeyClass(Text.class);
                con.setMapOutputValueClass(Text.class);
                con.setOutputKeyClass(Text.class);
                con.setOutputValueClass(Text.class);
                con.setMapperClass(ChildParentMapper.class);
                con.setReducerClass(ChildParentReduce.class);
                // a single reduce task, so the header row is written only once
                con.setNumReduceTasks(1);
                con.setInputFormat(TextInputFormat.class);
                con.setOutputFormat(TextOutputFormat.class);
                FileInputFormat.setInputPaths(con, new Path(inputDir));
                FileOutputFormat.setOutputPath(con, new Path(outputDir));
                JobClient.runJob(con); // blocks until the job completes; throws IOException if it fails
                System.exit(0);
            } catch (IllegalArgumentException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

    }

This article originally appeared on the blog “钟茂霖博客” (Zhong Maolin's blog); please keep this attribution: http://zhongml.blog.51cto.com/4808277/1877330
