Case 3: MapReduce single-table join, deriving a grandchild-grandparent table from a child-parent table
1. Sample data (each line is "child parent"):
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
2. The map code:
public static class ChildParentMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    private static Logger logger = Logger.getLogger(ChildParentMapper.class);
    String childname = new String();
    String parientname = new String();
    String flag = new String();// left/right table tag

    @Override
    public void map(Object ikey, Text ivalue, OutputCollector<Text, Text> output, Reporter arg3)
            throws IOException {
        String str[] = ivalue.toString().split(" ");// split the line into child and parent names
        if (str.length >= 2 && str[0].compareTo("child") != 0) {// skip the header row and malformed lines
            childname = str[0];// child name
            parientname = str[1];// parent name
            // left table = left-table tag + child name + parent name, keyed by the parent
            flag = "1";
            logger.info(new Text(parientname) + "," + new Text(flag + "+" + childname + "+" + parientname));
            output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));
            // right table = right-table tag + child name + parent name, keyed by the child
            flag = "2";
            logger.info(new Text(childname) + "," + new Text(flag + "+" + childname + "+" + parientname));
            output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
        }
    }
}
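Code walkthrough:
Step 1: define the following three fields:
1. Child name (childname)
2. Parent name (parientname)
3. A tag that tells the left table from the right table (flag)
String childname = new String();
String parientname = new String();
String flag = new String();// left/right table tag
Step 2: split the input line to get the child name and the parent name
String str[] = ivalue.toString().split(" ");
childname = str[0];// child name
parientname = str[1];// parent name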
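Step 3: emit two key/value pairs for each line, one tagged for the left table and one for the right table. Because the left record is keyed by the parent and the right record by the child, every record that reaches the reducer for a given person is either one of that person's children (tag 1) or one of that person's parents (tag 2).
First: <parent name, left-table tag + child name + parent name>
flag = "1";
output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));
Second: <child name, right-table tag + child name + parent name>
flag = "2";
output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));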
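The tagging in step 3 can be tried without a Hadoop cluster. The following is a minimal plain-Java sketch, not the author's code: the class name `TagSketch` and the method `tag` are hypothetical, and each emitted pair is represented as a `key<TAB>value` string purely for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original job): reproduces the two
// records the mapper emits for one "child parent" input line.
final class TagSketch {
    static List<String> tag(String line) {
        List<String> out = new ArrayList<>();
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || parts[0].equals("child")) {
            return out; // skip the header row and malformed lines
        }
        String child = parts[0];
        String parent = parts[1];
        // Left-table record: tag 1, keyed by the parent
        out.add(parent + "\t" + "1+" + child + "+" + parent);
        // Right-table record: tag 2, keyed by the child
        out.add(child + "\t" + "2+" + child + "+" + parent);
        return out;
    }

    public static void main(String[] args) {
        // Prints the two tagged records for the first sample line
        for (String pair : tag("Tom Lucy")) {
            System.out.println(pair);
        }
    }
}
```

For the line `Tom Lucy` this yields one record under the key `Lucy` (value `1+Tom+Lucy`) and one under the key `Tom` (value `2+Tom+Lucy`), matching the two `output.collect` calls above.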
Step 4: the mapper output (shown sorted by key):
Alice 1+Terry+Alice
Alice 1+Jack+Alice
Alma 1+Mark+Alma
Alma 1+Philip+Alma
Ben 1+Lucy+Ben
Jack 2+Jack+Alice
Jack 1+Tom+Jack
Jack 1+Jone+Jack
Jack 2+Jack+Jesse
Jesse 1+Jack+Jesse
Jesse 1+Terry+Jesse
Jone 2+Jone+Lucy
Jone 2+Jone+Jack
Lucy 1+Tom+Lucy
Lucy 2+Lucy+Ben
Lucy 2+Lucy+Mary
Lucy 1+Jone+Lucy
Mark 2+Mark+Alma
Mark 2+Mark+Terry
Mary 1+Lucy+Mary
Philip 2+Philip+Terry
Philip 2+Philip+Alma
Terry 1+Philip+Terry
Terry 1+Mark+Terry
Terry 2+Terry+Alice
Terry 2+Terry+Jesse
Tom 2+Tom+Lucy
Tom 2+Tom+Jack
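Between map and reduce, the framework collects all values that share a key and hands them to a single `reduce` call. A rough in-memory imitation of that grouping is sketched below, assuming the tagged pairs are available as `key<TAB>value` strings. `ShuffleSketch` and `group` are hypothetical names, and the order of values inside a group is an artifact of this sketch; Hadoop does not guarantee it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle phase: groups values by key,
// with keys kept in sorted order (TreeMap), as in the listing above.
final class ShuffleSketch {
    static Map<String, List<String>> group(List<String> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String pair : pairs) {
            int tab = pair.indexOf('\t');
            String key = pair.substring(0, tab);
            String value = pair.substring(tab + 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> pairs = Arrays.asList(
                "Lucy\t1+Tom+Lucy", "Tom\t2+Tom+Lucy",
                "Lucy\t2+Lucy+Ben", "Ben\t1+Lucy+Ben");
        group(pairs).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```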
3. The reduce code:
public static class ChildParentReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    private static Logger logger = Logger.getLogger(ChildParentReduce.class);
    private int num = 0;

    @Override
    public void reduce(Text ikey, Iterator<Text> ivalue, OutputCollector<Text, Text> output, Reporter arg3)
            throws IOException {
        if (num == 0) {// emit the header row once per reducer instance
            output.collect(new Text("grandchild"), new Text("grandparient"));
            num++;
        }
        int grandchildnum = 0;// number of grandchildren
        int grandparientnum = 0;// number of grandparents
        // fixed capacity: at most 100 children and 100 parents per key
        String[] grandchild = new String[100];
        String[] grandparient = new String[100];
        while (ivalue.hasNext()) {
            String[] record = ivalue.next().toString().split("\\+");// split on "+" into three parts
            // left-table record
            if (record[0].compareTo("1") == 0) {
                grandchild[grandchildnum] = record[1];// store the child name
                grandchildnum++;
            }
            // right-table record
            else if (record[0].compareTo("2") == 0) {
                grandparient[grandparientnum] = record[2];// store the parent name
                grandparientnum++;
            }
        }
        if (grandchildnum != 0 && grandparientnum != 0) {
            // Cartesian product: i indexes grandchildren, j indexes grandparents
            for (int i = 0; i < grandchildnum; i++) {
                for (int j = 0; j < grandparientnum; j++) {
                    logger.info(new Text(grandchild[i]) + "," + new Text(grandparient[j]));
                    output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
                }
            }
        }
    }
}
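Code walkthrough:
Step 1: if a header row is wanted, emit it as the first output line
if (num == 0) {// emit the header row
    output.collect(new Text("grandchild"), new Text("grandparient"));
    num++;
}
Step 2: define four variables: the arrays that hold the grandchildren and the grandparents, and the number of entries in each
int grandchildnum = 0;// number of grandchildren
int grandparientnum = 0;// number of grandparents
String[] grandchild = new String[100];
String[] grandparient = new String[100];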
Step 3: parse the value list produced by the map phase
The input to parse looks like this. Taking the mapper key Lucy as an example, the reducer receives:
<Lucy, 1+Tom+Lucy,2+Lucy+Ben,2+Lucy+Mary,1+Jone+Lucy>
Loop over the values:
// left-table record
if (record[0].compareTo("1") == 0) {
    grandchild[grandchildnum] = record[1];// store the child name
    grandchildnum++;
}
Grandchildren: Tom, Jone
// right-table record
else if (record[0].compareTo("2") == 0) {
    grandparient[grandparientnum] = record[2];// store the parent name
    grandparientnum++;
}
Grandparents: Ben, Mary
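Take the Cartesian product of the two arrays to get the grandchild-grandparent pairs. The outer loop walks the grandchildren and the inner loop walks the grandparents, so each index stays within its own array even when the two counts differ:
if (grandchildnum != 0 && grandparientnum != 0) {
    // Cartesian product: i indexes grandchildren, j indexes grandparents
    for (int i = 0; i < grandchildnum; i++) {
        for (int j = 0; j < grandparientnum; j++) {
            logger.info(new Text(grandchild[i]) + "," + new Text(grandparient[j]));
            output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
        }
    }
}
Tom,Ben
Tom,Mary
Jone,Ben
Jone,Mary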
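The join for a single key can also be written with growable lists instead of two fixed arrays of 100 entries, which removes the capacity limit and the need to track counters by hand. This is only a sketch in plain Java, not the author's reducer: `JoinSketch` and `join` are hypothetical names, and results are returned as `grandchild<TAB>grandparent` strings rather than collected through Hadoop's `OutputCollector`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical list-based version of the per-key join done in reduce().
final class JoinSketch {
    static List<String> join(Iterable<String> values) {
        List<String> grandchildren = new ArrayList<>();
        List<String> grandparents = new ArrayList<>();
        for (String value : values) {
            String[] record = value.split("\\+");
            if (record.length != 3) {
                continue; // ignore malformed records
            }
            if (record[0].equals("1")) {
                grandchildren.add(record[1]); // left table: child of the key
            } else if (record[0].equals("2")) {
                grandparents.add(record[2]); // right table: parent of the key
            }
        }
        // Cartesian product; empty if either side is empty
        List<String> out = new ArrayList<>();
        for (String grandchild : grandchildren) {
            for (String grandparent : grandparents) {
                out.add(grandchild + "\t" + grandparent);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The value list for key Lucy from the walkthrough above
        List<String> lucy = Arrays.asList(
                "1+Tom+Lucy", "2+Lucy+Ben", "2+Lucy+Mary", "1+Jone+Lucy");
        join(lucy).forEach(System.out::println);
    }
}
```

Using for-each loops over the two lists also avoids any chance of indexing one collection with the other's counter.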
The main method:
    public static void main(String[] args) {
        try {
            String inputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/input";
            String outputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/output";
            JobConf con = new JobConf(ChildParent2.class);
            con.setJobName("childparent");
            con.setMapOutputKeyClass(Text.class);
            con.setMapOutputValueClass(Text.class);
            con.setOutputKeyClass(Text.class);
            con.setOutputValueClass(Text.class);
            con.setMapperClass(ChildParentMapper.class);
            con.setReducerClass(ChildParentReduce.class);
            con.setInputFormat(TextInputFormat.class);
            con.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(con, new Path(inputDir));
            FileOutputFormat.setOutputPath(con, new Path(outputDir));
            JobClient.runJob(con);
            System.exit(0);
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
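To sanity-check the job's output without a cluster, the same grandchild-grandparent relation can be computed in memory with a direct lookup instead of tagged records. The sketch below is an independent cross-check, not part of the original job; `LocalCheck` and `grandPairs` are hypothetical names, and the input is assumed to be the "child parent" lines from the sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local cross-check: for each child, look up the parents
// of each of its parents.
final class LocalCheck {
    static List<String> grandPairs(List<String> lines) {
        Map<String, List<String>> parentsOf = new LinkedHashMap<>();
        for (String line : lines) {
            String[] p = line.trim().split("\\s+");
            if (p.length < 2 || p[0].equals("child")) {
                continue; // skip the header row and malformed lines
            }
            parentsOf.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : parentsOf.entrySet()) {
            for (String parent : entry.getValue()) {
                List<String> grandparents =
                        parentsOf.getOrDefault(parent, Collections.<String>emptyList());
                for (String grandparent : grandparents) {
                    out.add(entry.getKey() + "\t" + grandparent);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack",
                "Lucy Mary", "Lucy Ben", "Jack Alice", "Jack Jesse",
                "Terry Alice", "Terry Jesse", "Philip Terry", "Philip Alma",
                "Mark Terry", "Mark Alma");
        grandPairs(sample).forEach(System.out::println);
    }
}
```

For the sample data this produces 12 pairs: four each for Tom and Jone (Mary, Ben, Alice, Jesse) and two each for Philip and Mark (Alice, Jesse). The MapReduce output should contain the same pairs, plus a header row, though not necessarily in the same order.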
This article comes from the "钟茂霖博客" blog. Please keep this attribution: http://zhongml.blog.51cto.com/4808277/1877330