Translating to MapReduce: Generating User Vectors

Posted by 杨鑫newlfe

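The job has two parts: a mapper that parses the Wikipedia links file into user-item pairs, and a reducer that builds one vector per user from those pairs.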

/***
 * @author YangXin
 * @date 2016/2/21
 * @info Mahout Mapper that parses the Wikipedia links file
 */
package unitSix;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.VarLongWritable;

public class WikipediaToItemPrefsMapper extends Mapper<LongWritable, Text, VarLongWritable, VarLongWritable>{
	private static final Pattern NUMBERS = Pattern.compile("(\\d+)");
	@Override
	public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
		String line = value.toString();
		Matcher m = NUMBERS.matcher(line);
		// The first number on the line is the user ID; skip lines that contain no number
		if(!m.find()){
			return;
		}
		VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
		// Reused for every item ID to avoid allocating one writable per pair
		VarLongWritable itemID = new VarLongWritable();
		while(m.find()){
			itemID.set(Long.parseLong(m.group()));
			// Emit one user-item pair for each item ID
			context.write(userID, itemID);
		}
	}
}
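To see what the mapper emits without running Hadoop, the same regex logic can be tried on a single line. This is only a minimal sketch: the class name `LinkLineParser`, the `parse` method, and the sample line are hypothetical, and the input format (a user ID followed by item IDs) is assumed from the way the mapper reads the first number as the user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original post): mirrors the mapper's
// parsing logic with plain Java types so it can run without Hadoop or Mahout.
final class LinkLineParser {
    private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

    /** Returns {userID, itemID} pairs for one line; empty if the line has no numbers. */
    static List<long[]> parse(String line) {
        List<long[]> pairs = new ArrayList<>();
        Matcher m = NUMBERS.matcher(line);
        if (!m.find()) {
            return pairs; // nothing to emit for this line
        }
        long userID = Long.parseLong(m.group()); // first number is the user ID
        while (m.find()) {
            // every later number is an item ID linked from that user
            pairs.add(new long[] {userID, Long.parseLong(m.group())});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumed sample line: "userID: itemID itemID ..."
        for (long[] pair : parse("98955: 590 22 9059")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Each returned pair corresponds to one `context.write(userID, itemID)` call in the mapper.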



/***
 * @author YangXin
 * @info Mahout Reducer that builds a Vector from each user's item preferences
 */
package unitSix;
import java.io.IOException;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VarLongWritable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class WikipediaToUserVectorReducer extends Reducer<VarLongWritable, VarLongWritable, VarLongWritable, VectorWritable>{
	@Override
	public void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs, Context context) throws IOException, InterruptedException{
		// Sparse vector indexed by item ID; 100 is only the initial capacity
		Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
		for(VarLongWritable itemPref : itemPrefs){
			// Item IDs are cast to int to serve as vector indices; 1.0 marks a preference
			userVector.set((int)itemPref.get(), 1.0f);
		}
		context.write(userID, new VectorWritable(userVector));
	}
}
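The reducer's effect on one user's item IDs can be illustrated with plain Java collections. The following is a rough sketch, not the Mahout implementation: `UserVectorSketch` and `toUserVector` are hypothetical names, and a `TreeMap` from index to value stands in for `RandomAccessSparseVector`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reducer: a sorted map of index -> value
// plays the role of a sparse vector, storing only the non-zero entries.
final class UserVectorSketch {
    static Map<Integer, Double> toUserVector(Iterable<Long> itemIDs) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (long itemID : itemIDs) {
            // Same narrowing cast as the reducer: the item ID becomes the index.
            // Duplicate item IDs simply overwrite the same entry with 1.0.
            vector.put((int) itemID, 1.0);
        }
        return vector;
    }

    public static void main(String[] args) {
        // Assumed sample: the item IDs grouped under one user, with a duplicate
        List<Long> items = Arrays.asList(590L, 22L, 9059L, 22L);
        System.out.println(toUserVector(items));
    }
}
```

Because the index is an `int`, item IDs larger than `Integer.MAX_VALUE` would not survive the cast intact; the sketch inherits that limitation from the reducer.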
