How to output a custom class containing data structures such as lists from a map program in Hadoop

Posted 2023-04-18


【Posted】2015-05-12 18:24:53

【Question】

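I am new to Hadoop and MapReduce programming. I have a dataset containing movie ratings from 943 users, and each user can rate at most 20 movies. I want my Mapper to output the user ID together with a custom class holding two lists: movieId (the IDs of the movies the user rated) and movieRatings (the rating given to each of those movies). However, I am not sure how to output these values from the map method in this case. The code snippets are below: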

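import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

// A map output *value* only needs Writable; WritableComparable is required for keys.
public class UserRatings implements Writable {
    private List<String> movieId = new ArrayList<String>();
    private List<String> movieRatings = new ArrayList<String>();

    public List<String> getMovieRatings() {
        return movieRatings;
    }

    public void setMovieRatings(List<String> movieRatings) {
        this.movieRatings = movieRatings;
    }

    public List<String> getMovieId() {
        return movieId;
    }

    public void setMovieId(List<String> movieId) {
        this.movieId = movieId;
    }

    // Serialize each list as its size followed by its elements.
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        writeList(dataOutput, movieId);
        writeList(dataOutput, movieRatings);
    }

    // Read the fields back in exactly the order write() emitted them.
    // Hadoop reuses Writable instances, so the lists are rebuilt on every call.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        movieId = readList(dataInput);
        movieRatings = readList(dataInput);
    }

    private static void writeList(DataOutput out, List<String> list) throws IOException {
        out.writeInt(list.size());
        for (String s : list) {
            out.writeUTF(s);
        }
    }

    private static List<String> readList(DataInput in) throws IOException {
        int size = in.readInt();
        List<String> list = new ArrayList<String>(size);
        for (int i = 0; i < size; i++) {
            list.add(in.readUTF());
        }
        return list;
    }
}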
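write() and readFields() must handle the same fields in the same order. A cheap sanity check for a list-valued Writable is therefore to round-trip a record through a byte array. The sketch below is a stand-in with hypothetical names (ListCodec, writeList, readList, roundTrip). Its length-prefixed encoding, an int count followed by each element via writeUTF, is one reasonable choice and not something Hadoop prescribes. It uses only java.io, so it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing a length-prefixed List<String> encoding that a
// Writable's write()/readFields() pair could use; runs with the JDK alone.
public class ListCodec {

    // Write the element count first, then each element as modified UTF-8.
    static void writeList(DataOutput out, List<String> list) throws IOException {
        out.writeInt(list.size());
        for (String s : list) {
            out.writeUTF(s);
        }
    }

    // Read the count, then exactly that many strings, in the order they were written.
    static List<String> readList(DataInput in) throws IOException {
        int size = in.readInt();
        List<String> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(in.readUTF());
        }
        return list;
    }

    // Serialize movie ids then ratings (as write() would), and read them back
    // in the same order (as readFields() would).
    static List<List<String>> roundTrip(List<String> ids, List<String> ratings) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeList(out, ids);
            writeList(out, ratings);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            List<String> readIds = readList(in);
            List<String> readRatings = readList(in);
            return Arrays.asList(readIds, readRatings);
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked if they ever do.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints [[movie1, movie2], [3, 5]]
        System.out.println(roundTrip(Arrays.asList("movie1", "movie2"), Arrays.asList("3", "5")));
    }
}
```

If the two lists come back in a different order or with different lengths, the write() and readFields() pair is out of sync. That kind of bug otherwise only surfaces as garbled values or an EOFException deep inside a running job.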

Here is the map method:

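import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The output value type must be the custom Writable (UserRatings), not IntWritable.
public class GenreMapper extends Mapper<LongWritable, Text, Text, UserRatings> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Logic for parsing the line and extracting the data. As the original indexing
        // implies, each record is assumed to span four tab-separated fields:
        // user id, movie id, (unused), rating.
        String[] input = value.toString().split("\t");
        Map<String, UserRatings> mapData = new HashMap<String, UserRatings>();
        for (int i = 0; i + 3 < input.length; i += 4) {
            UserRatings userRatings = mapData.get(input[i]);
            if (userRatings == null) {
                userRatings = new UserRatings();
                userRatings.setMovieId(new ArrayList<String>());
                userRatings.setMovieRatings(new ArrayList<String>());
                mapData.put(input[i], userRatings);
            }
            userRatings.getMovieId().add(input[i + 1]);
            userRatings.getMovieRatings().add(input[i + 3]);
        }
        // Emit one (user id, UserRatings) pair per user seen in this input value.
        for (Map.Entry<String, UserRatings> entry : mapData.entrySet()) {
            context.write(new Text(entry.getKey()), entry.getValue());
        }
    }
}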


【Answer 1】:

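I think you are missing the point of the mapper function. A mapper should not emit lists in its output. The purpose of the mapper is to produce tuples that the reducer will pick up, so that the reducer can perform the necessary computation for each key and produce the final output. Given that, the mapper's output format should be as simple as possible.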

In this case, I think the right approach is to emit one key-value pair per rating from the mapper:

(user_id, custom_class)

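The custom class should hold a single movie_id and a single rating, not lists. To be more specific, I would need to know what final result you want from this MapReduce cycle. Note that, if needed, you can run a second MapReduce job on the output of the first.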
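To make this suggestion concrete, here is a plain-Java simulation, with no Hadoop classes, of the flow the answer describes. The map step emits one (user id, movie and rating) pair per input line. The grouping by key, which Hadoop performs during the shuffle, then lets the reduce side assemble each user's list. Several parts are assumptions for illustration: the input line format "userId<TAB>movieId<TAB>rating", the class PerRatingPairs, and the nested MovieRating type. In a real job, MovieRating would implement org.apache.hadoop.io.Writable and the framework, not a TreeMap, would do the grouping.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the per-rating approach, without Hadoop.
// Assumption: each input line is "userId<TAB>movieId<TAB>rating".
public class PerRatingPairs {

    // Value type holding ONE movie and ONE rating; in Hadoop it would implement Writable.
    static final class MovieRating {
        final String movieId;
        final String rating;

        MovieRating(String movieId, String rating) {
            this.movieId = movieId;
            this.rating = rating;
        }

        @Override
        public String toString() {
            return movieId + ":" + rating;
        }
    }

    // Map step: one small (userId, MovieRating) pair per input line, no lists.
    static Map.Entry<String, MovieRating> map(String line) {
        String[] f = line.split("\t");
        return new AbstractMap.SimpleEntry<>(f[0], new MovieRating(f[1], f[2]));
    }

    // Shuffle + reduce: group values by key (TreeMap mimics Hadoop's sorted keys)
    // and let the reduce side build each user's list.
    static Map<String, List<MovieRating>> mapAndReduce(List<String> lines) {
        Map<String, List<MovieRating>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, MovieRating> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1\tmovie1\t5", "2\tmovie1\t3", "1\tmovie2\t4");
        // prints {1=[movie1:5, movie2:4], 2=[movie1:3]}
        System.out.println(mapAndReduce(lines));
    }
}
```

The design point is that the mapper stays stateless per record. The lists only come into existence on the reduce side, where all values for a key have already been brought together.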


【Answer 2】:

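You could consider using Text and MapWritable as the mapper's output key and value types.

Here the user ID is the key (Text), and the value object is a MapWritable made up of movie IDs and the user's ratings.

The MapWritable value should use the movie ID as its key and the user's rating as its value.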

Consider this sample code snippet:

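// Inside map(): declare the mapper as Mapper<LongWritable, Text, Text, MapWritable>.
// MapWritable keys and values must be Writable, so use Text rather than String.
MapWritable result = new MapWritable();
result.put(new Text("movie1"), new Text("user1_movie1_rating"));
result.put(new Text("movie2"), new Text("user1_movie2_rating"));

// The user ID is the output key.
Text key = new Text("user_1_id");

// The driver also needs job.setMapOutputValueClass(MapWritable.class) if it differs from the final output value class.
context.write(key, result);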

Hope this helps :)
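The MapWritable approach has one consequence on the reduce side. If a user's ratings are spread over several input records, the reducer for that user ID receives several maps and has to merge them. The sketch below models this with java.util.Map<String, String> standing in for MapWritable (movie ID to rating); MergeUserMaps and mergeUserMaps are hypothetical names. In a real Hadoop reducer, the framework reuses the same value object on each iteration of the values iterable. That is why the sketch copies entries into a new map instead of keeping references to the incoming maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a reducer whose values are MapWritable:
// each incoming map is movieId -> rating for the same user id.
public class MergeUserMaps {

    // Merge all per-user maps into one. If a movie appears twice, whichever map is
    // merged last wins (in Hadoop the order of values is not guaranteed).
    static Map<String, String> mergeUserMaps(Iterable<Map<String, String>> values) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> value : values) {
            // Copy the entries rather than keeping a reference to "value":
            // Hadoop reuses the same value object on each iteration.
            merged.putAll(value);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("movie1", "5");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("movie2", "4");
        List<Map<String, String>> values = new ArrayList<>();
        values.add(first);
        values.add(second);
        // prints {movie1=5, movie2=4}
        System.out.println(mergeUserMaps(values));
    }
}
```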

