java.lang.ArrayIndexOutOfBoundsException: 2 error in mapreduce, Hadoop

Posted 2023-04-18


Originally posted: 2017-10-09 16:10:51

Question:

I am trying to solve the following problem with Hadoop.

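Using the average rating, find the top 10 highest-rated businesses. The highest-rated business should come first. Recall that the 4th column of the review.csv file holds the rating.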

My Java code is:

package bd;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map.Entry;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class TopTenRatedBusiness {

    /*
     * Mapper class: BusinessRatingMapper
     * Parses the review.csv file and emits each business ID with its rating
     */
    public static class BusinessRatingMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {
        /*
         * Map function that emits the business ID as the key and the rating as the value
         */
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String reviews[] = value.toString().split("::");
            /*
             * reviews[2] is the business ID and reviews[3] is the rating
             */
            context.write(new Text(reviews[2]), new FloatWritable(Float.parseFloat(reviews[3])));
        }
    }

    /*
     * Reducer class: BusinessRatingReducer
     * Emits the top 10 business IDs with their average rating
     */
    static TreeMap<Float, List<Text>> reviewID = new TreeMap<Float, List<Text>>(Collections.reverseOrder());

    public static class BusinessRatingReducer extends Reducer<Text, FloatWritable, Text, FloatWritable> {

        /*
         * Reduce function: averages the ratings of one business
         */
        public void reduce(Text key, Iterable<FloatWritable> values, Context context) throws IOException, InterruptedException {
            float sumOfRatings = 0;
            int countOfRatings = 0;
            for (FloatWritable value : values) {
                sumOfRatings += value.get();
                countOfRatings++;
            }

            Float averageRating = sumOfRatings / countOfRatings;

            if (reviewID.containsKey(averageRating)) {
                reviewID.get(averageRating).add(new Text(key.toString()));
            } else {
                List<Text> businessIDList = new ArrayList<Text>();
                businessIDList.add(new Text(key.toString()));

                /*
                 * Store the average rating and the corresponding business ID
                 */
                reviewID.put(averageRating, businessIDList);
            }
        }

        @Override
        protected void cleanup(Reducer<Text, FloatWritable, Text, FloatWritable>.Context context) throws IOException, InterruptedException {
            int count = 0;
            for (Entry<Float, List<Text>> entry : reviewID.entrySet()) {
                if (count > 10) {
                    break;
                }

                FloatWritable result = new FloatWritable();
                result.set(entry.getKey());

                for (int i = 0; i < entry.getValue().size(); i++) {
                    if (count >= 10) {
                        break;
                    }
                    context.write(new Text(entry.getValue().get(i).toString()), result);
                    count++;
                }
            }
        }
    }

    /*
     * Driver program
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException, NoSuchMethodException {

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: TopTenRatedBusiness <in> <out>");
            System.exit(2);
        }

        /*
         * Create a job named "TopTenRatedBusiness"
         */
        Job job = new Job(conf, "TopTenRatedBusiness");
        job.setJarByClass(TopTenRatedBusiness.class);

        job.setMapperClass(BusinessRatingMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FloatWritable.class);

        job.setReducerClass(BusinessRatingReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
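The cleanup() method keeps a TreeMap from average rating to business IDs in descending order and stops after 10 IDs. That selection logic can be tried outside Hadoop with plain collections. This is a sketch only: the class name TopNByRating is hypothetical, and String IDs stand in for the Text keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopNByRating {
    /**
     * Returns at most n (businessId -> averageRating) pairs, highest rating first.
     * Businesses sharing an average keep their insertion order, like the
     * List<Text> values stored in the reducer's TreeMap.
     */
    static LinkedHashMap<String, Float> topN(Map<String, Float> averages, int n) {
        // Descending by rating, as with new TreeMap<>(Collections.reverseOrder())
        TreeMap<Float, List<String>> byRating =
                new TreeMap<Float, List<String>>(Collections.reverseOrder());
        for (Map.Entry<String, Float> e : averages.entrySet()) {
            byRating.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        LinkedHashMap<String, Float> result = new LinkedHashMap<>();
        for (Map.Entry<Float, List<String>> entry : byRating.entrySet()) {
            for (String id : entry.getValue()) {
                if (result.size() >= n) {
                    return result; // stop as soon as n IDs have been collected
                }
                result.put(id, entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> averages = new LinkedHashMap<>();
        averages.put("b1", 3.5f);
        averages.put("b2", 5.0f);
        averages.put("b3", 4.0f);
        System.out.println(topN(averages, 2)); // {b2=5.0, b3=4.0}
    }
}
```

Returning as soon as the limit is reached makes the separate outer-loop check unnecessary. Note that the original job only yields a global top 10 when a single reduce task sees every business (one reduce task is the usual default); with several reducers, each cleanup() would emit its own top 10.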

My dataset:

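The review.csv file contains the star ratings that users gave to businesses. Use user_id to associate a review with other reviews by the same user. Use business_id to associate a review with other reviews of the same business.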

The review.csv file contains the following columns, separated by "::": review_id::user_id::business_id::stars
'review_id': (a unique identifier for the review)
'user_id': (the identifier of the authoring user)
'business_id': (the identifier of the reviewed business)
'stars': (star rating, a whole number from 1 to 5 written as e.g. 5.0), the rating the user gave the business

Running the job produces the following error:

17/10/09 21:18:33 INFO input.FileInputFormat: Total input paths to process : 1
17/10/09 21:18:33 INFO util.NativeCodeLoader: Loaded the native-hadoop library
17/10/09 21:18:33 WARN snappy.LoadSnappy: Snappy native library not loaded
17/10/09 21:18:34 INFO mapred.JobClient: Running job: job_201710090351_0033
17/10/09 21:18:35 INFO mapred.JobClient:  map 0% reduce 0%
17/10/09 21:18:41 INFO mapred.JobClient: Task Id : attempt_201710090351_0033_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 2
    at bd.TopTenRatedBusiness$BusinessRatingMapper.map(TopTenRatedBusiness.java:37)
    at bd.TopTenRatedBusiness$BusinessRatingMapper.map(TopTenRatedBusiness.java:26)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

17/10/09 21:18:47 INFO mapred.JobClient: Task Id : attempt_201710090351_0033_m_000000_1, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 2
    at bd.TopTenRatedBusiness$BusinessRatingMapper.map(TopTenRatedBusiness.java:37)
    at bd.TopTenRatedBusiness$BusinessRatingMapper.map(TopTenRatedBusiness.java:26)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

17/10/09 21:18:52 INFO mapred.JobClient: Task Id : attempt_201710090351_0033_m_000000_2, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 2
    at bd.TopTenRatedBusiness$BusinessRatingMapper.map(TopTenRatedBusiness.java:37)
    at bd.TopTenRatedBusiness$BusinessRatingMapper.map(TopTenRatedBusiness.java:26)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

17/10/09 21:18:58 INFO mapred.JobClient: Job complete: job_201710090351_0033
17/10/09 21:18:58 INFO mapred.JobClient: Counters: 7
17/10/09 21:18:58 INFO mapred.JobClient:   Job Counters 
17/10/09 21:18:58 INFO mapred.JobClient:     Launched map tasks=4
17/10/09 21:18:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
17/10/09 21:18:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/10/09 21:18:58 INFO mapred.JobClient:     Failed map tasks=1
17/10/09 21:18:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23391
17/10/09 21:18:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
17/10/09 21:18:58 INFO mapred.JobClient:     Data-local map tasks=4
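ArrayIndexOutOfBoundsException: 2 means that for at least one input line, split("::") returned fewer than three elements, so reviews[2] does not exist. The minimal demonstration below uses only String.split; the problem lines in it are hypothetical examples (a blank line, a comma-separated row such as one from a different CSV file, and a truncated row). The exact exception message depends on the Java version.

```java
final class SplitDemo {
    /** Number of fields the mapper would see for one input line. */
    static int fieldCount(String line) {
        return line.split("::").length;
    }

    public static void main(String[] args) {
        String[] lines = {
            "0xuZfa0t4MNWd3eIFF02ug::kT43SxDgMGzbeXpO51f0hQ::wbpbaWBfU54JbjLIDwERQA::5.0", // well-formed
            "",                        // blank line: split returns a single empty string
            "id1,user1,business1,5.0", // comma-delimited row: contains no "::" at all
            "id2::user2"               // truncated row: only two fields
        };
        for (String line : lines) {
            String[] reviews = line.split("::");
            try {
                // Same indexing as the mapper
                System.out.println(reviews[2] + " -> " + reviews[3]);
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("only " + reviews.length + " field(s): " + e.getMessage());
            }
        }
    }
}
```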

A few sample input lines:

0xuZfa0t4MNWd3eIFF02ug::kT43SxDgMGzbeXpO51f0hQ::wbpbaWBfU54JbjLIDwERQA::5.0
bBqVqhOvNgFs8I1Wk68QUQ::T9hGHsbJW9Hw1cJAlIAWmw::4iTRjN_uAdAb7_YZDVHJdg::5.0
fu7TcxnAOdnbdLcyFhMmZg::Z_WAxc4RUpKp3y12BH1bEg::qw5gR8vW7mSOK4VROSwdMA::4.0
LMy8UOKOeh0b9qrz-s1fQA::OlMjqqzWZUv2-62CSqKq_A::81IjU5L-t-QQwsE38C63hQ::4.0
JjyRj9EiBXQTFDQAxRtt4g::fs5bpfk-2pvq2v8S1De5pQ::Hnz1_h_D1eHSRtQqHSCZkw::2.0
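With these sample lines, the map and reduce steps can be traced locally: field 2 becomes the key, field 3 the rating, and each key's average is its sum divided by its count. The plain-Java sketch below uses no Hadoop classes, and AverageByBusiness is a hypothetical name. Each sample business appears only once, so each average equals its single rating.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AverageByBusiness {
    /** Mirrors map() + reduce(): business_id (field 2) -> mean of stars (field 3). */
    static Map<String, Float> averages(List<String> lines) {
        Map<String, Float> sums = new LinkedHashMap<>(); // keeps first-seen order
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] reviews = line.split("::");
            float stars = Float.parseFloat(reviews[3]);
            sums.merge(reviews[2], stars, Float::sum);  // like sumOfRatings
            counts.merge(reviews[2], 1, Integer::sum);  // like countOfRatings
        }
        Map<String, Float> result = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : sums.entrySet()) {
            result.put(e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "0xuZfa0t4MNWd3eIFF02ug::kT43SxDgMGzbeXpO51f0hQ::wbpbaWBfU54JbjLIDwERQA::5.0",
            "bBqVqhOvNgFs8I1Wk68QUQ::T9hGHsbJW9Hw1cJAlIAWmw::4iTRjN_uAdAb7_YZDVHJdg::5.0",
            "fu7TcxnAOdnbdLcyFhMmZg::Z_WAxc4RUpKp3y12BH1bEg::qw5gR8vW7mSOK4VROSwdMA::4.0",
            "LMy8UOKOeh0b9qrz-s1fQA::OlMjqqzWZUv2-62CSqKq_A::81IjU5L-t-QQwsE38C63hQ::4.0",
            "JjyRj9EiBXQTFDQAxRtt4g::fs5bpfk-2pvq2v8S1De5pQ::Hnz1_h_D1eHSRtQqHSCZkw::2.0");
        averages(sample).forEach((id, avg) -> System.out.println(id + "\t" + avg));
    }
}
```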

Answer 1:

Your code works with the sample input.

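So the problem seems to be in your data: it probably contains malformed lines that the code cannot handle. Check whether the file has a header row; otherwise you will need to go through the complete file.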
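Going through a large file by hand is impractical. One option, sketched here under the assumption that a local copy of review.csv is available (for example fetched with hdfs dfs -get), is to list every line that does not have exactly four "::"-separated fields or whose rating does not parse. ReviewFileChecker and badLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ReviewFileChecker {
    /** Returns 1-based numbers of lines that do not match review_id::user_id::business_id::stars. */
    static List<Integer> badLines(BufferedReader in) throws IOException {
        List<Integer> bad = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            lineNo++;
            String[] fields = line.split("::");
            if (fields.length != 4 || !isFloat(fields[3])) {
                bad.add(lineNo);
            }
        }
        return bad;
    }

    // True if Float.parseFloat would accept the rating field
    private static boolean isFloat(String s) {
        try {
            Float.parseFloat(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java ReviewFileChecker review.csv
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            for (int n : badLines(in)) {
                System.out.println("malformed line " + n);
            }
        }
    }
}
```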

Another thing to check is that the input directory you pass to the job contains only the review.csv file and nothing else.
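If other CSV files must stay in the same directory, a glob can narrow the input to the review files. The sketch below previews which names a pattern such as review*.csv selects, using java.nio's PathMatcher; the pattern and file names are hypothetical, and java.nio glob syntax is close to, but not guaranteed identical to, Hadoop's.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InputSelection {
    // Hypothetical pattern: matches review.csv, review1.csv, review2.csv, ...
    private static final PathMatcher REVIEW_FILES =
            FileSystems.getDefault().getPathMatcher("glob:review*.csv");

    /** Keeps only the file names the glob would select. */
    static List<String> selected(List<String> fileNames) {
        List<String> keep = new ArrayList<>();
        for (String name : fileNames) {
            if (REVIEW_FILES.matches(Paths.get(name).getFileName())) {
                keep.add(name);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing mixing different file types
        List<String> listing = Arrays.asList("review.csv", "review1.csv", "business.csv", "user.csv");
        System.out.println(selected(listing)); // [review.csv, review1.csv]
    }
}
```

For the job itself, FileInputFormat expands glob patterns in input paths, so passing something like <dir>/review*.csv as the input argument should have the same effect; quote the argument so the local shell does not expand it first.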

Comments:

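There is no header. Could you guide me further?

I searched and found github.com/patilankita79/…. Is this the input file (review.csv) you are running? I ran that review.csv with the code you wrote and it completed successfully. What is the structure of your input path? The input directory should contain only the review.csv file.

OK. review.csv is in my Hadoop file system together with some other CSV files. So only one input file should be in that particular directory?

Only one type of file should be in your directory. That is, you can have several files such as review.csv, review1.csv and review2.csv holding different portions of the data. If the input path contains input files of different types, the MR job will not process them correctly.

Answer 2: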

This line is what throws the error:

context.write(new Text(reviews[2]), new FloatWritable(Float.parseFloat(reviews[3])));

Try using a debugger to find and fix the cause.
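A change that avoids the crash is to validate a line before indexing into it and skip it when it cannot be parsed. The helper below is a hypothetical sketch (SafeReviewParser is not part of the original code); it returns the (business_id, stars) pair for a well-formed line and an empty Optional otherwise.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Optional;

final class SafeReviewParser {
    /**
     * Returns (business_id, stars) for a well-formed line, or empty for a line
     * that would make map() throw (too few fields or a non-numeric rating).
     */
    static Optional<Map.Entry<String, Float>> parse(String line) {
        String[] reviews = line.split("::");
        if (reviews.length < 4) {
            return Optional.empty(); // reviews[2] or reviews[3] would be out of bounds
        }
        try {
            float stars = Float.parseFloat(reviews[3].trim());
            Map.Entry<String, Float> pair = new AbstractMap.SimpleImmutableEntry<>(reviews[2], stars);
            return Optional.of(pair);
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. a header row with "stars" in column 4
        }
    }
}
```

Inside map(), the direct indexing could then be replaced by a call to parse(value.toString()): return early when the result is empty, otherwise write the key and rating as before. Skipping silently hides bad data, so counting or logging the skipped lines is worth considering.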

Comments:

Could you suggest a change? I am new to Hadoop.

Did you find a solution to this problem? I seem to be stuck on the same issue.
