Unable to access HBase from MapReduce code

Posted: 2015-05-25 20:55:47

Question:

I am trying to use an HDFS file as the source and HBase as the sink. My Hadoop cluster has the following nodes:

master 192.168.4.65
slave1 192.168.4.176
slave2 192.168.4.175
slave3 192.168.4.57
slave4 192.168.4.146

The ZooKeeper nodes are at the following IP addresses:

zks1 192.168.4.60
zks2 192.168.4.61
zks3 192.168.4.66

The HBase nodes are at the following IP addresses:

hbmaster 192.168.4.59
rs1 192.168.4.69
rs2 192.168.4.110
rs3 192.168.4.45

I have copied zookeeper-3.4.6.jar into the lib folder on every node of the Hadoop cluster. HDFS and HBase each work fine on their own, but I cannot access HBase from MapReduce. The driver code of the MapReduce job is:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Driver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Build an HBase-aware configuration and point it at the ZooKeeper quorum.
        //Configuration conf = new Configuration();
        Configuration conf = HBaseConfiguration.create();
        conf.addResource("hbase-site.xml");
        //conf.set("hbase.master", "hbmaster:60000");
        conf.set("hbase.zookeeper.quorum", "zks1,zks2,zks3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // Fail fast if the HBase master cannot be reached.
        try {
            HBaseAdmin.checkHBaseAvailable(conf);
        } catch (MasterNotRunningException e) {
            System.out.println("Master not running");
            System.exit(1);
        }
        System.out.println("connected to hbase");

        //HTable hTable = new HTable(conf, args[3]);
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "HBase Provenance Component");
        job.setJarByClass(Driver.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(HBaseMapper.class);
        job.setReducerClass(HBaseReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        //TableMapReduceUtil.addDependencyJars(job);
        //TableMapReduceUtil.addHBaseDependencyJars(conf);
        System.out.println("Before");
        // Configure the reduce side to write into the "Provenance" HBase table.
        TableMapReduceUtil.initTableReducerJob("Provenance", HBaseReducer.class, job);

        //job.setOutputKeyClass(NullWritable.class);
        //job.setOutputValueClass(Text.class);
        //job.setOutputKeyClass(Text.class);
        //job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.out.println("After");
        //HFileOutputFormat.configureIncrementalLoad(job, hTable);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The job hangs at this point:

   Warning: $HADOOP_HOME is deprecated.

15/05/27 11:46:35 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6608@master
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:host.name=master
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_79
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/hadoop/libexec/../conf:/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar:/usr/local/hadoop/libexec/..:/usr/local/hadoop/libexec/../hadoop-core-1.2.1.jar:/usr/local/hadoop/libexec/../lib/asm-3.2.jar:/usr/local/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/local/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/local/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/local/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/local/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/local/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/local/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/local/hadoop/libexec/../lib/commons-digester-1.8.jar:/usr/local/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/local/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/local/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/local/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/local/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/local/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/local/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/local/hadoop/libexec/../lib/core-3.1.1.jar:/usr/local/hadoop/libexec/../lib/guava-18.0.jar:/usr/local/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.1.jar:/usr/local/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.1.jar:/usr/local/hadoop/libexec/../lib/hadoop-thriftfs-1.2.1.jar:/usr/local/hadoop/libexec/../lib/hbase-0.94.24.jar:/usr/local/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/local/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/local/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/local/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/local/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/local/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/local/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/local/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/local/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/local/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/local/hadoop/libexec/../lib/junit-4.5.jar:/usr/local/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/local/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/local/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/local/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/local/hadoop/libexec/../lib/protobuf-java-2.4.0a.jar:/usr/local/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/local/hadoop/libexec/../lib/slf4j-api-1.4.3.jar:/usr/local/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar:/usr/local/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/local/hadoop/libexec/../lib/zookeeper-3.4.6.jar:/usr/local/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/libexec/../lib/native/Linux-amd64-64
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:os.version=3.5.0-60-generic
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:user.name=hduser
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hduser
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/hadoop
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zks2:2181,zks1:2181,zks3:2181 sessionTimeout=180000 watcher=hconnection
15/05/27 11:46:35 INFO zookeeper.ClientCnxn: Opening socket connection to server zks3/192.168.4.66:2181. Will not attempt to authenticate using SASL (unknown error)
15/05/27 11:46:35 INFO zookeeper.ClientCnxn: Socket connection established to zks3/192.168.4.66:2181, initiating session
15/05/27 11:46:35 INFO zookeeper.ClientCnxn: Session establishment complete on server zks3/192.168.4.66:2181, sessionid = 0x34d94005a530003, negotiated timeout = 60000
15/05/27 11:46:35 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x34d94005a530003
15/05/27 11:46:35 INFO zookeeper.ZooKeeper: Session: 0x34d94005a530003 closed
15/05/27 11:46:35 INFO zookeeper.ClientCnxn: EventThread shut down
connected to hbase
Before
After
15/05/27 11:46:38 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6608@master
15/05/27 11:46:38 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zks2:2181,zks1:2181,zks3:2181 sessionTimeout=180000 watcher=hconnection
15/05/27 11:46:38 INFO zookeeper.ClientCnxn: Opening socket connection to server zks2/192.168.4.61:2181. Will not attempt to authenticate using SASL (unknown error)
15/05/27 11:46:38 INFO zookeeper.ClientCnxn: Socket connection established to zks2/192.168.4.61:2181, initiating session
15/05/27 11:46:38 INFO zookeeper.ClientCnxn: Session establishment complete on server zks2/192.168.4.61:2181, sessionid = 0x24d93f965730002, negotiated timeout = 60000

Please point out where I am going wrong.


Answer 1:

The code is missing the Guava jar. Add it and try running the code again. You could also use Maven, so that once you declare the basic dependencies it pulls in everything else that is required.
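A minimal sketch of the relevant pom.xml dependency section, assuming the versions mentioned later in the discussion (Hadoop 1.2.1, HBase 0.94.24), might look like the following; the HBase artifact should bring in Guava, ZooKeeper and the other client-side jars transitively:

<!-- Sketch only: versions are taken from the discussion below and are assumptions about the actual build. -->
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase</artifactId>
        <version>0.94.24</version>
    </dependency>
</dependencies>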

Comments:

That solved the problem. Now when I launch the MapReduce job, the connection to ZooKeeper is established successfully, but the MapReduce job does not start. No new statements are printed and no job kicks off. Any idea why this is happening?

Have you tried putting System.out statements in the HBaseMapper class and verifying the flow? Or is the mapper class part of the Apache Storm framework?

I have tried sysouts. They all work fine, but the job hangs at the last line, System.exit(job.waitForCompletion(true)?0:1); The mapper is not part of the Apache Storm framework. I am trying to process a file in HDFS and store the output into an HBase table. I am using Hadoop 1.2.1, HBase 0.94.24 and ZooKeeper 3.4.6.

I have updated the question above with the current driver code and the current error. Please take a look.
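For context, the HBaseMapper and HBaseReducer classes discussed above are not shown in the question. A reducer passed to TableMapReduceUtil.initTableReducerJob is expected to extend TableReducer and write Put objects; the sketch below is only a minimal illustration of that pattern, with the Text key/value types taken from the driver and the column family "cf" and qualifier "val" as placeholders, not the actual table schema.

// Minimal illustration only: the actual HBaseReducer from the question is not shown.
// Column family "cf" and qualifier "val" are placeholders.
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class HBaseReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        byte[] row = Bytes.toBytes(key.toString());
        for (Text value : values) {
            // One Put per map-output value; the map output key becomes the HBase row key.
            Put put = new Put(row);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("val"), Bytes.toBytes(value.toString()));
            context.write(new ImmutableBytesWritable(row), put);
        }
    }
}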
