Importing csv into HBase using Pig
Posted: 2015-06-11 20:42:34

I want to import the following sample data (tab-delimited) into HBase using Pig:
1 2 3
4 5 6
7 8 9
and I am using the following commands to do so:
grunt> A = LOAD '/idn/home/mvenk9/Test' USING PigStorage('\t') as (id:int, id1:int, id2:int);
STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:intdata');
When executing the second statement I get the exception below. I have no idea why this isn't working, and I'm new to all of these tools.
2015-06-11 13:34:37,125 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-06-11 13:34:37,126 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]
2015-06-11 13:34:37,442 [main] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - The identifier of this process is 29965@lppbd0030.gso.aexp.com
2015-06-11 13:34:37,554 [main] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for mydata
2015-06-11 13:34:37,557 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-06-11 13:34:37,559 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-06-11 13:34:37,559 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-06-11 13:34:37,561 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-06-11 13:34:37,562 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-06-11 13:34:37,563 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2235913801538823778.jar
2015-06-11 13:34:40,868 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2235913801538823778.jar created
2015-06-11 13:34:40,882 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-06-11 13:34:40,885 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /idn/home/mvenk9/pig_1434054848332.log
From the log file:
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1635)
at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:541)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:861)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:296)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:192)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
at org.apache.pig.PigServer.execute(PigServer.java:1297)
at org.apache.pig.PigServer.access$400(PigServer.java:122)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1630)
... 13 more
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at org.apache.hadoop.fs.Path.initialize(Path.java:155)
at org.apache.hadoop.fs.Path.<init>(Path.java:74)
at org.apache.hadoop.fs.Path.<init>(Path.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:613)
... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:152)
... 23 more
================================================================================
Any help would be greatly appreciated.
Thank you in advance.
Answer 1: Add the individual HBase column names as arguments to HBaseStorage. You have only given a single cell, mycf:intdata. See the examples here and here.
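To illustrate the suggestion, a minimal sketch of what the load and store could look like (an assumption, not the asker's verified setup): the HBase table mydata with column family mycf is taken to exist already, and with HBaseStorage the first field of the relation is used as the row key, so only the remaining fields need a column each.

A = LOAD '/idn/home/mvenk9/Test' USING PigStorage('\t') AS (id:int, id1:int, id2:int);
-- 'id' becomes the HBase row key; id1 and id2 are written to the two listed columns
STORE A INTO 'hbase://mydata'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:id1 mycf:id2');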
Comments:
Thank you, I referred to those links as well and tried it with all the columns, but got the same exception. The updated statement is STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:id mycf:id1 mycf:id2');
Are you running this locally or on a cluster? Please also add details about how you are running it.
Hi, I am running these on a cluster. After logging in to the cluster with putty I start pig from the command line, and once inside pig I execute the statements above. After the first statement I can see the results of A using dump A, but when I execute the second statement I get the error.
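As an aside, before running the STORE it can be worth confirming on the cluster that the target table and column family actually exist. A hedged example of the hbase shell commands, with the names taken from the question:

create 'mydata', 'mycf'    # only needed if the table does not exist yet
describe 'mydata'          # check that the 'mycf' column family is listed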