Pig not locating Hive Table using Hcatalog

Posted: 2014-12-31 19:55:21

Question:

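I am using Pig to access the table batting_data, which was created through HCatalog. When I do, I get an error saying the table cannot be found, even though batting_data is available in Hive. I also understand that if no database name is given, the default database is assumed.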

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Table not found : default.batting_data table not found

I have configured hive-site.xml as follows. Note that I am not using a remote metastore server; the metastore is a local MySQL database.

    <configuration>
    <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
            <description>the URL of the MySQL database</description>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>root</value>
    </property>
    
    <property>
            <name>hive.hwi.listen.host</name>
            <value>0.0.0.0</value>
    </property>
    <property>
            <name>hive.hwi.listen.port</name>
            <value>9999</value>
    </property>
    <property>
            <name>hive.hwi.war.file</name>
            <value>lib/hive-hwi-0.12.0.war</value>
    </property>
    
    <property>
            <name>hive.metastore.local</name>
            <value>true</value>
    </property>
    </configuration>

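To confirm which metastore settings a client would actually pick up, it can help to read values straight out of the hive-site.xml in use. The function below is a hypothetical helper (get_hive_prop is not part of Hive); it assumes the one-element-per-line layout used in the file above and is not a general XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive): print the <value> of the named
# <property> in a Hadoop-style XML file such as hive-site.xml.
# Assumes <name> and <value> each sit on their own line, <name> first,
# as in the file above. Prints nothing if the property is absent.
get_hive_prop() {
  local file="$1" want="$2"
  awk -v want="$want" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      matched = (n == want)   # remember whether this property is the one we want
      next
    }
    matched && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$file"
}
```

For example, get_hive_prop "$HIVE_HOME/conf/hive-site.xml" javax.jdo.option.ConnectionURL should print the MySQL JDBC URL if that is the file being read.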
I set the following in my .bashrc for Pig integration with Hive and HCatalog:

    export PIG_OPTS=-Dhive.metastore.local=true
    export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/:$HIVE_HOME/lib/
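Since PIG_CLASSPATH decides which jars and configuration files Pig sees, a quick existence check on each entry can rule out a mistyped HCAT_HOME or HIVE_HOME. The function below is a hypothetical sketch (check_classpath is an illustrative name) that only tests whether each path exists. It does not tell you whether the JVM will load jars from a directory entry: on a plain Java classpath, a directory supplies .class files and resources, while the jars inside it need a dir/* entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report colon-separated classpath entries that do not
# exist on disk. Defaults to $PIG_CLASSPATH when no argument is given.
# Prints "missing: <path>" per missing entry; returns 1 if any are missing.
check_classpath() {
  local cp="${1-$PIG_CLASSPATH}" entry missing=0
  local -a entries
  # Split on ':' without glob expansion, so "dir/*" entries stay literal.
  IFS=':' read -r -a entries <<< "$cp"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue      # skip empty segments such as "a::b"
    if [ ! -e "$entry" ]; then
      printf 'missing: %s\n' "$entry"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_classpath with no arguments after sourcing .bashrc checks the exported value; a non-zero status means at least one entry points nowhere.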

When Pig starts, the Grunt shell loads the following statements by default:

    REGISTER /home/shiva/hive-0.12.0/hcatalog/share/hcatalog/hcatalog-core-0.12.0.jar;
    REGISTER /home/shiva/hive-0.12.0/lib/hive-exec-0.12.0.jar;
    REGISTER /home/shiva/hive-0.12.0/lib/hive-metastore-0.12.0.jar;
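Pig can read startup statements like these from ~/.pigbootup; whether that is how they are loaded here is an assumption. Either way, a typo in one of the paths only surfaces when Grunt starts. The hypothetical helper below regenerates the three REGISTER statements for a given Hive install directory and version, and warns on stderr about any jar that is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print REGISTER statements for the three jars listed
# above under HIVE_HOME ($1) for Hive version $2. Missing jars are reported
# on stderr and skipped; the function returns 1 if any jar was missing.
pig_register_lines() {
  local hive_home="$1" ver="$2" jar rc=0
  for jar in \
    "$hive_home/hcatalog/share/hcatalog/hcatalog-core-$ver.jar" \
    "$hive_home/lib/hive-exec-$ver.jar" \
    "$hive_home/lib/hive-metastore-$ver.jar"; do
    if [ -f "$jar" ]; then
      printf 'REGISTER %s;\n' "$jar"
    else
      printf 'not found: %s\n' "$jar" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example: pig_register_lines /home/shiva/hive-0.12.0 0.12.0 > ~/.pigbootup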


The full log of the error is below. Any help resolving this would be greatly appreciated. Thanks.

grunt> a = LOAD 'batting_data' USING org.apache.hcatalog.pig.HCatLoader();         
2015-01-01 01:06:33,849 [main] INFO  org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-01-01 01:06:33,865 [main] INFO  org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
2015-01-01 01:06:34,049 [main] INFO  DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored
2015-01-01 01:06:34,365 [main] WARN  com.jolbox.bonecp.BoneCPConfig - Max Connections < 1. Setting to 20
2015-01-01 01:06:35,470 [main] INFO  org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-01-01 01:06:35,501 [main] INFO  org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
2015-01-01 01:06:36,265 [main] WARN  com.jolbox.bonecp.BoneCPConfig - Max Connections < 1. Setting to 20
2015-01-01 01:06:36,506 [main] INFO  org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_database: NonExistentDatabaseUsedForHealthCheck
2015-01-01 01:06:36,506 [main] INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=shiva   ip=unknown-ip-addr  cmd=get_database: NonExistentDatabaseUsedForHealthCheck 
2015-01-01 01:06:36,512 [main] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - NoSuchObjectException(message:There is no database named nonexistentdatabaseusedforhealthcheck)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
    at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
    at com.sun.proxy.$Proxy6.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
    at com.sun.proxy.$Proxy7.get_database(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
    at org.apache.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:277)
    at org.apache.hcatalog.common.HiveClientCache.get(HiveClientCache.java:147)
    at org.apache.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:547)
    at org.apache.hcatalog.pig.PigHCatUtil.getHiveMetaClient(PigHCatUtil.java:150)
    at org.apache.hcatalog.pig.PigHCatUtil.getTable(PigHCatUtil.java:186)
    at org.apache.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:194)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
    at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3479)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1536)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1013)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:553)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:541)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

2015-01-01 01:06:36,514 [main] INFO  org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=batting_data
2015-01-01 01:06:36,514 [main] INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=shiva   ip=unknown-ip-addr  cmd=get_table : db=default tbl=batting_data 
2015-01-01 01:06:36,516 [main] INFO  DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-01-01 01:06:36,516 [main] INFO  DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-01-01 01:06:36,795 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Table not found : default.batting_data table not found
Details at logfile: /home/shiva/pig_1420054544179.log


Answer 1:

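OK, I fixed it.

I had not set PIG_OPTS to the correct Hive Thrift server address, so Pig could not connect to the Hive metastore and therefore could not find the table. I changed it to:

    PIG_OPTS=-Dhive.metastore.uris=thrift://localhost:10000

Then start the HiveServer service with:

    $ bin/hive --service hiveserver

This solved the problem, and Pig can now connect to Hive. Thanks.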
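Because the fix hinges on PIG_OPTS pointing at a Thrift endpoint that is actually listening, a small pre-flight check before starting Grunt can save a round of stack traces. The two functions below are hypothetical helpers, not part of Pig or Hive: metastore_endpoint pulls the host and port out of a -Dhive.metastore.uris=thrift://HOST:PORT option, and port_open attempts a TCP connection using bash's /dev/tcp redirection (so it assumes bash and the coreutils timeout command). The port is whatever hive.metastore.uris names: 10000 for the HiveServer setup in this answer, while a standalone metastore started with bin/hive --service metastore listens on 9083 by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "HOST PORT" from the first thrift:// URI in a
# -Dhive.metastore.uris=... option. Defaults to $PIG_OPTS when no argument
# is given; returns 1 if no URI with an explicit port is found.
metastore_endpoint() {
  local opts="${1-$PIG_OPTS}" uri
  uri=$(printf '%s\n' "$opts" |
        sed -n 's|.*-Dhive\.metastore\.uris=thrift://\([^ ,]*\).*|\1|p')
  case "$uri" in
    *:*) ;;            # host:port present
    *) return 1 ;;     # no URI, or URI without a port
  esac
  printf '%s %s\n' "${uri%%:*}" "${uri##*:}"
}

# Hypothetical helper: succeed if something accepts TCP connections on
# HOST ($1) PORT ($2) within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

A possible usage, run before launching pig:

    read -r host port < <(metastore_endpoint) && port_open "$host" "$port" && echo "metastore reachable at $host:$port"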
