Getting error while importing data from MySQL to HDFS using Hadoop?

Posted: 2017-10-16 22:35:45

Question:

I am trying to import data from MySQL into HDFS using Hadoop, and also to create the table and database in Hive. I am using the following command:

sqoop import --connect jdbc:mysql://localhost/Mobile --username root --password 12345678 --table Accesories --target-dir /user/harsh/Mobile1 --fields-terminated-by "," --hive-import --create-hive-table --hive-table mob.cust

Mobile is my database and Accesories is the table I am trying to import. The import itself runs successfully, but loading the data into Hive fails with the following error:

17/10/17 03:51:30 INFO mapreduce.Job:  map 0% reduce 0%
17/10/17 03:52:46 INFO mapreduce.Job:  map 40% reduce 0%
17/10/17 03:52:48 INFO mapreduce.Job:  map 60% reduce 0%
17/10/17 03:52:50 INFO mapreduce.Job:  map 100% reduce 0%
17/10/17 03:53:13 INFO mapreduce.Job: Job job_1508188554902_0003 completed successfully
17/10/17 03:53:16 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=664170
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=521
        HDFS: Number of bytes written=88
        HDFS: Number of read operations=20
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters 
        Launched map tasks=5
        Other local map tasks=5
        Total time spent by all maps in occupied slots (ms)=398814
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=398814
        Total vcore-seconds taken by all map tasks=398814
        Total megabyte-seconds taken by all map tasks=408385536
    Map-Reduce Framework
        Map input records=5
        Map output records=5
        Input split bytes=521
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=8631
        CPU time spent (ms)=11240
        Physical memory (bytes) snapshot=768512000
        Virtual memory (bytes) snapshot=9839693824
        Total committed heap usage (bytes)=448790528
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=88
17/10/17 03:53:16 INFO mapreduce.ImportJobBase: Transferred 88 bytes in 144.4472 seconds (0.6092 bytes/sec)
17/10/17 03:53:16 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Tue Oct 17 03:53:17 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/10/17 03:53:17 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Accesories` AS t LIMIT 1
17/10/17 03:53:18 INFO hive.HiveImport: Loading uploaded data into Hive
17/10/17 03:55:10 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
17/10/17 03:55:10 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
17/10/17 03:55:10 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
17/10/17 03:55:10 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
17/10/17 03:55:10 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
17/10/17 03:55:13 INFO hive.HiveImport: 
17/10/17 03:55:13 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
17/10/17 03:55:26 INFO hive.HiveImport: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
17/10/17 03:55:26 INFO hive.HiveImport:     at java.lang.reflect.Method.invoke(Method.java:498)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
17/10/17 03:55:26 INFO hive.HiveImport: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
17/10/17 03:55:26 INFO hive.HiveImport:     ... 8 more
17/10/17 03:55:26 INFO hive.HiveImport: Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1627)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3317)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3356)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3336)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3590)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
17/10/17 03:55:26 INFO hive.HiveImport:     ... 13 more
17/10/17 03:55:26 INFO hive.HiveImport: Caused by: java.lang.reflect.InvocationTargetException
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
17/10/17 03:55:26 INFO hive.HiveImport:     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1625)
17/10/17 03:55:26 INFO hive.HiveImport:     ... 22 more
17/10/17 03:55:26 INFO hive.HiveImport: Caused by: javax.jdo.JDODataStoreException: Exception thrown obtaining schema column information from datastore
17/10/17 03:55:26 INFO hive.HiveImport: NestedThrowables:
17/10/17 03:55:26 INFO hive.HiveImport: java.sql.SQLException: Column name pattern can not be NULL or empty.
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:720)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:740)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.ObjectStore.setMetaStoreSchemaVersion(ObjectStore.java:7763)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7657)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7632)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
17/10/17 03:55:26 INFO hive.HiveImport:     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
17/10/17 03:55:26 INFO hive.HiveImport:     at java.lang.reflect.Method.invoke(Method.java:498)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
17/10/17 03:55:26 INFO hive.HiveImport:     at com.sun.proxy.$Proxy21.verifySchema(Unknown Source)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:547)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:612)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:398)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6396)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
17/10/17 03:55:26 INFO hive.HiveImport:     ... 27 more
17/10/17 03:55:26 INFO hive.HiveImport: Caused by: java.sql.SQLException: Column name pattern can not be NULL or empty.
17/10/17 03:55:26 INFO hive.HiveImport:     at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:545)
17/10/17 03:55:26 INFO hive.HiveImport:     at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
17/10/17 03:55:26 INFO hive.HiveImport:     at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:505)
17/10/17 03:55:26 INFO hive.HiveImport:     at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:479)
17/10/17 03:55:26 INFO hive.HiveImport:     at com.mysql.cj.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2074)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.getColumns(BaseDatastoreAdapter.java:1575)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:1103)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getRDBMSTableInfoForTable(RDBMSSchemaHandler.java:1015)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getRDBMSTableInfoForTable(RDBMSSchemaHandler.java:965)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getSchemaData(RDBMSSchemaHandler.java:338)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.RDBMSStoreManager.getColumnInfoForTable(RDBMSStoreManager.java:2392)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.table.TableImpl.initializeColumnInfoFromDatastore(TableImpl.java:324)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3401)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2877)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1608)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:671)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2069)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3759)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2078)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1922)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1777)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
17/10/17 03:55:26 INFO hive.HiveImport:     at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:715)
17/10/17 03:55:26 INFO hive.HiveImport:     ... 45 more
17/10/17 03:55:26 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 1
    at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:389)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:339)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:240)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

MySQL version: 5.7.19-0ubuntu0.17.04.1
Hive version: 2
Hadoop version: 2.7.1

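Comments:

Based on this issue, my guess is that you are hitting an incompatibility between MySQL, Hadoop, and Hive. Can you add the version of each that you are using?

Added: MySQL version: 5.7.19-0ubuntu0.17.04.1, Hive version: 2, Hadoop version: 2.7.1.

Answer 1: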

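Your data seems to have been imported successfully from MySQL into Hadoop, so the problem looks like an incompatibility between the Hadoop and Hive jars. The log says this is a connection problem with the Hive metastore. Could you verify hive-site.xml and the version of the hive-common jar in Hive's lib directory?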
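The two checks suggested above could be sketched roughly as follows. The lib directory /usr/local/hive/lib is taken from the log in the question; the hive-site.xml location and both function names are assumptions made for illustration, so adjust them to your installation.

```shell
#!/usr/bin/env bash

# Print the version of every hive-common jar found in a Hive lib directory.
# The default directory comes from the log; pass another path as $1 if needed.
hive_common_version() {
    local lib_dir="${1:-/usr/local/hive/lib}"
    local jar
    for jar in "$lib_dir"/hive-common-*.jar; do
        [ -e "$jar" ] || continue      # glob matched nothing
        jar="${jar##*/}"               # strip the directory part
        jar="${jar#hive-common-}"      # strip the artifact name
        echo "${jar%.jar}"             # strip the extension
    done
}

# Show the metastore JDBC URL and driver configured in hive-site.xml.
# The default path is an assumption, not something stated in the question.
metastore_settings() {
    local site="${1:-/usr/local/hive/conf/hive-site.xml}"
    grep -A 1 -E 'javax\.jdo\.option\.Connection(URL|DriverName)' "$site"
}
```

Calling `hive_common_version` and `metastore_settings` with no arguments uses the default paths. If the reported jar version or the configured JDBC driver is not what you expect, that is a reasonable place to start looking.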

Comments:

The hive-common jar in lib is version 2.1.0, and I have not specified any jar version in hive-site.xml.

Answer 2:

What I did was make sure I copied hive-site.xml into my Sqoop conf folder, which in my case is /opt/sqoop/conf/. Once I did that, it worked!
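A minimal sketch of that copy step, assuming Hive's configuration lives in /usr/local/hive/conf (a guess based on the Hive install path in the log) and Sqoop's in /opt/sqoop/conf as mentioned above. The function name `copy_hive_site` is hypothetical.

```shell
#!/usr/bin/env bash

# Copy Hive's hive-site.xml into Sqoop's conf directory so that the Hive
# step started by Sqoop sees the same metastore settings as Hive itself.
# Both default paths are assumptions; pass your own as $1 and $2.
copy_hive_site() {
    local hive_conf_dir="${1:-/usr/local/hive/conf}"
    local sqoop_conf_dir="${2:-/opt/sqoop/conf}"
    local src="$hive_conf_dir/hive-site.xml"

    if [ ! -f "$src" ]; then
        echo "hive-site.xml not found in $hive_conf_dir" >&2
        return 1
    fi
    cp "$src" "$sqoop_conf_dir/" && echo "copied $src to $sqoop_conf_dir"
}
```

After running `copy_hive_site` (with explicit paths if yours differ), rerun the sqoop import command from the question to see whether the Hive load now succeeds.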
