Using Sqoop to import data from Redshift to Hive

【Title】Using Sqoop to import data from Redshift to Hive 【Posted】2017-04-21 15:23:28 【Question】:

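I am getting the error message: Could not load db driver class

The command and the resulting error are shown below, followed by a listing of the jar files in the lib directory. What am I doing wrong?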

sqoop import \
--connect jdbc:redshift://< > \
--username < > --password < > \
--driver com.amazon.redshift.jdbc.Driver \
--table import-all-tables

17/04/21 11:14:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
17/04/21 11:14:46 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/04/21 11:14:46 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/04/21 11:14:46 INFO manager.SqlManager: Using default fetchSize of 1000
17/04/21 11:14:46 INFO tool.CodeGenTool: Beginning code generation
17/04/21 11:14:46 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.amazon.redshift.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.amazon.redshift.jdbc.Driver
        at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:856)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:744)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1845)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:244)



[t lib]$ ls
ant-contrib-1.0b3.jar          hsqldb-1.8.0.10.jar            kite-hadoop-compatibility-1.0.0.jar  parquet-generator-1.4.1.jar
ant-eclipse-1.0-jvm1.2.jar     jackson-annotations-2.3.0.jar  mysql-connector-java.jar             parquet-hadoop-1.4.1.jar
avro-1.7.5.jar                 jackson-core-2.3.1.jar         opencsv-2.3.jar                      parquet-jackson-1.4.1.jar
avro-mapred-1.7.5-hadoop2.jar  jackson-core-asl-1.9.13.jar    paranamer-2.3.jar                    RedshiftJDBC42-1.2.1.1001 (2).jar
commons-codec-1.4.jar          jackson-databind-2.3.1.jar     parquet-avro-1.4.1.jar               slf4j-api-1.6.1.jar
commons-compress-1.4.1.jar     jackson-mapper-asl-1.9.13.jar  parquet-column-1.4.1.jar             snappy-java-1.0.5.jar
commons-io-1.4.jar             kite-data-core-1.0.0.jar       parquet-common-1.4.1.jar             xz-1.0.jar
commons-jexl-2.1.1.jar         kite-data-hive-1.0.0.jar       parquet-encoding-1.4.1.jar
commons-logging-1.1.1.jar      kite-data-mapreduce-1.0.0.jar  parquet-format-2.0.0.jar

【Comments on the question】:

【Answer 1】:

Your JDBC driver is not in sqoop/lib, so download a valid JDBC driver for your database and copy it into sqoop/lib.
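Before copying anything, it may help to confirm that the jar actually contains the class passed to `--driver`. The sketch below is one way to check, assuming `unzip` is installed. The helper names (`class_to_entry`, `jar_has_class`, `clean_jar_name`) and the `SQOOP_LIB` path are hypothetical and not part of Sqoop. The listing in the question shows a jar named `RedshiftJDBC42-1.2.1.1001 (2).jar`; whether the space and `(2)` in that name interfere with Sqoop's classpath is an assumption, but renaming the file is cheap to try.

```shell
# Convert a Java class name to the entry path it would have inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Succeed if the jar ($1) lists the class ($2).
# Uses a plain substring match on the `unzip -l` listing.
jar_has_class() {
  unzip -l "$1" 2>/dev/null | grep -qF "$(class_to_entry "$2")"
}

# Strip a browser-style " (N)" suffix from a downloaded jar's file name.
clean_jar_name() {
  printf '%s\n' "$1" | sed 's/ ([0-9][0-9]*)\.jar$/.jar/'
}

# Hypothetical usage; SQOOP_LIB is an assumed location, adjust to your install.
# SQOOP_LIB=/path/to/sqoop/lib
# for jar in "$SQOOP_LIB"/RedshiftJDBC*.jar; do
#   jar_has_class "$jar" com.amazon.redshift.jdbc.Driver && echo "found in $jar"
#   mv -n "$jar" "$SQOOP_LIB/$(clean_jar_name "$(basename "$jar")")"
# done
```

If `jar_has_class` fails for the class named in `--driver`, listing the jar with `unzip -l` and searching for `Driver.class` should show which driver class names the jar really provides.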

【Comments】:

I added the JDBC driver to that folder, but I still get the same error.
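If the error persists with the jar in place, the class name given to `--driver` is another thing to check. For a jar named `RedshiftJDBC42-...`, the class may be `com.amazon.redshift.jdbc42.Driver` rather than `com.amazon.redshift.jdbc.Driver`. That is an assumption to verify against the jar's contents, not something confirmed in this thread. The sketch below wraps the import in a hypothetical function, `redshift_import`, so the driver class is an explicit argument. It uses `-P`, as the warning in the log suggests, and adds `--hive-import` because the goal is a Hive table. The `SQOOP_BIN` variable is also an assumption, added so the command can be printed with `echo` instead of executed.

```shell
# Sketch only: all argument values are placeholders supplied by the caller.
# $1 = JDBC URL, $2 = user name, $3 = driver class, $4 = table name
redshift_import() {
  # SQOOP_BIN defaults to sqoop; set SQOOP_BIN=echo for a dry run.
  "${SQOOP_BIN:-sqoop}" import \
    --connect "$1" \
    --username "$2" -P \
    --driver "$3" \
    --table "$4" \
    --hive-import
}

# Hypothetical dry run that prints the command instead of running it:
# SQOOP_BIN=echo redshift_import 'jdbc:redshift://<host>:<port>/<db>' \
#   '<user>' com.amazon.redshift.jdbc42.Driver '<table>'
```

Note that `import-all-tables` is a separate Sqoop tool (`sqoop import-all-tables`). Passed to `--table`, it is treated as a literal table name.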
