How to import a table from MySQL to Hive using Java?

Asked: 2014-03-18 18:14:49

I am trying to import a table from MySQL into Hive, but I get the error below. Could you suggest a solution?

SqoopOptions loading .....
Import Tool running ....
14/03/18 06:48:34 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
14/03/18 06:48:43 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
14/03/18 06:48:43 INFO mapred.JobClient: Map output records=2
14/03/18 06:48:43 INFO mapreduce.ImportJobBase: Transferred 18 bytes in 5.5688 seconds (3.2323 bytes/sec)
14/03/18 06:48:43 INFO mapreduce.ImportJobBase: Retrieved 2 records.
14/03/18 06:48:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM student AS t WHERE 1=0
14/03/18 06:48:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM student AS t WHERE 1=0
14/03/18 06:48:43 INFO hive.HiveImport: Loading uploaded data into Hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/master/apps/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/master/hive_job_log_master_201403180648_1860851359.txt
FAILED: Error in metadata: MetaException(message:file:/user/hive/warehouse/student is not a directory or unable to create one)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
FAIL !!!

The code I wrote:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

public class SqoopJavaInterface {
    private static final String JOB_NAME = "Sqoop Hive Job";
    private static final String MAPREDUCE_JOB = "Hive Map Reduce Job";
    private static final String DBURL = "jdbc:mysql://localhost:3306/test";
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "root";
    private static final String HADOOP_HOME = "/home/master/apps/hadoop-1.0.4";
    private static final String JAR_OUTPUT_DIR = "/home/master/data";
    private static final String HIVE_HOME = "/home/master/apps/hive-0.10.0";
    private static final String HIVE_DIR = "/user/hive/warehouse/";
    private static final String WAREHOUSE_DIR = "hdfs://localhost:9000/user/hive/warehouse/student";
    private static final String SUCCESS = "SUCCESS !!!";
    private static final String FAIL = "FAIL !!!";

    /**
     * @param table the MySQL table to import into Hive
     * @throws IOException
     */
    public static void importToHive(String table) throws IOException {
        System.out.println("SqoopOptions loading .....");
        Configuration config = new Configuration();
        // Hive connection parameters
        config.addResource(new Path(HADOOP_HOME + "/conf/core-site.xml"));
        config.addResource(new Path(HADOOP_HOME + "/conf/hdfs-site.xml"));
        config.addResource(new Path(HIVE_HOME + "/conf/hive-site.xml"));
        FileSystem dfs = FileSystem.get(config);
        /* MySQL connection parameters */
        SqoopOptions options = new SqoopOptions(config);
        options.setConnectString(DBURL);
        options.setTableName(table);
        options.setDriverClassName(DRIVER);
        options.setUsername(USERNAME);
        options.setPassword(PASSWORD);
        options.setHadoopMapRedHome(HADOOP_HOME);
        options.setHiveHome(HIVE_HOME);
        options.setHiveImport(true);
        options.setHiveTableName(table);
        options.setOverwriteHiveTable(true);
        options.setFailIfHiveTableExists(false);
        options.setFieldsTerminatedBy(',');
        options.setDirectMode(true);
        options.setNumMappers(1); // number of mappers to launch for the job
        options.setWarehouseDir(WAREHOUSE_DIR);
        options.setJobName(JOB_NAME);
        options.setMapreduceJobName(MAPREDUCE_JOB);
        options.setJarOutputDir(JAR_OUTPUT_DIR);
        System.out.println("Import Tool running ....");
        ImportTool it = new ImportTool();
        int retVal = it.run(options);
        if (retVal == 0) {
            System.out.println(SUCCESS);
        } else {
            System.out.println(FAIL);
        }
    }
}

When I run the code above, I get the following error. Could you suggest a solution?

Execution failed while executing command: 192.168.10.172
Error message: bash: 192.168.10.172: command not found
Now wait 5 seconds to begin next task ...
Connection channel disconnect
net.neoremind.sshxcute.core.Result@60c2be20
Command is sqoop import --connect jdbc:mysql://localhost:3316/hadoop --username root --password root --table employees --hive-import -m 1 -- --schema default
Connection channel established succesfully
Start to run command
Connection channel closed
Check if exec success or not ... 
Execution failed while executing command: sqoop import --connect jdbc:mysql://localhost:3316/hadoop --username root --password root --table employees --hive-import -m 1 -- --schema default
Error message: bash: sqoop: command not found
Now wait 5 seconds to begin next task ...
Connection channel disconnect
SSH connection shutdown
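Both failures in this log come from the remote shell, not from Sqoop itself: the first task passed the bare IP `192.168.10.172` to bash as if it were a command, and the second could not find `sqoop` on the remote `PATH`. A minimal sketch, assuming Sqoop is installed at a known absolute path on the target host (the path and connection details below are illustrative, not from the original post), that builds the command explicitly so it does not depend on the login shell's `PATH`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SqoopCommandBuilder {
    // Assumed install location; adjust to the remote host's layout.
    static final String SQOOP_BIN = "/home/master/apps/sqoop-1.4.4/bin/sqoop";

    // Builds the full command as a list, one argument per element,
    // starting with the absolute path to the sqoop launcher script.
    static List<String> buildImportCommand(String table) {
        List<String> cmd = new ArrayList<>();
        cmd.add(SQOOP_BIN);
        cmd.addAll(Arrays.asList(
                "import",
                "--connect", "jdbc:mysql://localhost:3306/hadoop",
                "--username", "root",
                "--password", "root",
                "--table", table,
                "--hive-import",
                "-m", "1"));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildImportCommand("employees");
        System.out.println(String.join(" ", cmd));
        // On a host where Sqoop actually exists at SQOOP_BIN:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Running the command through `ProcessBuilder` with an absolute path sidesteps both "command not found" errors, since nothing is left for the remote shell to resolve.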

Comments:

Did you create the table in Hive before importing the data through Sqoop? "FAILED: Error in metadata: MetaException(message:file:/user/hive/warehouse/student is not a directory or unable to create one)" basically means you are trying to access the table "student" but it has not been created yet.

Possible duplicate of: How to use Sqoop in Java Program?

Answer 1:

Since the SqoopOptions approach is deprecated, you can use the following code instead:

public static void importToHive() throws Exception {
    Configuration config = new Configuration();
    config.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
    config.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));
    String[] cmd = {"import", "--connect", <connectionString>, "--username", userName,
            "--password", password, "--hadoop-home", "/usr/local/hadoop",
            "--table", <tableName>, "--hive-import", "--create-hive-table",
            "--hive-table", <tableName>, "--target-dir",
            "hdfs://localhost:54310/user/hive/warehouse", "-m", "1", "--delete-target-dir"};

    Sqoop.runTool(cmd, config);
}
Please use the correct hadoop and hive warehouse path, and the correct username and password for MySQL. Check your port in core-site.xml (in my case it is 54310).
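To confirm the NameNode port the answer refers to, you can read the `fs.default.name` property straight out of `core-site.xml`. A small sketch using only the JDK's built-in XML parser (the file path in `main` is an assumption; point it at your own configuration directory):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CoreSitePort {
    // Returns the value of the named property from a Hadoop-style
    // configuration file, or null if the property is absent.
    static String property(File coreSite, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(coreSite);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String key = p.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            if (name.equals(key)) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Assumed location; use your cluster's actual core-site.xml.
        File coreSite = new File("/usr/local/hadoop/conf/core-site.xml");
        System.out.println(property(coreSite, "fs.default.name"));
    }
}
```

Whatever URI this prints (for example `hdfs://localhost:54310`) is what the `--target-dir` argument should be based on.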

Comments:

This code does not work; I get the following errors: 14/07/24 14:44:06 ERROR tool.BaseSqoopTool: Error parsing arguments for import: 14/07/24 14:44:06 ERROR tool.BaseSqoopTool: Unrecognized argument: --hadoop-home 14/07/24 14:44:06 ERROR tool.BaseSqoopTool: Unrecognized argument: /opt/hadoop-1.0.4 14/07/24 14:44:06 ERROR tool.BaseSqoopTool: Unrecognized argument: --table 14/07/24 14:44:06 ERROR tool.BaseSqoopTool: Unrecognized argument: departments 14/07/24 14:44:06 ERROR tool.BaseSqoopTool: Unrecognized argument:
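One plausible cause of the "Unrecognized argument" cascade above, assuming the commenter's Sqoop release no longer accepts `--hadoop-home` (later Sqoop 1.4.x versions drop it in favor of other flags): once the parser rejects one token, it rejects everything after it too. A hedged sketch of an argument array without that flag, with every option and its value kept as separate elements (connection details below are placeholders, not the commenter's real values):

```java
public class SqoopArgs {
    // Builds an import argument array with no --hadoop-home flag.
    // Each flag and its value must be its own array element, or
    // Sqoop's parser will reject them as unrecognized arguments.
    static String[] importArgs(String db, String table) {
        return new String[] {
                "import",
                "--connect", "jdbc:mysql://localhost:3306/" + db,
                "--username", "root",
                "--password", "root",
                "--table", table,
                "--hive-import",
                "--create-hive-table",
                "--hive-table", table,
                "--target-dir", "hdfs://localhost:54310/user/hive/warehouse/" + table,
                "-m", "1"
        };
    }

    public static void main(String[] args) {
        // This array would be handed to Sqoop.runTool(cmd, config).
        System.out.println(String.join(" ", importArgs("hadoop", "departments")));
    }
}
```

If the flag really is the culprit, setting `HADOOP_HOME` (or `HADOOP_MAPRED_HOME`) in the environment instead of on the command line is the usual workaround.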
