Sqoop "import-all-tables" 无法导入所有表
Posted: 2016-10-12 10:31:46

【Question】: This is the sqoop command I am using to import data from SQL Server into Hive:

sqoop-import-all-tables --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" --create-hive-table --hive-import --hive-database hivemtdb

The problem is that sqlserverdb has around 100 tables, but when I issue this command it imports only 6 or 7 seemingly random tables into Hive. This behavior looks really strange to me, and I cannot find what I am doing wrong.

Edit 1: log output from the run:
Warning: /usr/hdp/2.4.3.0-227/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/13 13:17:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.3.0-227
16/10/13 13:17:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/10/13 13:17:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/10/13 13:17:38 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/13 13:17:38 INFO tool.CodeGenTool: Beginning code generation
16/10/13 13:17:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [UserMessage] AS t WHERE 1=0
16/10/13 13:17:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.3.0-227/hadoop-mapreduce
Note: /tmp/sqoop-sherry/compile/c809ee201c0aec1edf2ed5a1ef4aed4c/UserMessage.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/13 13:17:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-sherry/compile/c809ee201c0aec1edf2ed5a1ef4aed4c/UserMessage.jar
16/10/13 13:17:39 INFO mapreduce.ImportJobBase: Beginning import of UserMessage
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-227/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/10/13 13:17:40 INFO impl.TimelineClientImpl: Timeline service address: http://machine-02-xx:8188/ws/v1/timeline/
16/10/13 13:17:40 INFO client.RMProxy: Connecting to ResourceManager at machine-02-xx/xxx.xx.xx.xx:8050
16/10/13 13:17:42 INFO db.DBInputFormat: Using read commited transaction isolation
16/10/13 13:17:42 INFO mapreduce.JobSubmitter: number of splits:1
16/10/13 13:17:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475746531098_0317
16/10/13 13:17:43 INFO impl.YarnClientImpl: Submitted application application_1475746531098_0317
16/10/13 13:17:43 INFO mapreduce.Job: The url to track the job: http://machine-02-xx:8088/proxy/application_1475746531098_0317/
16/10/13 13:17:43 INFO mapreduce.Job: Running job: job_1475746531098_0317
16/10/13 13:17:48 INFO mapreduce.Job: Job job_1475746531098_0317 running in uber mode : false
16/10/13 13:17:48 INFO mapreduce.Job: map 0% reduce 0%
16/10/13 13:17:52 INFO mapreduce.Job: map 100% reduce 0%
16/10/13 13:17:52 INFO mapreduce.Job: Job job_1475746531098_0317 completed successfully
16/10/13 13:17:52 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=156179
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=3486
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1743
Total vcore-seconds taken by all map tasks=1743
Total megabyte-seconds taken by all map tasks=2677248
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=30
CPU time spent (ms)=980
Physical memory (bytes) snapshot=233308160
Virtual memory (bytes) snapshot=3031945216
Total committed heap usage (bytes)=180879360
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
16/10/13 13:17:52 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 12.6069 seconds (0 bytes/sec)
16/10/13 13:17:52 INFO mapreduce.ImportJobBase: Retrieved 0 records.
16/10/13 13:17:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [UserMessage] AS t WHERE 1=0
16/10/13 13:17:52 WARN hive.TableDefWriter: Column SendDate had to be cast to a less precise type in Hive
16/10/13 13:17:52 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.4.3.0-227/hive/lib/hive-common-1.2.1000.2.4.3.0-227.jar!/hive-log4j.properties
OK
Time taken: 1.286 seconds
Loading data to table sqlcmc.usermessage
Table sqlcmc.usermessage stats: [numFiles=1, totalSize=0]
OK
Time taken: 0.881 seconds
Note: /tmp/sqoop-sherry/compile/c809ee201c0aec1edf2ed5a1ef4aed4c/DadChMasConDig.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Logging initialized using configuration in jar:file:/usr/hdp/2.4.3.0-227/hive/lib/hive-common-1.2.1000.2.4.3.0-227.jar!/hive-log4j.properties
OK
【Comments】:
Put --verbose in your command and check the extended logs for any errors/exceptions.

Yes, I also tried it with --verbose, but it did not show any exception or error.

Try sqoop list-tables --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb". Does it show all 100 tables?

Yes, it shows all the tables. But looking at the list it prints, I realized it has been importing only the first 6 tables of that list every time.

Could this be a problem with MapReduce, since sqoop uses MR on the backend? Do I have to pass some specific parameter to make it import all the tables?
【Answer 1】:
First, import-all-tables runs a table import for every table in the database.

If you do not define the number of mappers for the job, Sqoop picks 4 mappers by default, which requires every table to either have a primary key or have a --split-by column specified.

If that is the problem here, you will see an error like this:

ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table test. Please specify one with --split-by or perform a sequential import with '-m 1'.

You could fall back to 1 mapper, but that makes the whole import slower.

The better way is to add --autoreset-to-one-mapper: tables with a primary key are imported with the number of mappers mentioned in the command, and tables without a primary key automatically fall back to 1 mapper; see the sketch after this paragraph.
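A minimal sketch of the command with this option, reusing the connection string from the question (the mapper count of 4 is just the default made explicit; adjust to your environment):

sqoop-import-all-tables \
  --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
  --autoreset-to-one-mapper \
  -m 4 \
  --create-hive-table \
  --hive-import \
  --hive-database hivemtdb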
Now, coming to your problem: the sqoop import of table DadChMasConDig failed.

I do not know why nothing was logged to the console for it. An exception may have occurred while importing this table, such as:

Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for column <somecolumn>

For example, varbinary is not supported.
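One possible workaround (my assumption, not something verified in the answer) is to import the offending table on its own and override the Hive type mapping with --map-column-hive; the column name SomeBinaryCol below is hypothetical, standing in for whichever column has the unsupported type:

sqoop import \
  --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
  --table DadChMasConDig \
  --map-column-hive SomeBinaryCol=STRING \
  --hive-import \
  --hive-database hivemtdb \
  -m 1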
If you import the data into HDFS only, there should be no problem. You can try:
sqoop-import-all-tables --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb"
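To check which tables actually landed, you could then list your HDFS home directory, where sqoop by default writes one subdirectory per imported table (<your-user> is a placeholder):

hdfs dfs -ls /user/<your-user>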
【Discussion】:
【Answer 2】: I faced the same issue, and the following worked for me. Normally --create-hive-table and --hive-overwrite are not used together and make no sense in combination, but no other combination worked; each time only 3 of the 10 tables, or some other small subset of tables, got imported.
sqoop import-all-tables \
--connect jdbc:mysql://<mysql-url>/my_database \
--username sql_user \
--password sql_pwd \
--hive-import \
--hive-database test_hive \
--hive-overwrite \
--create-hive-table \
--warehouse-dir /apps/hive/warehouse/test_hive.db \
-m 1
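A quick way to confirm that every table came across (assuming the hive CLI is available on the machine) is to list the tables in the target database and compare the count against the source:

hive -e "USE test_hive; SHOW TABLES;"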
【Discussion】: