Unable to import data from Vertica to Cassandra using Sqoop

Posted: 2014-10-30 14:53:49

I am trying to import a table from Vertica into DataStax Enterprise 4.5 using Sqoop. There are no errors or exceptions, but the target table ends up empty.

Here is what I did:

Created the keyspace and table in cqlsh:

CREATE KEYSPACE IF NOT EXISTS npa_nxx WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE npa_nxx.npa_nxx_data (
    region varchar, market varchar,
    PRIMARY KEY(market));

Created an options file:

cql-import
--table
dim_location
--cassandra-keyspace
npa_nxx 
--cassandra-table
npa_nxx_data
--cassandra-column-mapping
region:region,market:market
--connect
jdbc:vertica://xx.xxx.xx.xxx:5433/schema
--driver
com.vertica.jdbc.Driver
--username
xxxxx
--password
xxx
--cassandra-host
xx.xxx.xx.xxx

Then ran the sqoop command:

dse sqoop --options-file /usr/share/dse/demos/sqoop/import.options

Here is the full output:

14/10/30 09:28:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/10/30 09:28:53 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
14/10/30 09:28:53 INFO manager.SqlManager: Using default fetchSize of 1000
14/10/30 09:28:53 INFO tool.CodeGenTool: Beginning code generation
14/10/30 09:28:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM dim_location AS t WHERE 1=0
14/10/30 09:28:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM dim_location AS t WHERE 1=0
14/10/30 09:28:54 INFO orm.CompilationManager: $HADOOP_MAPRED_HOME is not set
Note: /tmp/sqoop-root/compile/159b8e57e91397f8c48f4455f6da0e5a/dim_location.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/10/30 09:28:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/159b8e57e91397f8c48f4455f6da0e5a/dim_location.jar
14/10/30 09:28:55 INFO mapreduce.ImportJobBase: Beginning import of dim_location
14/10/30 09:28:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM dim_location AS t WHERE 1=0
14/10/30 09:28:56 INFO snitch.Workload: Setting my workload to Cassandra
14/10/30 09:28:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/30 09:28:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(MARKET), MAX(MARKET) FROM dim_location
14/10/30 09:28:59 WARN db.TextSplitter: Generating splits for a textual index column.
14/10/30 09:28:59 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/10/30 09:28:59 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
14/10/30 09:29:00 INFO mapred.JobClient: Running job: job_201410291321_0012
14/10/30 09:29:01 INFO mapred.JobClient:  map 0% reduce 0%
14/10/30 09:29:18 INFO mapred.JobClient:  map 20% reduce 0%
14/10/30 09:29:22 INFO mapred.JobClient:  map 40% reduce 0%
14/10/30 09:29:25 INFO mapred.JobClient:  map 60% reduce 0%
14/10/30 09:29:28 INFO mapred.JobClient:  map 80% reduce 0%
14/10/30 09:29:31 INFO mapred.JobClient:  map 100% reduce 0%
14/10/30 09:29:34 INFO mapred.JobClient: Job complete: job_201410291321_0012
14/10/30 09:29:34 INFO mapred.JobClient: Counters: 18
14/10/30 09:29:34 INFO mapred.JobClient:   Job Counters
14/10/30 09:29:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29652
14/10/30 09:29:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/10/30 09:29:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/10/30 09:29:34 INFO mapred.JobClient:     Launched map tasks=5
14/10/30 09:29:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/10/30 09:29:34 INFO mapred.JobClient:   File Output Format Counters
14/10/30 09:29:34 INFO mapred.JobClient:     Bytes Written=2003
14/10/30 09:29:34 INFO mapred.JobClient:   FileSystemCounters
14/10/30 09:29:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=130485
14/10/30 09:29:34 INFO mapred.JobClient:     CFS_BYTES_WRITTEN=2003
14/10/30 09:29:34 INFO mapred.JobClient:     CFS_BYTES_READ=664
14/10/30 09:29:34 INFO mapred.JobClient:   File Input Format Counters
14/10/30 09:29:34 INFO mapred.JobClient:     Bytes Read=0
14/10/30 09:29:34 INFO mapred.JobClient:   Map-Reduce Framework
14/10/30 09:29:34 INFO mapred.JobClient:     Map input records=98
14/10/30 09:29:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=985702400
14/10/30 09:29:34 INFO mapred.JobClient:     Spilled Records=0
14/10/30 09:29:34 INFO mapred.JobClient:     CPU time spent (ms)=1260
14/10/30 09:29:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=1249378304
14/10/30 09:29:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8317739008
14/10/30 09:29:34 INFO mapred.JobClient:     Map output records=98
14/10/30 09:29:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=664
14/10/30 09:29:34 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 38.8727 seconds (0 bytes/sec)
14/10/30 09:29:34 INFO mapreduce.ImportJobBase: Retrieved 98 records.

Does anyone have an idea of what is going on here? Thanks!
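(For anyone hitting a similar silent failure: a quick way to confirm whether any rows actually reached Cassandra is to count them in cqlsh. This is a sketch assuming the keyspace and table defined above; the log claims 98 records were retrieved, so a working import should show that count.)

```sql
-- Count rows that landed in the Cassandra target table.
-- If the import worked, this should return 98; an empty table returns 0.
SELECT COUNT(*) FROM npa_nxx.npa_nxx_data;
```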

Comments:

Had the same problem going from SQL Server to Cassandra with DSE - would love to know what causes this, and where the directory I'm supposed to delete is.. What is the structure of the dim_location table?

Answer 1:

Run the following command to see where your files ended up on CFS:

dse hadoop fs -ls <location given in target directory>
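For example, when the options file gives no --target-dir, Sqoop typically writes under the invoking user's home directory on the distributed file system. The exact paths below are assumptions for illustration, not taken from the question:

```shell
# List the default import directory on CFS (path is hypothetical;
# substitute the target directory from your own job configuration).
dse hadoop fs -ls /user/root/dim_location

# If the 98 rows were written to CFS as text part files instead of
# into the Cassandra table, inspect one of them:
dse hadoop fs -cat /user/root/dim_location/part-m-00000
```

If part files with your data show up here, the records were imported to CFS rather than Cassandra, which would explain a populated job log alongside an empty target table.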

