Sqoop 1 and Sqoop 2: Import and Export

Posted by Gorun2017



Sqoop 1

1. Import MySQL data into HDFS with Sqoop

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --target-dir '/sqoop/user'    # -m sets the number of map tasks; --target-dir sets the HDFS output directory
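Once the import finishes, the result can be verified directly in HDFS. A minimal check, assuming the default text output under the /sqoop/user target directory from the command above (with -m 1 the part file is normally named part-m-00000, but the name may differ):

# list the files Sqoop wrote to the target directory
hdfs dfs -ls /sqoop/user
# show the first few imported rows (comma-separated text by default)
hdfs dfs -cat /sqoop/user/part-m-00000 | head -n 5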

 

2. Import MySQL data into Hive with Sqoop

sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1

 

3. Import MySQL data into Hive with Sqoop, specifying the Hive table name

sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user1    # if the table does not exist in Hive, it is created to hold the data

 

4. Import MySQL data into Hive with Sqoop, using a WHERE condition

sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user2 --where 'uid=10'

 

5. Import MySQL data into Hive with Sqoop, using a free-form query

sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --hive-table user6 --query 'select * from user where uid<10 and $CONDITIONS' --target-dir /sqoop/user5
# the $CONDITIONS token must appear in the query; Sqoop reports an error if it is missing
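When a free-form query is imported in parallel (-m greater than 1), Sqoop additionally requires --split-by so it can substitute a different range predicate for $CONDITIONS in each map task. A sketch of the same query with two mappers (the target directory name here is only illustrative):

sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root \
  --query 'select * from user where uid<10 and $CONDITIONS' \
  --split-by uid -m 2 \
  --target-dir /sqoop/user_parallel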

 

6. Export data from Hive (HDFS) to MySQL with Sqoop

sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --table user5 --export-dir /sqoop/user5    # the exported files and the target table must have the same number and types of columns
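The export requires the target table to already exist in MySQL before the job runs. A hypothetical way to create it from the shell; the column names and types below are assumptions for illustration and must match the actual data under --export-dir:

# create the target table first (the schema here is an illustrative assumption)
mysql -uroot -proot -e "CREATE TABLE IF NOT EXISTS test.user5 (uid INT, uname VARCHAR(64));"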

 

 

Sqoop 2

sqoop2-shell

Start the sqoop2-shell:

jjzhu:bin didi$ sqoop2-shell 
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop Shell: Type help or \h for help.

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>

Configure the Sqoop server connection

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
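The server setting is not remembered across shell sessions. If I recall correctly, the shell executes a resource file named ~/.sqoop2rc at startup, so the command can be stored there; treat the file name as an assumption and verify it against your Sqoop 2 documentation:

# append the server setting to the shell's startup resource file (assumed to be ~/.sqoop2rc)
cat >> ~/.sqoop2rc <<'EOF'
set server --host localhost --port 12000 --webapp sqoop
EOF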

Check that the server connection works:

sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>

 

Create links

List the connectors available on the Sqoop server:

sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>
  • generic-jdbc-connector
    a connector based on Java JDBC; it can serve as either the source or the target of a transfer
  • hdfs-connector
    a connector that uses HDFS as the source or the target

Create a generic-jdbc-connector link with the following command:

sqoop:002> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql_weibouser_link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/spider
Username: root
Password: ****
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry# 

SQL Dialect

Identifier enclose:     <- Note: do not just press Enter here; type a single space instead. If left empty, Sqoop wraps MySQL table names in double quotes when querying, and the queries fail.
New link was successfully created with validation status OK and name mysql_weibouser_link

Create an HDFS link

sqoop:002> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_weibouser_link

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop-2.7.3/etc/hadoop
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name hdfs_weibouser_link

List the links

sqoop:002> show link
+----------------------+------------------------+---------+
|         Name         |     Connector Name     | Enabled |
+----------------------+------------------------+---------+
| mysql_weibouser      | generic-jdbc-connector | true    |
| mysql_weibouser_link | generic-jdbc-connector | true    |
| hdfs_link            | hdfs-connector         | true    |
| hdfs_link2           | hdfs-connector         | true    |
| hdfs_weibouser_link  | hdfs-connector         | true    |
+----------------------+------------------------+---------+

Create a job

sqoop:002> create job -f "mysql_weibouser_link" -t "hdfs_weibouser_link"
Creating job for links with from name mysql_weibouser_link and to name hdfs_weibouser_link
Please fill following values to create new job object
Name: job_weibouser

Database source

Schema name: spider
Table name: spiders_weibouser
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec: 
Output directory: hdfs://localhost:9000/usr/jjzhu/spider/spiders_weibouser
Append mode: 

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name job_weibouser

 

What each prompt means:

The prompts, in order:
Name: an identifier for the job; choose whatever you like.
Schema Name: the database (schema) to read from. In MySQL a schema is essentially the same thing as a database; I have not dug into the exact difference, but for creating a job they behave alike.
Table Name: the source table to transfer.
SQL Statement: an optional free-form query. The documentation says it must contain a $CONDITIONS placeholder (a condition sub-clause), but I never managed to create a job this way.
After the items above an element# prompt appears, asking for list/map entries; just press Enter to skip it. The next prompts can also be left blank:
Partition column:
Partition column nullable:
Boundary query
Last value
Then come the settings for the target side:
Null value: roughly, what to write in place of NULL values.
File format: the format of the data files in HDFS; TEXT_FILE, the simplest plain-text format, is used here.
Compression codec: the compression algorithm applied to the output files; NONE here. CUSTOM lets you supply your own codec by implementing the corresponding Java interface.
Custom codec: the class of that custom codec; since NONE was chosen above, just press Enter.
Output directory: the path in HDFS to write to; it is best to point at a directory that is empty (or has no data under it yet), as that seems to be what makes the job succeed.
Append mode: whether new data should be appended when output files already exist.
Extractors: 2
Loaders: 2
Finally the element# prompt appears once more, this time for the Extra mapper jars property; it can be left empty, just press Enter.

If the shell reports that the job was successfully created, the configuration is complete.
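If one of these answers was entered incorrectly, the job does not have to be deleted and recreated; the 1.99.x shell can re-open the same wizard for an existing job, roughly as follows (the exact command form is from memory, so verify it with the shell's built-in help):

# re-run the interactive wizard for an existing job and change individual answers
sqoop:000> update job -n job_weibouser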

List the jobs that have been created

sqoop:002> show job
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| Id |     Name      |                From Connector                 |             To Connector             | Enabled |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| 1  | spider_job    | mysql_weibouser (generic-jdbc-connector)      | hdfs_link (hdfs-connector)           | true    |
| 2  | job_weibouser | mysql_weibouser_link (generic-jdbc-connector) | hdfs_weibouser_link (hdfs-connector) | true    |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
sqoop:002> 

Start the job

sqoop:002> start job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
    http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:37:46 CST: BOOTING  - Progress is not available
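start job returns as soon as the job has been submitted, which is why the status has to be polled separately below. As far as I know the shell also accepts a synchronous flag that keeps the shell attached until the job finishes; treat the exact option as an assumption and check the shell's help:

# -s / --synchronous is assumed here: block in the shell and print progress until the job finishes
sqoop:002> start job -n job_weibouser -s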

Check the job status

sqoop:002> status job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
    http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:38:41 CST: SUCCEEDED 
Counters:
    org.apache.hadoop.mapreduce.FileSystemCounter
        FILE_LARGE_READ_OPS: 0
        FILE_WRITE_OPS: 0
        HDFS_READ_OPS: 2
        HDFS_BYTES_READ: 290
        HDFS_LARGE_READ_OPS: 0
        FILE_READ_OPS: 0
        FILE_BYTES_WRITTEN: 51361466
        FILE_BYTES_READ: 25115854
        HDFS_WRITE_OPS: 2
        HDFS_BYTES_WRITTEN: 24652721
    org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
        BYTES_WRITTEN: 0
    org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
        BYTES_READ: 0
    org.apache.hadoop.mapreduce.JobCounter
        TOTAL_LAUNCHED_MAPS: 2
        VCORES_MILLIS_REDUCES: 20225
        MB_MILLIS_MAPS: 27120640
        TOTAL_LAUNCHED_REDUCES: 2
        SLOTS_MILLIS_REDUCES: 20225
        VCORES_MILLIS_MAPS: 26485
        MB_MILLIS_REDUCES: 20710400
        SLOTS_MILLIS_MAPS: 26485
        MILLIS_REDUCES: 20225
        OTHER_LOCAL_MAPS: 2
        MILLIS_MAPS: 26485
    org.apache.sqoop.submission.counter.SqoopCounters
        ROWS_READ: 109408
        ROWS_WRITTEN: 109408
    org.apache.hadoop.mapreduce.TaskCounter
        MAP_OUTPUT_MATERIALIZED_BYTES: 25115866
        REDUCE_INPUT_RECORDS: 109408
        SPILLED_RECORDS: 218816
        MERGED_MAP_OUTPUTS: 4
        VIRTUAL_MEMORY_BYTES: 0
        MAP_INPUT_RECORDS: 0
        SPLIT_RAW_BYTES: 290
        FAILED_SHUFFLE: 0
        MAP_OUTPUT_BYTES: 24762129
        REDUCE_SHUFFLE_BYTES: 25115866
        PHYSICAL_MEMORY_BYTES: 0
        GC_TIME_MILLIS: 1648
        REDUCE_INPUT_GROUPS: 109408
        COMBINE_OUTPUT_RECORDS: 0
        SHUFFLED_MAPS: 4
        REDUCE_OUTPUT_RECORDS: 109408
        MAP_OUTPUT_RECORDS: 109408
        COMBINE_INPUT_RECORDS: 0
        CPU_MILLISECONDS: 0
        COMMITTED_HEAP_BYTES: 1951399936
    Shuffle Errors
        CONNECTION: 0
        WRONG_LENGTH: 0
        BAD_ID: 0
        WRONG_MAP: 0
        WRONG_REDUCE: 0
        IO_ERROR: 0
Job executed successfully

Check the corresponding HDFS path for the output files:

jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider
Found 4 items
drwxr-xr-x   - didi supergroup          0 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 10:58 /usr/jjzhu/spider/weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 13:33 /usr/jjzhu/spider/weobouser
drwxr-xr-x   - didi supergroup          0 2017-04-11 13:39 /usr/jjzhu/spider/weobouser2
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider/spiders_weibouser
Found 2 items
-rw-r--r--   1 didi supergroup   12262783 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/33b56441-b638-48cc-8d0d-37a808f25653.txt
-rw-r--r--   1 didi supergroup   12389938 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/73b20d50-de72-4aea-8c8c-d97cdc48e667.txt
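A quick look at the file contents confirms the rows arrived as plain text; the file names are generated by the job, so a glob is the easiest way to address them:

# print the first few exported records (CSV-style text)
hdfs dfs -cat /usr/jjzhu/spider/spiders_weibouser/*.txt | head -n 5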

 

 

Source: https://yq.aliyun.com/articles/73582

 


