datax oracle到mysql数据抽取
Posted 阿西吧li
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了datax oracle到mysql数据抽取相关的知识,希望对你有一定的参考价值。
环境配置(官方推荐):
JDK(1.8以上,推荐1.8)
Python(推荐Python2.6.X)
下载datax工具:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
解压后就能使用。
目录结构如下:
[root@rancher1 datax]# pwd /datax [root@rancher1 datax]# ls -lh total 4.0K drwxr-xr-x 2 62265 users 59 Jul 8 17:42 bin drwxr-xr-x 2 62265 users 68 Oct 12 2019 conf drwxr-xr-x 2 62265 users 85 Jul 8 18:39 job drwxr-xr-x 2 62265 users 4.0K Oct 12 2019 lib drwxr-xr-x 3 root root 24 Jul 8 17:22 log drwxr-xr-x 3 root root 24 Jul 8 17:22 log_perf drwxr-xr-x 4 62265 users 34 Oct 12 2019 plugin drwxr-xr-x 2 62265 users 23 Oct 12 2019 script drwxr-xr-x 2 62265 users 24 Oct 12 2019 tmp
ORACLE服务器信息:
监听信息:
[oracle@exam ~]$ lsnrctl status LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 08-JUL-2020 19:45:54 Copyright (c) 1991, 2013, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=exam)(PORT=1521))) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production Start Date 08-JUL-2020 16:51:50 Uptime 0 days 2 hr. 54 min. 4 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/oracle/product/11.2.0/db_1/network/admin/listener.ora Listener Log File /u01/app/oracle/diag/tnslsnr/exam/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=exam)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521))) Services Summary... Service "exam" has 1 instance(s). Instance "exam", status READY, has 1 handler(s) for this service... Service "examXDB" has 1 instance(s). Instance "exam", status READY, has 1 handler(s) for this service... The command completed successfully
表结构:
SQL> desc issue.db_hosts; Name Null? Type ----------------------------------------- -------- ---------------------------- DB_ID NUMBER(16) HOST_NAME VARCHAR2(256 CHAR) IP VARCHAR2(30 CHAR) RAC_ID NUMBER(16)
mysql服务器信息:
mysql 2569 1276 0 16:47 ? 00:00:11 /u01/mysql-5.7.27/bin/mysqld --basedir=/u01/mysql-5.7.27 --datadir=/u01/mysql-5.7.27/data --plugin-dir=/u01/mysql-5.7.27/lib/plugin --user=mysql --log-error=/u01/mysql-5.7.27/log/mysql_error.log --open-files-limit=65535 --pid-file=/u01/mysql-5.7.27/mysql.pid --socket=/u01/mysql-5.7.27/mysql.sock --port=3306
mysql> desc issue.db_hosts; +-----------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+-------------+------+-----+---------+-------+ | db_id | bigint(20) | YES | | NULL | | | host_name | varchar(50) | YES | | NULL | | | ip | varchar(20) | YES | | NULL | | | rac_id | int(11) | YES | | NULL | | +-----------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec)
配置数据抽取的json文件:
[root@rancher1 job]# cat db_host.json { "job": { "content": [ { "reader": { "name": "oraclereader", "parameter": { "username": "issue", "password": "issue", "column": [ "*" ], "connection": [ { "table": [ "db_hosts" ], "jdbcUrl": [ "jdbc:oracle:thin:@192.168.0.210:1521:exam" ] } ] } }, "writer": { "name": "mysqlwriter", "parameter": { "writeMode": "insert", "username": "root", "password": "123456", "column": [ "db_id","host_name","ip","rac_id" ], "preSql": [ "truncate table db_hosts" ], "connection": [ { "jdbcUrl": "jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8", "table": [ "db_hosts" ] } ] } } } ], "setting": { "speed": { "channel": 5 } } } }
执行抽取:
[root@rancher1 bin]# python datax.py ../job/db_host.json DataX (DATAX-OPENSOURCE-3.0), From Alibaba ! Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved. 2020-07-08 19:56:11.019 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl 2020-07-08 19:56:11.029 [main] INFO Engine - the machine info => osInfo: Oracle Corporation 1.8 25.161-b14 jvmInfo: Linux amd64 5.5.10-1.el7.elrepo.x86_64 cpu num: 2 totalPhysicalMemory: -0.00G freePhysicalMemory: -0.00G maxFileDescriptorCount: -1 currentOpenFileDescriptorCount: -1 GC Names [PS MarkSweep, PS Scavenge] MEMORY_NAME | allocation_size | init_size PS Eden Space | 256.00MB | 256.00MB Code Cache | 240.00MB | 2.44MB Compressed Class Space | 1,024.00MB | 0.00MB PS Survivor Space | 42.50MB | 42.50MB PS Old Gen | 683.00MB | 683.00MB Metaspace | -0.00MB | 0.00MB 2020-07-08 19:56:11.054 [main] INFO Engine - { "content":[ { "reader":{ "name":"oraclereader", "parameter":{ "column":[ "*" ], "connection":[ { "jdbcUrl":[ "jdbc:oracle:thin:@192.168.0.210:1521:exam" ], "table":[ "db_hosts" ] } ], "password":"*****", "username":"issue" } }, "writer":{ "name":"mysqlwriter", "parameter":{ "column":[ "db_id", "host_name", "ip", "rac_id" ], "connection":[ { "jdbcUrl":"jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8", "table":[ "db_hosts" ] } ], "password":"******", "preSql":[ "truncate table db_hosts" ], "username":"root", "writeMode":"insert" } } } ], "setting":{ "speed":{ "channel":5 } } } 2020-07-08 19:56:11.080 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null 2020-07-08 19:56:11.083 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0 2020-07-08 19:56:11.083 [main] INFO JobContainer - DataX jobContainer starts job. 2020-07-08 19:56:11.086 [main] INFO JobContainer - Set jobId = 0 2020-07-08 19:56:11.498 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.0.210:1521:exam. 2020-07-08 19:56:11.500 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改. 2020-07-08 19:56:11.816 [job-0] INFO OriginalConfPretreatmentUtil - table:[db_hosts] all columns:[ db_id,host_name,ip,rac_id ]. 2020-07-08 19:56:11.833 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (db_id,host_name,ip,rac_id) VALUES(?,?,?,?) ], which jdbcUrl like:[jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true] 2020-07-08 19:56:11.833 [job-0] INFO JobContainer - jobContainer starts to do prepare ... 2020-07-08 19:56:11.834 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do prepare work . 2020-07-08 19:56:11.835 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work . 2020-07-08 19:56:11.847 [job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[truncate table db_hosts]. context info:jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true. 2020-07-08 19:56:11.857 [job-0] INFO JobContainer - jobContainer starts to do split ... 2020-07-08 19:56:11.858 [job-0] INFO JobContainer - Job set Channel-Number to 5 channels. 2020-07-08 19:56:11.865 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks. 2020-07-08 19:56:11.865 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks. 2020-07-08 19:56:11.887 [job-0] INFO JobContainer - jobContainer starts to do schedule ... 2020-07-08 19:56:11.891 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups. 2020-07-08 19:56:11.895 [job-0] INFO JobContainer - Running by standalone Mode. 2020-07-08 19:56:11.909 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks. 2020-07-08 19:56:11.923 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated. 2020-07-08 19:56:11.923 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated. 2020-07-08 19:56:11.939 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started 2020-07-08 19:56:11.947 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from db_hosts ] jdbcUrl:[jdbc:oracle:thin:@192.168.0.210:1521:exam]. 2020-07-08 19:56:12.117 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from db_hosts ] jdbcUrl:[jdbc:oracle:thin:@192.168.0.210:1521:exam]. 2020-07-08 19:56:12.376 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[442]ms 2020-07-08 19:56:12.377 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it‘s tasks. 2020-07-08 19:56:21.924 [job-0] INFO StandAloneJobContainerCommunicator - Total 667 records, 21266 bytes | Speed 2.08KB/s, 66 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.003s | All Task WaitReaderTime 0.140s | Percentage 100.00% 2020-07-08 19:56:21.925 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks. 2020-07-08 19:56:21.925 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work. 2020-07-08 19:56:21.925 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do post work. 2020-07-08 19:56:21.926 [job-0] INFO JobContainer - DataX jobId [0] completed successfully. 2020-07-08 19:56:21.927 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /datax/hook 2020-07-08 19:56:21.929 [job-0] INFO JobContainer - [total cpu info] => averageCpu | maxDeltaCpu | minDeltaCpu -1.00% | -1.00% | -1.00% [total gc info] => NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime PS MarkSweep | 1 | 1 | 1 | 0.054s | 0.054s | 0.054s PS Scavenge | 1 | 1 | 1 | 0.032s | 0.032s | 0.032s 2020-07-08 19:56:21.930 [job-0] INFO JobContainer - PerfTrace not enable! 2020-07-08 19:56:21.931 [job-0] INFO StandAloneJobContainerCommunicator - Total 667 records, 21266 bytes | Speed 2.08KB/s, 66 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.003s | All Task WaitReaderTime 0.140s | Percentage 100.00% 2020-07-08 19:56:21.932 [job-0] INFO JobContainer - 任务启动时刻 : 2020-07-08 19:56:11 任务结束时刻 : 2020-07-08 19:56:21 任务总计耗时 : 10s 任务平均流量 : 2.08KB/s 记录写入速度 : 66rec/s 读出记录总数 : 667 读写失败总数 : 0 [root@rancher1 bin]#
查看结果:
mysql> select * from issue.db_hosts; +------------+------------------------+----------------+--------+ | db_id | host_name | ip | rac_id | +------------+------------------------+----------------+--------+ | 500011196 | dxxxxxxxxx3 | 10.xxx.xxx.174 | 28 | | 500014437 | dxxxx | 10.xxx.xxx.30 | 10 | | 500014437 | dxxxx4 | 10.1xx.xxx.33 | 10 | | 500014437 | dxxxx | 10.xxx.xxx.39 | 10 | | 500014437 | dbxx | 10.xxx.xxx.38 | 10 | | 500014437 | dbxx | 10.xxx.xxx.37 | 10 |
感觉配置还是比较简单的,抽取大量数据库没有测试不晓得性能咋样;
大批量表数据迁移时候需要些脚本生成对应的json(配置json文件很容易出错),具体每个类型数据库的json写法参考https://github.com/alibaba/DataX
oracleread:https://github.com/alibaba/DataX/blob/master/oraclereader/doc
mysqlwrite:https://github.com/alibaba/DataX/tree/master/mysqlwriter/doc
表结构可以借助powerdesign工具实行转换,个别表单独修改;
以上是关于datax oracle到mysql数据抽取的主要内容,如果未能解决你的问题,请参考以下文章
使用dataX将数据从Mysql数据库同步到Oracle数据
数据处理 大数据数据抽取 ETL 工具 DataX KettleSqoop
数据处理 大数据数据抽取 ETL 工具 DataX KettleSqoop