DataX mysql2hdfs: HDFS Filesystem High-Availability Configuration Tutorial

Posted by 闭关苦炼内功



DataX mysql2hdfs HDFS high-availability configuration guide

Key configuration parameters:

  • The HA-related settings below are taken from the cluster's hdfs-site.xml / core-site.xml and are passed to hdfswriter through hadoopConfig, together with the nameservice URI in defaultFS:

"hadoopConfig": {
    "dfs.client.failover.proxy.provider.bcluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "dfs.ha.namenodes.bcluster": "nn1,nn2",
    "dfs.namenode.rpc-address.bcluster.nn1": "bigdata02:8020",
    "dfs.namenode.rpc-address.bcluster.nn2": "bigdata03:8020",
    "dfs.nameservices": "bcluster"
},
"defaultFS": "hdfs://bcluster",

The full job configuration is as follows:

[hdfs@demo01 ~]$ cat mysql2hive_ods_demo.json

  "job": 
    "setting": 
      "speed": 
        "channel":2
      
    ,
    "content": [
      
        "reader": 
          "name": "mysqlreader",
          "parameter": 
            "username": "root",
            "password": "123456",
            "connection": [
              
                "querySql": [
                                "select id
									,shop_name
									,platform_name
									,admin_name
									,admin_phone
									,version
									,create_time
									,update_time
									from tb_demo;"
                ],
                "jdbcUrl": [
                  "jdbc:mysql://10.0.0.1:3306/db_demo"
                ]
              
            ]
          
        ,
        "writer": 
          "name": "hdfswriter",
          "parameter": 
            "hadoopConfig":
                            "dfs.client.failover.proxy.provider.bcluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.namenodes.bcluster": "nn1,nn2",
                            "dfs.namenode.rpc-address.bcluster.nn1": "bigdata02:8020",
                            "dfs.namenode.rpc-address.bcluster.nn2": "bigdata03:8020",
                            "dfs.nameservices": "bcluster"
                           ,
            "defaultFS": "hdfs://bcluster",
            "fileType": "orc",
            "path": "/warehouse/tablespace/managed/hive/demo.db/ods_tb_demo_df/bizdate=$bizdate",
            "fileName": "ods_tb_demo_df",
            "column": [
				"name":"id","type":"string",
				"name":"shop_name","type":"string",
				"name":"platform_name","type":"string",
				"name":"admin_name","type":"string",
				"name":"admin_phone","type":"string",
				"name":"version","type":"string",
				"name":"create_time","type":"string",
				"name":"update_time","type":"string"
            ],
            "writeMode": "append",
            "fieldDelimiter": "\\t",
            "compress": "SNAPPY"
          
         
        
    ]
  


[hdfs@demo01 ~]$
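One practical detail: hdfswriter writes into the directory given by path, and with a dated layout like bizdate=$bizdate that directory generally has to exist before the job runs, otherwise the writer will fail its pre-check. A minimal pre-run step, assuming the hdfs user has write access to the warehouse path and using the example date 20221206:

    # Create the target partition directory for the run date before starting DataX
    bizdate=20221206
    hdfs dfs -mkdir -p "/warehouse/tablespace/managed/hive/demo.db/ods_tb_demo_df/bizdate=${bizdate}"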

Run a DataX data-sync test. The -p"-Dbizdate='20221206'" option fills in the $bizdate variable used in the writer's path:

python /usr/local/datax/bin/datax.py -p"-Dbizdate='20221206'" mysql2hive_ods_demo.json

[hdfs@demo01 ~]$ python /usr/local/datax/bin/datax.py -p"-Dbizdate='20221206'" mysql2hive_ods_demo.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2022-12-07 04:52:44.523 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2022-12-07 04:52:44.529 [main] INFO  Engine - the machine info  =>


2022-12-07 04:52:44.546 [main] INFO  Engine -

...


2022-12-07 04:52:44.559 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2022-12-07 04:52:44.560 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-12-07 04:52:44.560 [main] INFO  JobContainer - DataX jobContainer starts job.
2022-12-07 04:52:44.562 [main] INFO  JobContainer - Set jobId = 0
2022-12-07 04:52:44.805 [job-0] INFO  OriginalConfPretreatmentUtil - Available 
...
2022-12-07 04:52:55.852 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2022-12-07 04:52:55.852 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2022-12-07 04:52:55.852 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/hdp/datax/hook
2022-12-07 04:52:55.953 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 1                  | 1                  | 1                  | 0.023s             | 0.023s             | 0.023s
                 PS Scavenge          | 1                  | 1                  | 1                  | 0.012s             | 0.012s             | 0.012s

2022-12-07 04:52:55.954 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-12-07 04:52:55.954 [job-0] INFO  StandAloneJobContainerCommunicator - Total 78 records, 33832 bytes | Speed 3.30KB/s, 7 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-12-07 04:52:55.954 [job-0] INFO  JobContainer -
任务启动时刻                    : 2022-12-07 04:52:44
任务结束时刻                    : 2022-12-07 04:52:55
任务总计耗时                    :                 11s
任务平均流量                    :            3.30KB/s
记录写入速度                    :              7rec/s
读出记录总数                    :                  78
读写失败总数                    :                   0

[hdfs@demo01 ~]$
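After a successful run, the ORC files sit under the dated directory on HDFS, but Hive only sees them once the partition is registered in the metastore. A minimal follow-up sketch, assuming the target table is demo.ods_tb_demo_df partitioned by bizdate (inferred from the warehouse path above) and that the hive CLI is available:

    # Confirm DataX produced files under the partition directory
    hdfs dfs -ls /warehouse/tablespace/managed/hive/demo.db/ods_tb_demo_df/bizdate=20221206

    # Register the new partition so Hive queries can read it
    hive -e "ALTER TABLE demo.ods_tb_demo_df ADD IF NOT EXISTS PARTITION (bizdate='20221206');"

    # Or let Hive discover any unregistered partitions in one pass
    hive -e "MSCK REPAIR TABLE demo.ods_tb_demo_df;"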

With that, the DataX mysql2hdfs HDFS high-availability configuration guide is complete.
