暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

使用DataX在MySQL之间同步数据

大数据真有意思 2018-12-17
342

微信公众号:知了小巷

DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步功能。

1. DataX简介

1. git https://github.com/alibaba/DataX.git
2. 可以开发自己的reader或writer插件,也可以对已经存在的进行改进
3. DataX3.0文档 https://github.com/alibaba/DataX/blob/master/introduction.md

 1$ sudo wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
2$ sudo tar zxvf datax.tar.gz
3$ sudo chmod -R 755 datax
4$ ls datax
5bin    conf    job lib plugin  script  tmp
6$ ls datax/plugin/reader/
7drdsreader        hbase11xreader      mysqlreader     ossreader       postgresqlreader    streamreader
8ftpreader        hdfsreader      odpsreader      otsreader       rdbmsreader     txtfilereader
9hbase094xreader        mongodbreader       oraclereader        otsstreamreader     sqlserverreader
10$ ls datax/plugin/writer/
11adswriter        hbase094xwriter     hbase11xwriter      mysqlwriter     oraclewriter        postgresqlwriter    streamwriter
12drdswriter        hbase11x_updatewriter   hdfswriter      ocswriter       osswriter       rdbmswriter     txtfilewriter
13ftpwriter        hbase11xsqlwriter   mongodbwriter       odpswriter      otswriter       sqlserverwriter

2. 生成需要的JSON模板

JSON配置说明
1. MySQL Reader https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
2. MySQL Writer https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

1$ python bin/datax.py -r mysqlreader -w mysqlwriter

 1{
2    "job": {
3        "content": [
4            {
5                "reader": {
6                    "name""mysqlreader",
7                    "parameter": {
8                        "column": [],
9                        "connection": [
10                            {
11                                "jdbcUrl": [],
12                                "table": []
13                            }
14                        ],
15                        "password""",
16                        "username""",
17                        "where"""
18                    }
19                },
20                "writer": {
21                    "name""mysqlwriter",
22                    "parameter": {
23                        "column": [],
24                        "connection": [
25                            {
26                                "jdbcUrl""",
27                                "table": []
28                            }
29                        ],
30                        "password""",
31                        "preSql": [],
32                        "session": [],
33                        "username""",
34                        "writeMode"""
35                    }
36                }
37            }
38        ],
39        "setting": {
40            "speed": {
41                "channel"""
42            }
43        }
44    }
45}

3. JSON配置

1. 使用$daystr通过命令行传参,控制某一天的数据
2. preSql可以在插入数据前先执行删除操作

 1{
2    "job": {
3        "content": [
4            {
5                "reader": {
6                    "name""mysqlreader",
7                    "parameter": {
8                        "column": [],
9                        "connection": [
10                            {
11                                "jdbcUrl": ["jdbc:mysql://localhost:3306/db_crawler"],
12                                "querySql": ["select * from test where daystr='$daystr'"]
13                            }
14                        ],
15                        "password""",
16                        "username""root",
17                        "where"""
18                    }
19                },
20                "writer": {
21                    "name""mysqlwriter",
22                    "parameter": {
23                        "column": ["daystr""city_name""area_name""site""number_code""residential_name""lng""lat"],
24                        "connection": [
25                            {
26                                "jdbcUrl""jdbc:mysql://rm-xxx.rds.aliyuncs.com:3306/db_warehouse",
27                                "table": ["test"]
28                            }
29                        ],
30                        "password""",
31                        "preSql": ["delete from test where daystr='$daystr'"],
32                        "session": ["set session sql_mode='ANSI'"],
33                        "username""root",
34                        "writeMode""insert"
35                    }
36                }
37            }
38        ],
39        "setting": {
40            "speed": {
41                "channel""5"
42            }
43        }
44    }
45}

查看输出的日志

 12018-02-08 20:46:28.730 [job-0] INFO  JobContainer -
2         [total cpu info] =>
3                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
4                10.92%                         | 10.92%                         | 10.92%
5
6
7         [total gc info] =>
8                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
9                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
10                 PS Scavenge          | 1                  | 1                  | 1                  | 0.018s             | 0.018s             | 0.018s
11
122018-02-08 20:46:28.730 [job-0] INFO  JobContainer - PerfTrace not enable!
132018-02-08 20:46:28.730 [job-0] INFO  StandAloneJobContainerCommunicator - Total 65535 records, 3616897 bytes | Speed 353.21KB/s, 6553 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 1.327s |  All Task WaitReaderTime 0.475s | Percentage 100.00%
142018-02-08 20:46:28.731 [job-0] INFO  JobContainer -
15任务启动时刻                    : 2018-02-08 20:46:17
16任务结束时刻                    : 2018-02-08 20:46:28
17任务总计耗时                    :                 11s
18任务平均流量                    :          353.21KB/s
19记录写入速度                    :           6553rec/s
20读出记录总数                    :               65535
21读写失败总数                    :                   0



文章转载自大数据真有意思,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论