微信公众号:知了小巷


DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步功能。
1. DataX简介
1. git https://github.com/alibaba/DataX.git
2. 可以开发自己的reader或writer插件,也可以对已经存在的进行改进
3. DataX3.0文档 https://github.com/alibaba/DataX/blob/master/introduction.md
1$ sudo wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
2$ sudo tar zxvf datax.tar.gz
3$ sudo chmod -R 755 datax
4$ ls datax
5bin conf job lib plugin script tmp
6$ ls datax/plugin/reader/
7drdsreader hbase11xreader mysqlreader ossreader postgresqlreader streamreader
8ftpreader hdfsreader odpsreader otsreader rdbmsreader txtfilereader
9hbase094xreader mongodbreader oraclereader otsstreamreader sqlserverreader
10$ ls datax/plugin/writer/
11adswriter hbase094xwriter hbase11xwriter mysqlwriter oraclewriter postgresqlwriter streamwriter
12drdswriter hbase11x_updatewriter hdfswriter ocswriter osswriter rdbmswriter txtfilewriter
13ftpwriter hbase11xsqlwriter mongodbwriter odpswriter otswriter sqlserverwriter
2. 生成需要的JSON模板
JSON配置说明
1. MySQL Reader https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
2. MySQL Writer https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md
1$ python bin/datax.py -r mysqlreader -w mysqlwriter
1{
2 "job": {
3 "content": [
4 {
5 "reader": {
6 "name": "mysqlreader",
7 "parameter": {
8 "column": [],
9 "connection": [
10 {
11 "jdbcUrl": [],
12 "table": []
13 }
14 ],
15 "password": "",
16 "username": "",
17 "where": ""
18 }
19 },
20 "writer": {
21 "name": "mysqlwriter",
22 "parameter": {
23 "column": [],
24 "connection": [
25 {
26 "jdbcUrl": "",
27 "table": []
28 }
29 ],
30 "password": "",
31 "preSql": [],
32 "session": [],
33 "username": "",
34 "writeMode": ""
35 }
36 }
37 }
38 ],
39 "setting": {
40 "speed": {
41 "channel": ""
42 }
43 }
44 }
45}
3. JSON配置
1. 使用$daystr通过命令行传参,控制某一天的数据
2. preSql可以在插入数据前先执行删除操作
1{
2 "job": {
3 "content": [
4 {
5 "reader": {
6 "name": "mysqlreader",
7 "parameter": {
8 "column": [],
9 "connection": [
10 {
11 "jdbcUrl": ["jdbc:mysql://localhost:3306/db_crawler"],
12 "querySql": ["select * from test where daystr='$daystr'"]
13 }
14 ],
15 "password": "",
16 "username": "root",
17 "where": ""
18 }
19 },
20 "writer": {
21 "name": "mysqlwriter",
22 "parameter": {
23 "column": ["daystr", "city_name", "area_name", "site", "number_code", "residential_name", "lng", "lat"],
24 "connection": [
25 {
26 "jdbcUrl": "jdbc:mysql://rm-xxx.rds.aliyuncs.com:3306/db_warehouse",
27 "table": ["test"]
28 }
29 ],
30 "password": "",
31 "preSql": ["delete from test where daystr='$daystr'"],
32 "session": ["set session sql_mode='ANSI'"],
33 "username": "root",
34 "writeMode": "insert"
35 }
36 }
37 }
38 ],
39 "setting": {
40 "speed": {
41 "channel": "5"
42 }
43 }
44 }
45}
查看输出的日志
12018-02-08 20:46:28.730 [job-0] INFO JobContainer -
2 [total cpu info] =>
3 averageCpu | maxDeltaCpu | minDeltaCpu
4 10.92% | 10.92% | 10.92%
5
6
7 [total gc info] =>
8 NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
9 PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
10 PS Scavenge | 1 | 1 | 1 | 0.018s | 0.018s | 0.018s
11
122018-02-08 20:46:28.730 [job-0] INFO JobContainer - PerfTrace not enable!
132018-02-08 20:46:28.730 [job-0] INFO StandAloneJobContainerCommunicator - Total 65535 records, 3616897 bytes | Speed 353.21KB/s, 6553 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 1.327s | All Task WaitReaderTime 0.475s | Percentage 100.00%
142018-02-08 20:46:28.731 [job-0] INFO JobContainer -
15任务启动时刻 : 2018-02-08 20:46:17
16任务结束时刻 : 2018-02-08 20:46:28
17任务总计耗时 : 11s
18任务平均流量 : 353.21KB/s
19记录写入速度 : 6553rec/s
20读出记录总数 : 65535
21读写失败总数 : 0

文章转载自大数据真有意思,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




