
DataX Usage Notes

醉鱼Java 2021-10-18

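DataX Usage Guide

DataX is an offline data synchronization tool: you describe a job in a configuration file and run it from the command line.

The official description is as follows:

DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. DataX implements efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore(OTS), MaxCompute(ODPS), Hologres, DRDS, and others.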

Downloading DataX

Follow the README step by step to install DataX and run the basic demo:

https://github.com/alibaba/DataX/blob/master/userGuid.md

  • Download the source code

    $ git clone git@github.com:alibaba/DataX.git

  • Build the package with Maven

    $ cd {DataX_source_code_home}
    $ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

    Note: after a successful build, the package is at {DataX_source_code_home}/target/datax/datax

Common causes of build failure

  • Wrong JDK version in the environment: use JDK 1.8

  • Dependencies missing from the Maven repository
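Since a wrong JDK is the first thing to rule out, a small check before running Maven can save a failed build. The following is a minimal sketch, assuming Python 3.7+ and that `java` is on the PATH; the function names are made up for illustration and are not part of DataX.

```python
import re
import subprocess


def parse_java_version(version_output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def is_jdk8(version_output):
    """Return True when the reported version starts with 1.8."""
    version = parse_java_version(version_output)
    return version is not None and version.startswith("1.8")


def check_local_jdk():
    """Run `java -version` and report whether it looks like JDK 1.8."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return is_jdk8(result.stderr)
```

Calling `check_local_jdk()` before `mvn` returns False when a different JDK is active, which usually means JAVA_HOME or PATH points at the wrong installation.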

Running the test demo

  • View the configuration template

    python datax.py -r streamreader -w streamwriter

  • Create the demo configuration file stream2stream.json

    touch stream2stream.json

  • Put the following content in the file

    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": 10,
                            "column": [
                                {
                                    "type": "long",
                                    "value": "10"
                                },
                                {
                                    "type": "string",
                                    "value": "hello,你好,世界-DataX"
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {
                            "encoding": "UTF-8",
                            "print": true
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": 5
                }
            }
        }
    }


  • Start the job from the bin directory of the built DataX package

    $ cd {YOUR_DATAX_DIR_BIN}
    $ python datax.py ./stream2stream.json
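The demo steps can also be scripted, which avoids hand-editing JSON and the quoting mistakes that come with it. This is a minimal sketch, assuming Python 3; the helper names `build_stream_job`, `write_job`, and `datax_command` are hypothetical and only mirror the file and command shown in these notes.

```python
import json


def build_stream_job(record_count=10, channel=5):
    """Return the stream-to-stream demo job as a plain dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Serialize the job as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job, handle, ensure_ascii=False, indent=4)


def datax_command(job_path, datax_py="datax.py"):
    """Build the argument list that mirrors `python datax.py <job>`."""
    return ["python", datax_py, job_path]
```

From the bin directory, `write_job(build_stream_job(), "stream2stream.json")` followed by `subprocess.run(datax_command("./stream2stream.json"), check=True)` should be equivalent to the two manual steps, provided the `python` on the PATH is one that the installed datax.py supports.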

Oracle2Dm.json

{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "test",
                        "password": "test",
                        "connection": [
                            {
                                "querySql": [
                                    "select col1,col2,col3,col4,col5 from table_name"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@127.0.0.1:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "rdbmswriter",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:dm://127.0.0.1:5236/TEST",
                                "table": [
                                    "test_table"
                                ]
                            }
                        ],
                        "username": "TEST",
                        "password": "1234567890",
                        "table": "test_table",
                        "column": [
                            "col1",
                            "col2",
                            "col3",
                            "col4",
                            "col5"
                        ],
                        "preSql": [
                            "delete from test_table;"
                        ]
                    }
                }
            }
        ]
    }
}
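In a job like this, the reader's querySql selects five columns and the writer lists five columns; as far as I know DataX matches them by position, so the counts need to agree. The sketch below is a rough pre-flight check under that assumption. It is not a DataX feature, the function names are invented, and the SELECT parsing is deliberately naive (it will miscount expressions that contain commas).

```python
import re


def count_select_columns(query_sql):
    """Count comma-separated items between SELECT and FROM (naive parsing)."""
    match = re.search(r"select\s+(.*?)\s+from\s", query_sql, re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("not a simple SELECT ... FROM query: %r" % query_sql)
    return len([item for item in match.group(1).split(",") if item.strip()])


def check_column_counts(job):
    """Return one message per querySql whose column count differs from the writer's."""
    problems = []
    for index, entry in enumerate(job["job"]["content"]):
        writer_columns = entry["writer"]["parameter"].get("column", [])
        if writer_columns == ["*"]:
            # A wildcard writer column list cannot be compared by count.
            continue
        for connection in entry["reader"]["parameter"].get("connection", []):
            for query in connection.get("querySql", []):
                selected = count_select_columns(query)
                if selected != len(writer_columns):
                    problems.append(
                        "content[%d]: query selects %d columns, writer expects %d"
                        % (index, selected, len(writer_columns))
                    )
    return problems
```

Loading Oracle2Dm.json with `json.load` and passing the result to `check_column_counts` should return an empty list for the configuration shown here.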

Oracle2MySql.json

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "jdjda",
                        "password": "jdjda",
                        "connection": [
                            {
                                "querySql": [
                                    "select ROLL_ID,ORGANIZATION_NO,ARCHIVES_NO,DEPARTMENT_NO,START_TIME from T_ARCHIVES_AJ_MAIN"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@192.168.168.66:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "root",
                        "column": [
                            "col1",
                            "col2",
                            "col3",
                            "col4",
                            "col5"
                        ],
                        "session": [
                            "set session sql_mode='ANSI'"
                        ],
                        "preSql": [
                            "delete from orcl"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=gbk",
                                "table": [
                                    "orcl"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
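Both Oracle jobs share the same skeleton: job.setting.speed.channel plus a single content entry holding a reader and a writer, and only the writer section really differs. A minimal sketch of assembling that skeleton in Python follows; the helper names are hypothetical, and the parameter keys are copied from the JSON in these notes rather than from a complete plugin reference.

```python
def build_job(reader_name, reader_parameter, writer_name, writer_parameter, channel=1):
    """Wrap a reader and a writer in the job skeleton used by the examples above."""
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [
                {
                    "reader": {"name": reader_name, "parameter": reader_parameter},
                    "writer": {"name": writer_name, "parameter": writer_parameter},
                }
            ],
        }
    }


def oracle_reader_parameter(username, password, jdbc_url, query_sql):
    """Parameter block shaped like the oraclereader sections above."""
    return {
        "username": username,
        "password": password,
        # In the reader, both querySql and jdbcUrl are lists.
        "connection": [{"querySql": [query_sql], "jdbcUrl": [jdbc_url]}],
    }


def mysql_writer_parameter(username, password, jdbc_url, table, columns, pre_sql=None):
    """Parameter block shaped like the mysqlwriter section above."""
    parameter = {
        "writeMode": "insert",
        "username": username,
        "password": password,
        "column": list(columns),
        # In the writer, jdbcUrl is a single string, unlike the reader's list.
        "connection": [{"jdbcUrl": jdbc_url, "table": [table]}],
    }
    if pre_sql:
        parameter["preSql"] = list(pre_sql)
    return parameter
```

Keeping credentials as function arguments also makes it easier to read them from the environment instead of committing them in the JSON file.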




For more configuration details, see the GitHub repository:

https://github.com/alibaba/DataX











