暂无图片
暂无图片
暂无图片
暂无图片
1
暂无图片

TuGraph(OceanBase图数据库)小课堂010:TuGraph-DataX-Tool 如何使用?

欢迎访问 OceanBase 官网获取更多信息:https://www.oceanbase.com/


此文档主要介绍 TuGraph DataX 的安装编译和使用示例

简介

TuGraph 在阿里开源的 DataX 基础上添加了 TuGraph 的写插件以及 TuGraph jsonline 数据格式的支持,其他数据源可以通过 DataX 往 TuGraph 里面写数据。 DataX 介绍参考 https://github.com/alibaba/DataX 支持的功能包括:

  • 从 MySQL、SQL Server、Oracle、PostgreSQL、HDFS、Hive、HBase、OTS、ODPS、Kafka 等各种异构数据源导入 TuGraph
  • 将 TuGraph 导入相应的目标源 (待开发)

编译安装

git clone git@code.alipay.com:fma/DataX.git
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
复制

编译出来的 DataX 文件在 target 目录下

文本数据通过 DataX 导入 TuGraph

我们以 TuGraph 手册中导入工具 lgraph_import 章节举的数据为例子,有三个 csv 数据文件,如下: actors.csv

nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang
复制

movies.csv

tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3
复制

roles.csv

nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660
复制

然后建三个 DataX 的 job 配置文件: job_actors.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["actors.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "actor",
                "type": "VERTEX",
                "properties": [
                  { "name": "aid", "type": "STRING" },
                  { "name": "name", "type": "STRING" }
                ],
                "primary": "aid"
              }
            ],
            "files": [
              {
                "label": "actor",
                "format": "JSON",
                "columns": ["aid", "name"]
              }
            ]
          }
        }
      }
    ]
  }
}
复制

job_movies.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["movies.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              },
              {
                "index": 3,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}
复制

job_roles.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["roles.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "play_in",
                "type": "EDGE",
                "properties": [{ "name": "role", "type": "STRING" }]
              }
            ],
            "files": [
              {
                "label": "play_in",
                "format": "JSON",
                "SRC_ID": "actor",
                "DST_ID": "movie",
                "columns": ["SRC_ID", "role", "DST_ID"]
              }
            ]
          }
        }
      }
    ]
  }
}
复制

./lgraph_server -c lgraph_standalone.json -d 'run' 启动 TuGraph 后依次执行如下三个命令:

python3 datax/bin/datax.py  job_actors.json
复制
python3 datax/bin/datax.py  job_movies.json
复制
python3 datax/bin/datax.py  job_roles.json
复制

MySQL 数据通过 DataX 导入 TuGraph

我们在 test database 下建立如下电影 movies 表

CREATE TABLE `movies` (
  `mid`  varchar(200) NOT NULL,
  `name` varchar(100) NOT NULL,
  `year` int(11) NOT NULL,
  `rate` float(5,2) unsigned NOT NULL,
  PRIMARY KEY (`mid`)
);
复制

往表中插入几条数据

insert into
test.movies (mid, name, year, rate)
values
('tt0188766', 'King of Comedy', 1999, 7.3),
('tt0286112', 'Shaolin Soccer', 2001, 7.3),
('tt4701660', 'The Mermaid',   2016,  6.3);
复制

建立一个 DataX 的 job 配置文件

job_mysql_to_tugraph.json

配置字段方式

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": ["mid", "name", "year", "rate"],
            "splitPk": "mid",
            "connection": [
              {
                "table": ["movies"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}
复制

写简单 sql 方式

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "querySql": [
                  "select mid, name, year, rate from test.movies where year > 2000;"
                ],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}
复制

./lgraph_server -c lgraph_standalone.json -d 'run' 启动 TuGraph 后执行如下命令:

python3 datax/bin/datax.py  job_mysql_to_tugraph.json
复制


欢迎访问 OceanBase 官网获取更多信息:https://www.oceanbase.com/

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

迷舍园
暂无图片
10月前
评论
暂无图片 0
【高薪诚聘】数据库精英,加入我们,共创数据未来! 我们寻找的你: 【如果你是数据库架构专家】 对数据库架构有深入的了解和实践经验,能够为公司构建稳定、高效、可扩展的数据库架构; 对各种数据库系统有深入的理解,包括但不限于MySQL、Oracle、SQL Server等,并能够根据实际业务需求进行选择和优化; 对新技术和新架构保持敏感,有能力推动公司的数据库技术不断升级和迭代。
10月前
暂无图片 点赞
评论