
Quickly importing Oracle data into HeavyDB without garbled characters

Original by jieguo, 2023-09-18

Export a CSV from Oracle, then load it into HeavyDB with COPY. Note that the generated CSV must be UTF-8 encoded, otherwise Chinese text ends up garbled.
The previous HeavyDB installation write-up is now over a year old; see https://www.modb.pro/db/400781

1. Export a CSV file from Oracle with sqluldr2

For reference, see https://www.modb.pro/db/1696201950291382272

[oracle]$ /home/oracle/sqluldr2 test/test@192.168.207.58:1521/lndb query="select * from addressinfo" head=yes file=addressinfo.csv text=CSV Charset=utf8
           0 rows exported at 2023-09-18 12:47:55, size 0 MB.
     1000000 rows exported at 2023-09-18 12:48:05, size 388 MB.
     2000000 rows exported at 2023-09-18 12:48:16, size 815 MB.
     3000000 rows exported at 2023-09-18 12:48:30, size 1239 MB.
     4000000 rows exported at 2023-09-18 12:48:43, size 1662 MB.
     5000000 rows exported at 2023-09-18 12:48:55, size 2098 MB.
     6000000 rows exported at 2023-09-18 12:49:10, size 2554 MB.
     7000000 rows exported at 2023-09-18 12:49:27, size 2985 MB.
     8000000 rows exported at 2023-09-18 12:49:41, size 3369 MB.
     9000000 rows exported at 2023-09-18 12:49:56, size 3773 MB.
    10000000 rows exported at 2023-09-18 12:50:10, size 4176 MB.
    11000000 rows exported at 2023-09-18 12:50:26, size 4584 MB.
    12000000 rows exported at 2023-09-18 12:50:39, size 5003 MB.
    13000000 rows exported at 2023-09-18 12:50:50, size 5435 MB.
    14000000 rows exported at 2023-09-18 12:51:04, size 5843 MB.
    15000000 rows exported at 2023-09-18 12:51:20, size 6282 MB.
    16000000 rows exported at 2023-09-18 12:51:33, size 6682 MB.
    17000000 rows exported at 2023-09-18 12:51:45, size 7113 MB.
    18000000 rows exported at 2023-09-18 12:51:56, size 7553 MB.
    19000000 rows exported at 2023-09-18 12:52:08, size 8005 MB.
    20000000 rows exported at 2023-09-18 12:52:22, size 8380 MB.
    21000000 rows exported at 2023-09-18 12:52:34, size 8784 MB.
    22000000 rows exported at 2023-09-18 12:52:48, size 9219 MB.
    23000000 rows exported at 2023-09-18 12:52:59, size 9671 MB.
    24000000 rows exported at 2023-09-18 12:53:10, size 10107 MB.
    25000000 rows exported at 2023-09-18 12:53:25, size 10514 MB.
    26000000 rows exported at 2023-09-18 12:53:36, size 10926 MB.
    27000000 rows exported at 2023-09-18 12:53:52, size 11349 MB.
    28000000 rows exported at 2023-09-18 12:54:07, size 11761 MB.
    29000000 rows exported at 2023-09-18 12:54:20, size 12181 MB.
    30000000 rows exported at 2023-09-18 12:54:34, size 12612 MB.
    30748154 rows exported at 2023-09-18 12:54:42, size 12956 MB.
         output file addressinfo.csv closed at 30748154 rows, size 12956 MB.
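
Before transferring the file, a quick sanity check of encoding and row count is worthwhile. A minimal sketch with standard Linux commands; the expected results in the comments are assumptions, not output from this export:

[oracle]$ file addressinfo.csv     # should report UTF-8 text, since Charset=utf8 was used
[oracle]$ wc -l addressinfo.csv    # data rows plus one header line (head=yes)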

Alternatively, a stored procedure can do the export. It is far slower, but serves as a fallback when no tool is available:

SQL> CREATE OR REPLACE PROCEDURE SQL_TO_CSV
(
  P_QUERY    IN VARCHAR2, -- SQL query text
  P_DIR      IN VARCHAR2, -- directory object for the exported file
  P_FILENAME IN VARCHAR2  -- CSV file name
)
IS
  L_OUTPUT       UTL_FILE.FILE_TYPE;
  L_THECURSOR    INTEGER DEFAULT DBMS_SQL.OPEN_CURSOR;
  L_COLUMNVALUE  VARCHAR2(4000);
  L_STATUS       INTEGER;
  L_COLCNT       NUMBER := 0;
  L_SEPARATOR    VARCHAR2(1);
  L_DESCTBL      DBMS_SQL.DESC_TAB;
  P_MAX_LINESIZE NUMBER := 32000;
BEGIN
  -- open the output file
  L_OUTPUT := UTL_FILE.FOPEN(P_DIR, P_FILENAME, 'W', P_MAX_LINESIZE);
  -- define the date format
  EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_DATE_FORMAT=''YYYY-MM-DD HH24:MI:SS''';
  -- parse the query and describe its columns
  DBMS_SQL.PARSE(L_THECURSOR, P_QUERY, DBMS_SQL.NATIVE);
  DBMS_SQL.DESCRIBE_COLUMNS(L_THECURSOR, L_COLCNT, L_DESCTBL);
  -- dump the column names as the header row
  FOR I IN 1 .. L_COLCNT LOOP
    UTL_FILE.PUT(L_OUTPUT, L_SEPARATOR || '"' || L_DESCTBL(I).COL_NAME || '"');
    DBMS_SQL.DEFINE_COLUMN(L_THECURSOR, I, L_COLUMNVALUE, 4000);
    L_SEPARATOR := ',';
  END LOOP;
  UTL_FILE.NEW_LINE(L_OUTPUT);
  -- execute the query
  L_STATUS := DBMS_SQL.EXECUTE(L_THECURSOR);
  -- dump the column values row by row
  WHILE (DBMS_SQL.FETCH_ROWS(L_THECURSOR) > 0) LOOP
    L_SEPARATOR := '';
    FOR I IN 1 .. L_COLCNT LOOP
      DBMS_SQL.COLUMN_VALUE(L_THECURSOR, I, L_COLUMNVALUE);
      UTL_FILE.PUT(L_OUTPUT,
                   L_SEPARATOR || '"' ||
                   TRIM(BOTH ' ' FROM REPLACE(L_COLUMNVALUE, '"', '""')) || '"');
      L_SEPARATOR := ',';
    END LOOP;
    UTL_FILE.NEW_LINE(L_OUTPUT);
  END LOOP;
  -- close the cursor and the file
  DBMS_SQL.CLOSE_CURSOR(L_THECURSOR);
  UTL_FILE.FCLOSE(L_OUTPUT);
EXCEPTION
  WHEN OTHERS THEN
    RAISE;
END;
/

Procedure created.


SQL> create directory ORADMP as '/oracle/dmp';
SQL> grant read,write on directory oradmp to public;

SQL> set lines 200 pages 200
SQL> col owner for a30 
SQL> col directory_name for a30 
SQL> col directory_path for a85 
SQL> select * from dba_directories ;

OWNER                          DIRECTORY_NAME                 DIRECTORY_PATH
------------------------------ ------------------------------ -------------------------------------------------------------------------------------
SYS                            ORACLE_OCM_CONFIG_DIR2         /oracle/app/product/11.2.0/db_1/ccr/state
SYS                            DATA_PUMP_DIR                  /oracle/app/admin/lndb/dpdump/
SYS                            ORACLE_OCM_CONFIG_DIR          /oracle/app/product/11.2.0/db_1/ccr/hosts/lnkf/state
SYS                            XMLDIR                         /oracle/app/product/11.2.0/db_1/rdbms/xml
SYS                            ORADMP                         /oracle/dmp

6 rows selected.

SQL> set timing on
SQL> EXEC sql_to_csv('select * from addressinfo','ORADMP','addressinfo.csv');

2. Extract the table structure from Oracle

Obtain it with PL/SQL Developer or another tool/command; one option, DBMS_METADATA, is sketched below.
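
A minimal sketch using the data dictionary (assuming the table belongs to the TEST schema used in step 1):

SQL> SET LONG 20000 PAGESIZE 0
SQL> SELECT DBMS_METADATA.GET_DDL('TABLE', 'ADDRESSINFO', 'TEST') FROM DUAL;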

-- Create table
create table ADDRESSINFO
(
  id                VARCHAR2(16) not null,
  ods_id            VARCHAR2(64),
  code              VARCHAR2(2000),
  province          VARCHAR2(100),
  city              VARCHAR2(100),
  regionid          VARCHAR2(20),
  region_name       VARCHAR2(256),
  street_id         VARCHAR2(50),
  street_name       VARCHAR2(512),
  road_id           VARCHAR2(50),
  road_name         VARCHAR2(512),
  number_plate_id   VARCHAR2(50),
  number_plate_name VARCHAR2(512),
  cpn_id            VARCHAR2(50),
  cpn_code          VARCHAR2(50),
  community_name    VARCHAR2(255),
  premises_network  VARCHAR2(32),
  building_code     VARCHAR2(50),
  building          VARCHAR2(255),
  unit_id           VARCHAR2(50),
  unit              VARCHAR2(128),
  layer_id          VARCHAR2(50),
  layer             VARCHAR2(128),
  room_id           VARCHAR2(50),
  room              VARCHAR2(128),
  rms_code          VARCHAR2(255),
  substation_id     VARCHAR2(16),
  simple_mark       VARCHAR2(4000),
  inner_symbol_id   VARCHAR2(16),
  if_inner          VARCHAR2(2),
  switch_no         VARCHAR2(64),
  type              NUMBER(2) default 0,
  queryindex        VARCHAR2(255),
  remark            VARCHAR2(255),
  alias_name        VARCHAR2(255),
  if_relate         VARCHAR2(1) default 0,
  addresstype       VARCHAR2(10),
  source            VARCHAR2(32),
  create_time       DATE,
  fldtag            VARCHAR2(64) default 49,
  original_company  VARCHAR2(1) default 0,
  useremark         VARCHAR2(16),
  village_code      VARCHAR2(50),
  update_time       DATE,
  create_op         VARCHAR2(32)
);

3. Create the table structure in HeavyDB

Note the data type mapping: DATE becomes TIMESTAMP, VARCHAR2 becomes TEXT, and NUMBER(2) can become DECIMAL(2,0).
Reference: https://docs.heavy.ai/sql/data-definition-ddl/datatypes-and-fixed-encoding. A query that generates the column list from the Oracle dictionary is sketched below.
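
For a wide table like this one, the HeavyDB column list can be generated from the Oracle dictionary rather than edited by hand. A hedged sketch against USER_TAB_COLUMNS; the CASE branches only cover the types present in this table, and plain NUMBER columns fall back to 18 digits (adjust as needed):

SQL> SELECT LOWER(column_name) || ' ' ||
            CASE data_type
              WHEN 'VARCHAR2' THEN 'TEXT'
              WHEN 'DATE'     THEN 'TIMESTAMP'
              WHEN 'NUMBER'   THEN 'DECIMAL(' || NVL(data_precision, 18) || ',' || NVL(data_scale, 0) || ')'
              ELSE data_type
            END || ','
       FROM user_tab_columns
      WHERE table_name = 'ADDRESSINFO'
      ORDER BY column_id;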

heavysql> create database lnlt;
heavysql> \c lnlt admin XXX;
heavysql> create table ADDRESSINFO
(
  id TEXT ENCODING NONE, -- low-repetition (high-cardinality) column, so dictionary encoding is disabled
  ods_id            TEXT,
  code TEXT ENCODING NONE, -- low-repetition (high-cardinality) column, so dictionary encoding is disabled
  province          TEXT,
  city              TEXT,
  regionid          TEXT,
  region_name       TEXT,
  street_id         TEXT,
  street_name       TEXT,
  road_id           TEXT,
  road_name         TEXT,
  number_plate_id   TEXT,
  number_plate_name TEXT,
  cpn_id            TEXT,
  cpn_code          TEXT,
  community_name    TEXT,
  premises_network  TEXT,
  building_code     TEXT,
  building          TEXT,
  unit_id           TEXT,
  unit              TEXT,
  layer_id          TEXT,
  layer             TEXT,
  room_id           TEXT,
  room              TEXT,
  rms_code          TEXT,
  substation_id     TEXT,
  simple_mark       TEXT,
  inner_symbol_id   TEXT,
  if_inner          TEXT,
  switch_no         TEXT,
  type              INTEGER,
  queryindex        TEXT,
  remark            TEXT,
  alias_name        TEXT,
  if_relate         TEXT,
  addresstype       TEXT,
  source            TEXT,
  create_time       timestamp,
  fldtag            TEXT,
  original_company  TEXT,
  useremark         TEXT,
  village_code      TEXT,
  update_time       timestamp,
  create_op         TEXT
);

4. scp the CSV to the HeavyDB server

heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ scp root@192.168.205.58:/oracle/addressinfo.csv ./
root@192.168.205.58's password: 
addressinfo.csv                                                                                                                                                         100%   13GB   8.6MB/s   24:59    
heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ ll
total 127487244
drwxr-xr-x 4 heavyai heavyai         4096 9 18 12:49 ./
drwxr-xr-x 3 root    root            4096 9  7 15:54 ../
-rw-r--r-- 1 heavyai heavyai  13585817776 9 18 13:14 addressinfo.csv

5. Import into HeavyDB

heavysql> COPY addressinfo FROM '/var/lib/heavyai/storage/import/sample_datasets/addressinfo.csv';
Result
Loaded: 30748154 recs, Rejected: 0 recs in 57.354000 secs
1 rows returned.
Execution time: 57716 ms, Total time: 57718 ms
heavysql> select count(*) from addressinfo;
EXPR$0
30748154
1 rows returned.
Execution time: 14 ms, Total time: 15 ms
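
The defaults were enough here; if the loader starts rejecting rows, the COPY options documented by HeavyDB (delimiter, header, quoted, max_reject) can be set explicitly. A sketch:

heavysql> COPY addressinfo FROM '/var/lib/heavyai/storage/import/sample_datasets/addressinfo.csv' WITH (delimiter=',', header='true', quoted='true', max_reject=100000);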

Non-interactive operation:

cat test.sql | heavysql -p admin --db lnlt
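
Here test.sql is simply a file of heavysql statements; a hypothetical example reusing the COPY from above:

$ cat test.sql
COPY addressinfo FROM '/var/lib/heavyai/storage/import/sample_datasets/addressinfo.csv';
SELECT COUNT(*) FROM addressinfo;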

A garbled-display issue you may run into:

If query results in DBeaver show garbled Chinese, convert the CSV's encoding and import it again with COPY.

Check the encoding on CentOS/RedHat:
file addressinfo.csv
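
If file reports a non-UTF-8 encoding (for example ISO-8859 or GBK text), iconv can convert it; a sketch assuming the source encoding is GBK:

$ iconv -f GBK -t UTF-8 addressinfo.csv > addressinfo_utf8.csv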

Below is how to check and convert on Ubuntu.
Conversion command reference: https://zhuanlan.zhihu.com/p/27142605

heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ enca -L zh_CN sid_latn1.csv 
Simplified Chinese National Standard; GB2312
heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ enca -L zh_CN -x UTF-8 < sid_latn1.csv > sid_latn2.csv
heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ enca -L zh_CN sid_latn2.csv 
Universal transformation format 8 bits; UTF-8


Common commands:

Change a password: ALTER USER admin (password = 'admin');
Run SQL non-interactively:
[heavyai@test heavyai]$ ./bin/heavysql heavyai -u admin -p admin -s 192.168.207.170 < /var/lib/heavyai/storage/import/sample_datasets/nyc_trees_2015_683k/nyc_trees_2015_683k.sql
User admin connected to database heavyai
User admin disconnected from database heavyai
Load data non-interactively:
[heavyai@test heavyai]$  echo "copy nyc_trees_2015_683k from '/var/lib/heavyai/storage/import/sample_datasets/nyc_trees_2015_683k/nyc_trees_2015_683k.csv' with (quoted='true');" | ./bin/heavysql heavyai -u admin -p admin -s 192.168.207.170
User admin connected to database heavyai
Result
Loaded: 683788 recs, Rejected: 0 recs in 1.570000 secs
User admin disconnected from database heavyai
Configuration file for reference:
heavyai@node13:/opt/heavyai/bin$ cat /var/lib/heavyai/heavy.conf 
port = 6274
http-port = 6278
calcite-port = 6279
data = "/var/lib/heavyai/storage"
null-div-by-zero = true
#cpu-only=true
#rendering=false
allowed-export-paths=["/var/lib/heavyai6.4","/var/lib/heavyai"]
allowed-import-paths=["/var/lib/heavyai6.4","/var/lib/heavyai"]
[web]
port = 6273
frontend = "/opt/heavyai/frontend"

Reference Q&A: https://support.heavy.ai/hc/en-us/community/posts/17453735510423-Why-is-the-Chinese-display-of-query-results-messy-with-dbeaver-

Last modified: 2023-11-14 14:41:07