测评：IvorySQL COPY VS MySQL LOAD

原创夏克 2022-12-08

542

测评：IvorySQL COPY VS MySQL LOAD

概述

IvorySQL/PostgreSQL的COPY和MySQL的LOAD都是快速加载文件的利器，本文将对两个功能进行性能对比。

测试场景准备

测试环境

这里使用的是信创环境的虚拟机和过程开源操作系统openEuler。（性能较差）

服务器

CPU	Kunpeng-920
Architecture	aarch64
On-line CPU(s) list	0-7
CPU主频	2600MHz

操作系统

NAME="openEuler"
VERSION="20.03 (LTS-SP2)"
ID="openEuler"
VERSION_ID="20.03"
PRETTY_NAME="openEuler 20.03 (LTS-SP2)"
ANSI_COLOR="0;31"
复制

系统性能

以下是unixbench跑分，供对比参考

------------------------------------------------------------------------
Benchmark Run: Wed Dec 07 2022 15:48:16 - 15:57:14
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       40999324.8 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                     4314.8 MWIPS (9.3 s, 2 samples)
Execl Throughput                               3762.5 lps   (29.8 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks        685100.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks          189282.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks       1992147.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                             1231178.1 lps   (10.0 s, 2 samples)
Pipe-based Context Switching                 105636.0 lps   (10.0 s, 2 samples)
Process Creation                               8963.6 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)                   7087.6 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                   3055.4 lpm   (60.0 s, 1 samples)
System Call Overhead                         907956.1 lps   (10.0 s, 2 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   40999324.8   3513.2
Double-Precision Whetstone                       55.0       4314.8    784.5
Execl Throughput                                 43.0       3762.5    875.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     685100.0   1730.1
File Copy 256 bufsize 500 maxblocks            1655.0     189282.0   1143.7
File Copy 4096 bufsize 8000 maxblocks          5800.0    1992147.0   3434.7
Pipe Throughput                               12440.0    1231178.1    989.7
Pipe-based Context Switching                   4000.0     105636.0    264.1
Process Creation                                126.0       8963.6    711.4
Shell Scripts (1 concurrent)                     42.4       7087.6   1671.6
Shell Scripts (8 concurrent)                      6.0       3055.4   5092.4
System Call Overhead                          15000.0     907956.1    605.3
                                                                   ========
System Benchmarks Index Score                                        1258.0

------------------------------------------------------------------------
Benchmark Run: Wed Dec 07 2022 15:57:14 - 16:06:18
8 CPUs in system; running 16 parallel copies of tests

Dhrystone 2 using register variables      326630138.1 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                    35757.3 MWIPS (8.7 s, 2 samples)
Execl Throughput                              23047.5 lps   (29.7 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks        525702.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks          139688.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks       1834232.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                             9754452.8 lps   (10.0 s, 2 samples)
Pipe-based Context Switching                1181849.8 lps   (10.0 s, 2 samples)
Process Creation                              42295.7 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)                  32399.6 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                   4529.9 lpm   (60.1 s, 1 samples)
System Call Overhead                        3386402.8 lps   (10.0 s, 2 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  326630138.1  27988.9
Double-Precision Whetstone                       55.0      35757.3   6501.3
Execl Throughput                                 43.0      23047.5   5359.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     525702.0   1327.5
File Copy 256 bufsize 500 maxblocks            1655.0     139688.0    844.0
File Copy 4096 bufsize 8000 maxblocks          5800.0    1834232.0   3162.5
Pipe Throughput                               12440.0    9754452.8   7841.2
Pipe-based Context Switching                   4000.0    1181849.8   2954.6
Process Creation                                126.0      42295.7   3356.8
Shell Scripts (1 concurrent)                     42.4      32399.6   7641.4
Shell Scripts (8 concurrent)                      6.0       4529.9   7549.8
System Call Overhead                          15000.0    3386402.8   2257.6
                                                                   ========
System Benchmarks Index Score                                        4252.8

复制

数据准备

使用benchmarksql-5.0中的历史表作为被测表，建表脚本如下：

create table bmsql_history (
  hist_id  integer,
  h_c_id   integer,
  h_c_d_id integer,
  h_c_w_id integer,
  h_d_id   integer,
  h_w_id   integer,
  h_date   timestamp,
  h_amount decimal(6,2),
  h_data   varchar(24)
);
复制

测试数据生成脚本

根据字段类型写了一个生成测试数据的脚本，代码如下：

# -*- coding=utf-8 -*-

import csv
import random
import time
import string

# 创建列表，保存header内容
header_list = ["hist_id", "h_c_id", "h_c_d_id", "h_c_w_id", "h_d_id", "h_w_id", "h_date", "h_amount", "h_data"]

g_count = 0

def random_list(n):
    data_list = []
    global g_count
    for i in range(n):
        g_count = g_count + 1
        l = [g_count, 
        random.randint(0,1000),
        random.randint(0,1000), 
        random.randint(0,1000),
        random.randint(0,1000),
        random.randint(0,1000), 
        time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()), 
        #random.randint(0,1000), 
        round(random.uniform(0, 9999.0), 2), 
        ''.join(random.sample(string.ascii_letters + string.digits, 20))
        ]
        data_list.append(l)
    
    return data_list

# 以写方式打开文件。注意添加 newline=""，否则会在两行数据之间都插入一行空白。
with open("bmsql_history.csv", mode="w", encoding="utf-8", newline="") as f:
    
    # 基于打开的文件，创建 csv.writer 实例
    writer = csv.writer(f)

    # 写入 header。
    # writerow() 一次只能写入一行。
    writer.writerow(header_list)

    # 写入数据。
    # writerows() 一次写入多行。
    for i in range(10000):
        writer.writerows(random_list(1000))

复制

测试数据

执行脚本后会生成10000000行测试数据，具体如下图：

图片.png

MySQL LOAD

简介

MySQL load data 语句能快速将一个文本文件的内容导入到对应的数据库表中（一般文本的一行对应表的一条记录）；
数据库应用程序开发中，涉及大批量数据需要插入时，使用 load data 语句的效率比一般的 insert 语句的高很多；
可以看成select … into outfile语句的反操作，select … into outfile将数据库表中的数据导出保存到一个文件中。参考MySQL 5.7 官方手册；

语法

LOAD DATA
    [LOW_PRIORITY | CONCURRENT] [LOCAL]
    INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var
        [, col_name_or_user_var] ...)]
    [SET col_name={expr | DEFAULT},
        [, col_name={expr | DEFAULT}] ...]
复制

测试

-- LOAD DATA LOCAL INFILE '文件路径'  INTO TABLE tableName FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INFILE '/root/bmsql_history.csv' INTO TABLE bmsql_history FIELDS TERMINATED BY ',';
复制

图片.png

耗时：69.58s

图片.png

IvorySQL COPY

简介

COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.

If a column list is specified, COPY TO copies only the data in the specified columns to the file. For COPY FROM, each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.

COPY with a file name instructs the PostgreSQL server to directly read from or write to a file. The file must be accessible by the PostgreSQL user (the user ID the server runs as) and the name must be specified from the viewpoint of the server. When PROGRAM is specified, the server executes the given command and reads from the standard output of the program, or writes to the standard input of the program. The command must be specified from the viewpoint of the server, and be executable by the PostgreSQL user. When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

语法

  Copy ::= COPY table_name [ ( column_name [, ...] ) ]
        FROM { 'filename' | STDIN }
            [ [ USING ] DELIMITERS 'delimiters' ]
            [ WITHOUT ESCAPING ]
            [ LOG ERRORS ]
            [ REJECT LIMIT 'limit' ]
            [ WITH ( option [, ...] ) ]
            | copy_option
            | TRANSFORM  ( { column_name [ data_type ] [ AS transform_expr ] } [, ...] )
            | FIXED FORMATTER ( { column_name( offset, length ) } [, ...] ) [ ( option [, ...] ) | copy_option [  ...] ] ;
复制

测试

IvorySQL版本（通过编译安装）

图片.png

postgres=# copy bmsql_history from '/home/ivory/bmsql_history.csv' delimiter ',';
COPY 10000000
Time: 21108.218 ms (00:21.108)
复制

图片.png

耗时：约20s

PostgreSQL COPY

测试

PostgreSQL版本（编译安装）

图片.png

postgres=# copy bmsql_history from '/home/postgres/bmsql_history.csv' delimiter ','
postgres-# ;
COPY 10000000
Time: 19829.354 ms (00:19.829)
复制

耗时：约20s

对比

耗时，PostgreSQL与IvorySQL基本相当，从版本上看内核都是16devel，且这部分功能两者应该完全相同。但MySQL的耗时在70s左右，是IvorySQL的3倍以上。

MySQL	PostgreSQL	IvorySQL
69.58s	19.829	19.948

CPU使用率，

图片.png

I/O延时

图片.png

总结

在同一平台，同等数量情况下，IvorySQL到时候文件的性能远优于MySQL。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

测评：IvorySQL COPY VS MySQL LOAD
概述
测试场景准备
- 测试环境
  - 服务器
  - 操作系统
  - 系统性能
- 数据准备
  - 测试数据生成脚本
  - 测试数据
MySQL LOAD
- 简介
- 语法
- 测试
IvorySQL COPY
- 简介
- 语法
- 测试
PostgreSQL COPY
- 测试
对比
总结

测评：IvorySQL COPY VS MySQL LOAD

测评：IvorySQL COPY VS MySQL LOAD

概述

测试场景准备

测试环境

服务器

操作系统

系统性能

数据准备

测试数据生成脚本

测试数据

MySQL LOAD

简介

语法

测试

IvorySQL COPY

简介

语法

测试

PostgreSQL COPY

测试

对比

总结

评论