暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

ClickHouse (MATERIALIZED) VIEW

ClickHouse周边 2021-04-15
2631

        数据库中的视图(View) 指的是通过一张或多张表查询出来的逻辑表 ,本身只是一段 SQL 的封装并不存储数据。

        对于ClickHouse来说,视图分为普通视图和物化视图两类。其中普通视图只是一种简单查询映射,类似于linux的软连接;而物化视图数据独立存储,类似于linux的硬链接。

        普通视图:不存储数据,仅存储指定的 SELECT 查询(基表的查询映射)

        物化视图:数据独立存储(持久化),有表结构,有引擎

准备环境:

        假设有一个表test来记录用户下载的信息,并且可以追踪用户每天下的信息:

    clickhouse-client -m
    create database test;
    use test;

    CREATE TABLE test (
    when DateTime,
    userid UInt32,
    bytes Float32
    ) ENGINE=MergeTree
    PARTITION BY toYYYYMM(when)
    ORDER BY (userid, when);


    INSERT INTO test
    SELECT
        now() - number * 60 as when,
    25,
    rand() % 100000000
    FROM system.numbers
    LIMIT 5000;

    SELECT count(*)
    FROM test;

    ┌─count()─┐
    │ 5000 │
    └─────────┘

    SELECT *
    FROM test
    LIMIT 10

    ┌────────────────when─┬─userid─┬────bytes─┐
    2021-04-11 06:32:272571684700
    2021-04-11 06:33:272591794480
    2021-04-11 06:34:27259945427
    2021-04-11 06:35:272519805956
    2021-04-11 06:36:272558563650
    2021-04-11 06:37:272589094024
    2021-04-11 06:38:272516388096
    2021-04-11 06:39:272544223570
    2021-04-11 06:40:272533782450
    2021-04-11 06:41:272573357980
    └─────────────────────┴────────┴──────────┘
    复制

    举例:应用场景-求和

      计算:每个用户每天下载的次数和流量:

      SELECT
      toStartOfDay(when) AS day,
      userid,
      count() as downloads,
      sum(bytes) AS bytes
      FROM test
      GROUP BY userid, day
      ORDER BY userid, day;

      ┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
      │ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
      │ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
      │ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
      │ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
      └─────────────────────┴────────┴───────────┴─────────────┘
      复制

      创建普通视图

              对于普通视图每次运行查询来交互式地计算这些每日总计,对于大型表而言,会出现计算浪费资源,速度慢等问题。

        CREATE VIEW test_view AS
        SELECT
        toStartOfDay(when) AS day,
        userid,
        count() AS downloads,
        sum(bytes) AS bytes
        FROM test
        GROUP BY
        userid,
        day
        ORDER BY
        userid ASC,
            day ASC;

        SELECT *
        FROM test_view;

        ┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
        │ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
        │ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
        │ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
        │ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
        └─────────────────────┴────────┴───────────┴─────────────┘
        复制

        创建物化视图

                物化视图每次运行查询来交互式地这些每日总计,将结果放在单独的表格中,该表格可以连续跟踪每天每个用户的下载总数。

          创建语法:
          CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name]
          [ENGINE = engine]
          [POPULATE]
          AS SELECT ...

          CREATE MATERIALIZED VIEW test_mv
          ENGINE = SummingMergeTree
          PARTITION BY toYYYYMM(day) ORDER BY (userid, day)
          POPULATE
          AS SELECT
          toStartOfDay(when) AS day,
          userid,
          count() as downloads,
          sum(bytes) AS bytes
          FROM test
          GROUP BY userid, day;

          SELECT *
          FROM test_mv
          ORDER BY
              day
              userid 
          LIMIT 5;

          ┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
          │ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
          │ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
          │ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
          │ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
          └─────────────────────┴────────┴───────────┴─────────────┘

          复制

          增加物化视图后的架构如下图所示

          物化视图说明:

                  1.使用了简化总和计数的ClickHouse引擎:SummingMergeTree。

                  2.视图定义包括关键字POPULATE,决定了物化视图的更新策略:

          1. 若有POPULATE 则在创建视图的过程会将源表已经存在的数据一并导入,类似于 create table ... as

          2. 若无POPULATE 则物化视图在创建之后没有数据

                  3.视图定义包括SELECT语句,在加载视图时如何转换数据。该查询在表中的新数据上运行,以计算每天的下载量和每个用户ID的总字节数。可以跳过排序,因为视图定义已经确保了排序顺序。

          模拟产生新的数据:

            INSERT INTO test SELECT 
                now() - (number * 60AS when
            22,
            rand() % 100000000
            FROM system.numbers
            LIMIT 5000


            SELECT
            toStartOfMonth(day) AS month,
            userid,
            sum(downloads),
            sum(bytes)
            FROM test_mv
            GROUP BY
            userid,
            month
            WITH TOTALS
            ORDER BY
            userid ASC,
            month ASC

            ┌──────month─┬─userid─┬─sum(downloads)─┬───sum(bytes)─┐
            2021-04-012210000502557007867
            2021-04-012510000499000207535
            └────────────┴────────┴────────────────┴──────────────┘


            SELECT *
            FROM test_mv
            ORDER BY
            userid ASC,
            day ASC

            ┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
            2021-04-11 00:00:0022102552647926147
            2021-04-12 00:00:0022144072949246117
            2021-04-13 00:00:0022144072346659092
            2021-04-14 00:00:0022109555468665883
            └─────────────────────┴────────┴───────────┴─────────────┘
            ┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
            2021-04-14 00:00:002235116904055296
            2021-04-15 00:00:0022144073164486383
            2021-04-16 00:00:0022144071056228447
            2021-04-17 00:00:0022144071659611401
            2021-04-18 00:00:002232916360129101
            └─────────────────────┴────────┴───────────┴─────────────┘
            ┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
            2021-04-11 00:00:0025102550765199966
            2021-04-11 00:00:0025104851955725222
            2021-04-12 00:00:0025144073344565998
            2021-04-12 00:00:0025144071811987127
            2021-04-13 00:00:0025144071120216456
            2021-04-13 00:00:0025144071116881712
            2021-04-14 00:00:0025109554098241780
            2021-04-14 00:00:0025107254787389274
            └─────────────────────┴────────┴───────────┴─────────────
            复制

            使用物化视图盲区:

            A materialized view is implemented as follows: when inserting data to the table specified in SELECT
            , part of the inserted data is converted by this SELECT
             query, and the result is inserted in the view.

            https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized

            Important

            Materialized views in ClickHouse are implemented more like insert triggers. If there’s some aggregation in the view query, it’s applied only to the batch of freshly inserted data. Any changes to existing data of source table (like update, delete, drop partition, etc.) doesn’t change the materialized view.

            https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized

                    当向SELECT中指定的表插入数据时,部分插入的数据会被这个SELECT查询转换,结果会插入到视图中。        

                    物化视图本质就像insert语句的触发器;如果有什么集合的运算,他会应用于最新插入的数据当中;对于其他原表的变化,比如说,更新,删除,删除分区,都不会影响到物化视图的变化。

            SELECT
             query can contain DISTINCT
            GROUP BY
            ORDER BY
            LIMIT
            … Note that the corresponding conversions are performed independently on each block of inserted data. For example, if GROUP BY
             is set, data is aggregated during insertion, but only within a single packet of inserted data. The data won’t be further aggregated. 
            https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized

                    SELECT查询可以包含DISTINCT, GROUP BY, ORDER BY, LIMIT等等,特别注意这些相关联的约束只能应用于每个新插入的数据块中;比如说,如果设置了group by ,这些语句只会应用于新插入的的数据当中,不会作用于已经插入的分区当中。

            上一篇:MySQL pt工具包集合(一)

            近期文章推荐:

            ClickHouse 你不知道的盲区

            Clickhouse - System Tables 集合(一)

            Clickhouse - System Tables 集合(二)

            更多精彩内容欢迎关注微信公众号

            文章转载自ClickHouse周边,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

            评论