暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

doris实现数据聚合的三种方式–多明细聚合、物化视图与rollup

原创 途er 2022-12-15
2354
  • A+

所属分类:Doris flink 数据仓库


doris实现数据聚合的三种方式

假设以
①ds、hour为维度计算pv
②ds为维度计算pv

1、三种聚合方式
(1)聚合模型+数据源多次写入

flink写入kafka代码

  1. insert into log_exp_pv
  2. select ds
  3. hour,
  4. device_id
  5. from
  6. dwd_kafka_log
  7. ;
  8. insert into log_exp_pv
  9. select ds
  10. 'ALL' hour,
  11. device_id
  12. from
  13. dwd_kafka_log
  14. ;

doris建表语句

  1. create table log_exp_pv(
  2. ds date,
  3. hour varchar(1000),
  4. pv bigint sum DEFAULT '0'
  5. )
  6. AGGREGATE KEY(ds,hour)
  7. partition by range(ds)
  8. (start('20220514') end ('20220520') every (INTERVAL 1 day))
  9. distributed by hash(platform) buckets 32
  10. PROPERTIES(
  11. "dynamic_partition.enable" = "true",
  12. "dynamic_partition.time_unit" = "DAY",
  13. "dynamic_partition.start" = "-1",
  14. "dynamic_partition.end" = "7",
  15. "dynamic_partition.prefix" = "p",
  16. "dynamic_partition.buckets" = "32",
  17. "replication_num" = "1"
  18. )
  19. ;

routine load 从kafka导入数据到doris

  1. CREATE ROUTINE LOAD routine_load_log_exp_pv ON log_exp_pv
  2. COLUMNS TERMINATED BY ",",
  3. COLUMNS (
  4. ds
  5. ,hour
  6. ,pv = 1
  7. )
  8. PROPERTIES
  9. (
  10. "desired_concurrent_number"="3",
  11. "max_error_number"="0",
  12. "strict_mode" = "false",
  13. "format" = "json"
  14. )
  15. FROM KAFKA
  16. (
  17. "kafka_broker_list"= "",
  18. "kafka_topic" = "log_exp_pv",
  19. "property.group.id"="routine_load_log_exp_pv"
  20. );

(2)明细模型 + 物化视图
flink写入kafka代码

  1. insert into log_exp_pv
  2. select ds
  3. hour,
  4. device_id
  5. from
  6. dwd_kafka_log
  7. ;

doris建表语句

  1. create table log_exp_detail(
  2. ds date,
  3. hour varchar(1000),
  4. user_id bigint
  5. )
  6. duplicate key(ds,hour)
  7. partition by range(ds)
  8. (start('20220512') end ('20220520') every (INTERVAL 1 day))
  9. distributed by hash(channel) buckets 32
  10. PROPERTIES(
  11. "dynamic_partition.enable" = "true",
  12. "dynamic_partition.time_unit" = "DAY",
  13. "dynamic_partition.start" = "-1",
  14. "dynamic_partition.end" = "7",
  15. "dynamic_partition.prefix" = "p",
  16. "dynamic_partition.buckets" = "32",
  17. "replication_num" = "1"
  18. );

routine load 从kafka导入数据到doris

  1. CREATE ROUTINE LOAD routine_load_log_exp_detail ON log_exp_detail
  2. COLUMNS TERMINATED BY ",",
  3. COLUMNS (
  4. ds
  5. ,hour
  6. ,user_id
  7. )
  8. PROPERTIES
  9. (
  10. "desired_concurrent_number"="3",
  11. "max_error_number"="0",
  12. "strict_mode" = "false",
  13. "format" = "json"
  14. )
  15. FROM KAFKA
  16. (
  17. "kafka_broker_list"= "",
  18. "kafka_topic" = "log_exp_detail",
  19. "property.group.id"="routine_load_log_exp_detail"
  20. );

创建物化视图

  1. CREATE MATERIALIZED VIEW mv_ds AS
  2. SELECT ds
  3. ,count(user_id) pv
  4. FROM log_exp_detail
  5. GROUP BY ds;
  1. CREATE MATERIALIZED VIEW mv_ds_hour AS
  2. SELECT ds
  3. ,hour
  4. ,count(user_id) pv
  5. FROM log_exp_detail
  6. GROUP BY ds
  7. ,hour;

查询计划

  1. explain
  2. SELECT ds
  3. ,count(user_id) pv
  4. FROM log_job_exp_detail
  5. GROUP BY ds;

doris实现数据聚合的三种方式--多明细聚合、物化视图与rollup
(3)聚合模型 + rollup
doris建表语句

  1. create table log_exp_pv(
  2. ds date,
  3. hour varchar(1000),
  4. pv bigint sum DEFAULT '0'
  5. )
  6. AGGREGATE KEY(ds,hour)
  7. partition by range(ds)
  8. (start('20220514') end ('20220520') every (INTERVAL 1 day))
  9. distributed by hash(platform) buckets 32
  10. rollup (
  11. r1(ds)
  12. ,r2(ds,hour)
  13. )
  14. PROPERTIES(
  15. "dynamic_partition.enable" = "true",
  16. "dynamic_partition.time_unit" = "DAY",
  17. "dynamic_partition.start" = "-1",
  18. "dynamic_partition.end" = "7",
  19. "dynamic_partition.prefix" = "p",
  20. "dynamic_partition.buckets" = "32",
  21. "replication_num" = "1"
  22. )
  23. ;

routine load 从kafka导入数据到doris

  1. CREATE ROUTINE LOAD routine_load_log_exp_pv ON log_exp_pv
  2. COLUMNS TERMINATED BY ",",
  3. COLUMNS (
  4. ds
  5. ,hour
  6. ,user_id
  7. )
  8. PROPERTIES
  9. (
  10. "desired_concurrent_number"="3",
  11. "max_error_number"="0",
  12. "strict_mode" = "false",
  13. "format" = "json"
  14. )
  15. FROM KAFKA
  16. (
  17. "kafka_broker_list"= "",
  18. "kafka_topic" = "log_exp_pv",
  19. "property.group.id"="routine_load_log_exp_pv"
  20. );

查询计划

  1. explain
  2. SELECT ds
  3. ,count(user_id) pv
  4. FROM log_exp_pv
  5. GROUP BY ds;

doris实现数据聚合的三种方式--多明细聚合、物化视图与rollup
2、三种导入方式对比

(1)聚合模型+数据源多次写入
优点:
①只需关心聚合模型即可
②存储为聚合模型,降低了数据量
缺点:
①假设维度为n,则数据量会膨胀2^n倍,导致数据导入压力增大
②维度过多时,明细数据开发复杂度增加
(2)明细模型 + 物化视图
优点:
①明细数据只需一份
②明细模型保留原始数据
缺点:
①存储为明细数据,导致存储数据量较大
②多维度时,物化视图需要建立多次
③物化视图语法有严格限制
(3)聚合模型 + rollup
优点:
①存储为聚合模型,降低了数据量
②建表时即可指定rollup,降低了开发复杂性
缺点:
①明细数据无法保留

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论