Vertica使用partitions表对分区大小进行汇总出现差异的原因
使用partitions表对分区表的分区表数量进行汇总,部分分区表汇总结果和projection_storage结果差异巨大
table_name|acc_sum_MB| p_sum_MB |a
----------+----------+----------+----------------
s1.T1 |119,158,775 |1,768,641,038 |-1,649,482,263
s1.T2 |105,580,146 |105,580,146 |0
s1.T3 |43,350,583 |43,350,583 |0
s1.T4 |21,088 |5,032,638 |-5,011,550
原因在于partitions中的ros_id指定的ros中可能存放有多个分区的数据,在按分区汇总分析的过程中重复计算了大量的ros大小。
- 正常情况下每个分区的数据一定是存放在不同的ROS中,但如果对该表使用mergeoutsingleros函数合并过文件,则会出现不同分区数据在一个ros的情况。
下面这个表既是如此:
dbadmin=> select ros_id,count(distinct partition_key) from partitions where table_schema = ‘s1’ and projection_name ilike ‘T1_b%’ group by 1 order by 2 desc limit 10
dbadmin-> ;
ros_id | count
--------------------±------
45035998330997143 | 124
49539597958367061 | 124
54043197585737455 | 124
58546797213107695 | 124
63050396840478331 | 124
76561195722589955 | 124
81064795349960353 | 124
90071994604701407 | 124
99079193859442525 | 124
108086393114183323 | 124
(10 rows)
这个表的一个ros_id只有一个分区的数据,所以汇总出来的结果和projection_storage数据一致
dbadmin=> select ros_id,count(distinct partition_key) from partitions where table_schema = ‘s1’ and projection_name ilike ‘T1_b%’ group by 1 order by 2 desc limit 10;
ros_id | count
-------------------±------
49539597991809079 | 1
72057596095218277 | 1
72057596206148613 | 1
81064795614503925 | 1
85568395111826421 | 1
90071994604700539 | 1
94575594232070745 | 1
94575594232070765 | 1
94575594232071131 | 1
94575594232071157 | 1
(10 rows)
可以通过 alter table … reorganize; 分开再汇总。
alter table s1.t1 reorganize;
NOTICE 4785: Started background repartition table task
ALTER TABLE
dbadmin=>
dbadmin=> select ros_id,count(distinct partition_key) from partitions where table_schema = ‘s1’ and projection_name ilike ‘T1_b%’ group by 1 order by 2 desc limit 10;
ros_id | count
-------------------±------
45035998686463575 | 240
58546797569190371 | 240
63050397177898613 | 240
67553996804240455 | 240
81064795686208547 | 240
81064795686208721 | 240
90071994946351151 | 240
90071994946351323 | 240
94575594570276897 | 240
94575594570277079 | 240
(10 rows)
dbadmin=> select ros_id,count(distinct partition_key) from partitions where table_schema = ‘s1’ and projection_name ilike ‘T1_b%’ group by 1 order by 2 desc limit 10;
ros_id | count
-------------------±------
45035998711548941 | 1
54043197942002701 | 1
58546797589367811 | 1
58546797595422147 | 1
63050397202424841 | 1
67553996828519431 | 1
67553996828549109 | 1
72057596451331077 | 1
76561196059593735 | 1
76561196079090683 | 1
(10 rows)
select foo.table_name,foo.sum_MB as acc_sum_MB,lish.sum_MB as p_sum_MB from
(select ps.anchor_table_schema||’.’||ps.anchor_table_name table_name,sum(ps.used_bytes)//1024^2 sum_MB from projection_storage ps group by 1) foo
left join
(SELECT
prj.projection_schema||’.’||prj.anchor_table_name table_name,
SUM(par.ros_size_bytes)//1024^2 sum_MB
FROM partitions par JOIN projections prj USING (projection_id) GROUP BY 1
) lish
on foo.table_name = lish.table_name
WHERE foo.table_name = ‘s1.T1’
order by 2 desc;
table_name | acc_sum_MB | p_sum_MB
-------------------------------±-----------±---------
s1.T1 | 20739 | 20739
(1 row)
最后修改时间:2024-08-29 11:32:18
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。