Whoami:5年+金融、政府、医疗领域工作经验的DBACertificate:PGCM、OCP、YCPSkill:Oracle、Mysql、PostgreSQL、国产数据库Platform:CSDN、墨天轮、公众号(呆呆的私房菜)
SaaS应用通常采用多租户架构,多个租户可以共享同一套软件实例和基础设施,每个租户的数据集中存储在单个数据库实例中,并与其他租户隔离。 这种设计具备统一维护、资源高效、运维简单等优势。
但是,传统单机关系型数据库在应对大规模多租户应用的数据量时存在扩展瓶颈,当数据超出单节点容量时,关系模型的优势就无法体现出来了。 Citus通过分布式集群技术、实现让多租户应用像连接单机PostgreSQL一样工作,同时支持横向扩展。
一、场景构建: 构建一个广告效果追踪和数据分析平台,其核心功能是为每个企业(租户)提供独立的分析看板。数据模型如下:
-- 企业表(租户主体)CREATE TABLE companies (id bigserial PRIMARY KEY,name text NOT NULL,image_url text,created_at timestamp without time zone NOT NULL,updated_at timestamp without time zone NOT NULL);-- 广告活动表(属于企业)CREATE TABLE campaigns (id bigserial PRIMARY KEY,company_id bigint REFERENCES companies (id),name text NOT NULL,cost_model text NOT NULL,state text NOT NULL,monthly_budget bigint,blacklisted_site_urls text[],created_at timestamp without time zone NOT NULL,updated_at timestamp without time zone NOT NULL);-- 广告表(属于活动)CREATE TABLE ads (id bigserial PRIMARY KEY,campaign_id bigint REFERENCES campaigns (id),name text NOT NULL,image_url text,target_url text,impressions_count bigint DEFAULT 0,clicks_count bigint DEFAULT 0,created_at timestamp without time zone NOT NULL,updated_at timestamp without time zone NOT NULL);-- 事件点击表CREATE TABLE clicks (id bigserial PRIMARY KEY,ad_id bigint REFERENCES ads (id),clicked_at timestamp without time zone NOT NULL,site_url text NOT NULL,cost_per_click_usd numeric(20,10),user_ip inet NOT NULL,user_data jsonb NOT NULL);-- 展示事件表CREATE TABLE impressions (id bigserial PRIMARY KEY,ad_id bigint REFERENCES ads (id),seen_at timestamp without time zone NOT NULL,site_url text NOT NULL,cost_per_impression_usd numeric(20,10),user_ip inet NOT NULL,user_data jsonb NOT NULL);
二、分布式改造 1. 定义分布列 核心原则:所有与租户相关的表必须包含company_id字段,并基于此字段分片。
Citus会将相同company_id的数据存储在同一个物理节点;确保跨表查询(join)无需跨界点通信,性能接近单机数据库
2. 修改主键与外键
原始问题:单机数据库使用id作为主键;但分布式环境下需要用复合主键(company_id, id)-- 企业表(租户主体)CREATE TABLE companies (id bigserial PRIMARY KEY,name text NOT NULL,image_url text,created_at timestamp without time zone NOT NULL,updated_at timestamp without time zone NOT NULL);-- 广告活动表(分布式改造)CREATE TABLE campaigns (id bigserial, -- 移除单字段主键company_id bigint REFERENCES companies (id),name text NOT NULL,cost_model text NOT NULL,state text NOT NULL,monthly_budget bigint,blacklisted_site_urls text[],created_at timestamp without time zone NOT NULL,updated_at timestamp without time zone NOT NULL,PRIMARY KEY (company_id, id) -- 定义复合主键);-- 广告表(属于活动)CREATE TABLE ads (id bigserial, -- 移除单字段主键company_id bigint, -- 新增租户ID字段campaign_id bigint, -- 移除外键 campaigns(id)name text NOT NULL,image_url text,target_url text,impressions_count bigint DEFAULT 0,clicks_count bigint DEFAULT 0,created_at timestamp without time zone NOT NULL,updated_at timestamp without time zone NOT NULL,PRIMARY KEY (company_id, id), -- 新增复合主键FOREIGN KEY (company_id, campaign_id) -- 新增级联外键REFERENCES campaigns (company_id, id));-- 点击事件表CREATE TABLE clicks (id bigserial, -- 移除单字段主键company_id bigint, -- 新增租户id字段ad_id bigint, -- 移除外键 ads(id),clicked_at timestamp without time zone NOT NULL,site_url text NOT NULL,cost_per_click_usd numeric(20,10),user_ip inet NOT NULL,user_data jsonb NOT NULL,PRIMARY KEY (company_id, id), -- 新增复合主键FOREIGN KEY (company_id, ad_id) -- 新增级联外键REFERENCES ads (company_id, id));-- 展示事件表CREATE TABLE impressions (id bigserial, -- 移除单字段主键company_id bigint, -- 新增租户id字段ad_id bigint, -- 移除外键 ads (id),seen_at timestamp without time zone NOT NULL,site_url text NOT NULL,cost_per_impression_usd numeric(20,10),user_ip inet NOT NULL,user_data jsonb NOT NULL,PRIMARY KEY (company_id, id), -- 新增复合主键FOREIGN KEY (company_id, ad_id) -- 新增级联外键REFERENCES ads (company_id, id));
3. 创建分布式表
SELECT create_distributed_table('companies', 'id');SELECT create_distributed_table('campaigns', 'company_id');SELECT create_distributed_table('ads', 'company_id');SELECT create_distributed_table('clicks', 'company_id');SELECT create_distributed_table('impressions', 'company_id');
4. 数据导入
-- Citus 官网提供了测试数据样例# 下载数据集for dataset in companies campaigns ads clicks impressions geo_ips; docurl -O https://examples.citusdata.com/mt_ref_arch/${dataset}.csvdone# 使用COPY命令导入(在psql中执行)\copy companies FROM 'companies.csv' WITH CSV\copy campaigns FROM 'campaigns.csv' WITH CSV\copy ads FROM 'ads.csv' WITH CSV\copy clicks FROM 'clicks.csv' WITH CSV\copy impressions FROM 'impressions.csv' WITH CSV
5. 查询优化 查询必须包含租户id,citus才能正确路由到分片
示例:-- 查询租户5的广告活动SELECT * FROM campaignsWHERE company_id = 5AND state = 'active';-- 统计租户5某广告的点击量SELECT COUNT(*)FROM clicksWHERE company_id = 5AND ad_id = 123;-- 缺少company_id,触发全分片扫描SELECT * FROM campaignsWHERE name = '黑五促销';
分布式join优化,涉及相同租户时,citus在节点内直接处理
-- 高效JOIN:同一租户的活动和广告SELECT c.name AS campaign_name, a.name AS ad_nameFROM campaigns cJOIN ads a ON c.company_id = a.company_idAND c.id = a.campaign_idWHERE c.company_id = 5;
6. 性能分析工具
-- 查看分片分布情况SELECT shardid, nodename, shard_sizeFROM citus_shardsWHERE table_name = 'clicks';-- 分析查询计划EXPLAINSELECT * FROM clicksWHERE company_id = 5AND clicked_at > NOW() - INTERVAL '1 day';期望输出:Custom Scan (Citus Adaptive)-> Distributed QueryTask Count: 1 -- 仅访问一个分片Tasks Shown: All-> Task: Query shardid=102030SQL: SELECT * FROM clicks_102030WHERE (company_id = 5 AND clicked_at > $1)
7. 运维优化 动态扩容:添加新节点后,重新平衡分片
SELECT rebalance_table_shards('clicks');
超大租户隔离:将数据量极大的租户分配到专用分片
SELECT isolate_tenant_to_new_shard('clicks', 5); -- 租户5单独分片
索引优化:
-- 创建全局索引需要包含租户idCREATE UNIQUE INDEX valid_idx ON ads (company_id, name);
数据清理:利用分片过滤条件删除旧数据
DELETE FROM clicksWHERE company_id = 5AND clicked_at < NOW() - INTERVAL '1 year';
本文内容就到这啦,相信读完本文可以给你带来citus实战的一些技巧~希望本篇内容能给你带来帮助。我们下篇再见!

文章转载自呆呆的私房菜,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




