错误如下:
ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns
复制
创建测试表如下:
create table customers (customers_id bigint,product_name varchar(100));
alter table customers add primary key (customers_id);
create table orders (order_id bigint,customers_id bigint,order_detail varchar(100));
alter table orders add primary key(order_id,customers_id);
create table products (product_id bigint,customers_id bigint,product_name varchar(100));
alter table products add primary key (product_id);
select create_distributed_table('customers','customers_id');
select create_distributed_table('orders','order_id');
insert into customers
select n,md5(random()::text)
from generate_series(1,10000) n;
insert into orders
select n*3,n,md5(random()::text)
from generate_series(1,200000) n;
insert into products
select 10+n*2,n,md5(random()::text)
from generate_series(1,10000) n;
复制
上面测试表的整体逻辑就是让join列,分布在不同的work节点中
然后执行下面SQL
with products_local as (select * from products)
select c.* , o.*
from customers c , orders o
where o.customers_id = c.customers_id
and exists (select 1 from products_local p
where p.customers_id = c.customers_id);
ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns
复制
即便是改成参考表,也不能解决上述错误,所以上述的问题还是数据分布上的问题,两个表之间没有采用共同的分片键
lightdb@test=# SELECT create_reference_table('products');
NOTICE: Copying data from local table...
NOTICE: copying the data has completed
DETAIL: The local data in the table is no longer visible, but is still on disk.
HINT: To remove the local data, run: SELECT truncate_local_data_after_distributing_table($$public.products$$)
create_reference_table
------------------------
(1 row)
lightdb@test=# with products_local as (select * from products)
select c.* , o.*
from customers c , orders o
where o.customers_id = c.customers_id
and exists (select 1 from products_local p
where p.customers_id = c.customers_id);
ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns
lightdb@test=#
复制
原因
当你执行一个包含复杂联接操作的查询时,如果你得到这个错误消息,它通常意味着你在Citus中执行的查询涉及多个表,这些表没有在同一个节点上分布,或者它们被连接的列不是它们的分布列。
在Citus中,查询中的每个表都必须与其他表在相同的节点上进行分布。如果你正在连接的表不在同一个节点上分布,或者你正在连接的列不是它们的分布列,那么你将无法执行这个复杂联接操作。
为了解决这个问题,你需要重新设计你的表分布策略,以确保连接操作所涉及的所有表都在同一个节点上分布,并且连接的列是它们的分布列。另外,你也可以考虑使用其他分布式数据库工具,例如Apache Spark或Apache Cassandra,它们可能更适合你的需求。
真实环境中遇到要用CTE等价改写的SQL
例子1
DELETE FROM JYDB.MF_CalmarRatio A
USING JYDB.MF_CalmarRatio F
JOIN JYDB.tmp_MF_CalmarRatio B ON
F.InnerCode = B.InnerCode
AND F.EndDate = B.EndDate
AND F.IndexCode = B.IndexCode
LEFT JOIN JYDB.MF_CalmarRatio_TMP C ON
F.InnerCode = C.InnerCode
AND F.EndDate = C.EndDate
AND F.IndexCode = C.IndexCode
WHERE A.id = F.id AND C.InnerCode IS NULL;
复制
改为CTE
WITH cte AS (
SELECT F.id FROM
JYDB.MF_CalmarRatio F
JOIN JYDB.tmp_MF_CalmarRatio B ON
F.InnerCode = B.InnerCode
AND F.EndDate = B.EndDate
AND F.IndexCode = B.IndexCode
LEFT JOIN JYDB.MF_CalmarRatio_TMP C ON
F.InnerCode = C.InnerCode
AND F.EndDate = C.EndDate
AND F.IndexCode = C.IndexCode
WHERE
C.InnerCode IS NULL
)
DELETE FROM JYDB.MF_CalmarRatio A WHERE A.id IN ( SELECT id FROM cte);
复制
例子2
INSERT INTO JYDB.tmp_MF_StkHoldingAna
SELECT DISTINCT A.InnerCode, A.ReportDate
FROM (SELECT InnerCode, TradingDay FROM JYDB.QT_DailyQuote WHERE JSID > 730650253645 AND JSID <= 730650253645) B
JOIN MF_KeyStockPortfolio A ON A.StockInnerCode = B.InnerCode
WHERE B.TradingDay <= A.ReportDate
AND A.ReportDate < CASE WHEN (SELECT C.TradingDay FROM JYDB.QT_DailyQuote C
WHERE C.InnerCode = B.InnerCode
AND C.TradingDay > B.TradingDay
ORDER BY C.TradingDay LIMIT 1 OFFSET 0) IS NULL THEN CAST('9999-12-31' AS DATETIME)
ELSE (SELECT C.TradingDay
FROM JYDB.QT_DailyQuote C WHERE C.InnerCode = B.InnerCode AND C.TradingDay > B.TradingDay
ORDER BY C.TradingDay
LIMIT 1 OFFSET 0)
END
AND NOT EXISTS(
SELECT 1 FROM JYDB.tmp_MF_StkHoldingAna B
WHERE B.InnerCode = A.InnerCode
AND B.ReportDate = A.ReportDate);
WITH cte_QT_DailyQuote AS (
SELECT InnerCode, TradingDay FROM JYDB.QT_DailyQuote WHERE JSID > 730650253645 AND JSID <= 730650253645
),
cte_MF_KeyStockPortfolio AS (
SELECT StockInnerCode AS InnerCode, ReportDate FROM MF_KeyStockPortfolio
),
cte_Joined AS (
SELECT DISTINCT A.InnerCode, A.ReportDate FROM cte_MF_KeyStockPortfolio A
JOIN cte_QT_DailyQuote B ON A.InnerCode = B.InnerCode
WHERE B.TradingDay <= A.ReportDate
AND A.ReportDate < COALESCE((
SELECT C.TradingDay
FROM cte_QT_DailyQuote C
WHERE C.InnerCode = B.InnerCode
AND C.TradingDay > B.TradingDay
ORDER BY C.TradingDay LIMIT 1 OFFSET 0),
CAST('9999-12-31' AS DATETIME))
AND NOT EXISTS(SELECT 1 FROM JYDB.tmp_MF_StkHoldingAna D
WHERE D.InnerCode = A.InnerCode
AND D.ReportDate = A.ReportDate)
)
INSERT INTO JYDB.tmp_MF_StkHoldingAna SELECT * FROM cte_Joined;
复制
在这个重写版本中,我们使用了3个CTE:
- cte_QT_DailyQuote: 从JYDB.QT_DailyQuote表中检索符合特定条件的InnerCode和TradingDay,这个CTE对应之前SQL中的子查询B。
- cte_MF_KeyStockPortfolio: 从MF_KeyStockPortfolio表中检索StockInnerCode和ReportDate,并将StockInnerCode作为InnerCode返回,这个CTE对应之前SQL中的表MF_KeyStockPortfolio。
- cte_Joined: 通过将上面两个CTE连接起来来计算查询结果,以及通过子查询找到下一个交易日的TradingDay来计算CASE语句中的值。这个CTE对应之前SQL中的SELECT语句。
最后,在CTE的末尾,我们将查询结果插入到JYDB.tmp_MF_StkHoldingAna表中。
最后修改时间:2023-03-02 17:14:02
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
TA的专栏
热门文章
tcp keepalive 保活机制相关参数和LightDB长连接中断问题
2023-03-15 3925浏览
硬件知识之PCIe(一)
2022-11-09 3430浏览
AMD Zen 3 Ryzen 深入评测:5950X、5900X、5800X 和 5600X 测试
2023-07-07 2919浏览
ERROR: ORA-24345: A Truncation or null fetch error occurred (DBD SUCCESS_WITH_INFO: OCIStmtFetch
2023-05-24 2515浏览
Oracle Blob常见函数
2023-03-22 2070浏览
目录