compute node failures or online changes of the cluster topology.
Additionally, global resource-aware scheduling and fine-grained
scale-out execution based on directed acyclic graphs provide extra
flexibility and performance during query execution. This evolution
continues now with the introduction of Microsoft Fabric [16], an all-in-one
solution that covers data movement, data science, real-time
analytics, and business intelligence. Fabric offers various tools and
services, including Fabric Data Warehouse (or Fabric DW), which
is also based on Polaris and converges the worlds of data lakes and
warehouses. Fabric DW stores data in the Parquet file format [9]
and publishes it as Delta Lake logs [7], enabling ACID transactions
and cross-engine interoperability that can be leveraged through
other components such as Spark and Power BI [2].
As part of this architectural transformation and based on
learnings from a decade of experience with various distributed form
factors, the query processor of Fabric DW evolved as well. In this
paper we explore this evolution from the point of view of query
optimization. In Section 2 we review the compiler architecture of
the PDW system. In Section 3 we summarize some learnings from
deploying PDW and Synapse DW in production, as well as some
limitations we encountered as we evolved the overall architecture
of the system. In Section 4 we describe the new architecture of
Fabric DW from the point of view of compilation, and the changes
in query optimization required to evolve the compiler to this new
architecture. We report some initial results in Section 5, review
related work in Section 6, and conclude in Section 7.
2 PDW COMPILER ARCHITECTURE
We now summarize the architecture of the earlier PDW compiler
by describing the various steps involved in evaluating input queries
(see Figure 1 and refer to [20] for more details):
(1) The input query is submitted to the PDW distributed engine, where it is parsed, validated, and sent to the frontend SQL Server Engine (which is collocated with the distributed engine in the control node).
(2) The SQL Server Frontend Engine maintains a shell database, which defines all metadata and global statistics about tables (whose data is actually partitioned on backend nodes) as well as information about users and privileges. From the compilation point of view, the shell database is indistinguishable from one that contains actual data. The query is then parsed, algebrized, and transformed into a logical relational tree, which is simplified and optimized. Optimization uses transformation rules to explore the space of equivalent alternatives, costs these alternatives, and returns the one that is expected to be the most efficient. Note that this optimization does not take into account the distributed nature of the data, but instead explores all centralized plans, as if all the data was actually associated with that server. While the output of the traditional SQL Server optimizer is the best execution plan, the frontend SQL Server optimizer returns the whole search space along with statistical information for each alternative. This is done because simply parallelizing the best centralized plan can result in suboptimal plans [20]. The search space (or MEMO, as discussed in Section 4.1 and sketched after this list) is serialized and sent back to the distributed engine.
(3) The distributed engine deserializes the MEMO and starts a second stage of optimization using a distributed query optimizer, or DQO. A bottom-up strategy similar to that of System-R [19] enumerates distributed execution strategies by introducing appropriate data movement operators in the MEMO. DQO considers distributions as interesting properties analogous to sort orders in System-R, and thus reduces the search space using the classical dynamic programming approach; a simplified sketch of this enumeration appears after the summary below. It then costs alternatives and obtains the distributed execution plan that minimizes data movement. This distributed plan is transformed into an executable format. The result of this procedure is a linearized sequence of steps. Each step corresponds to an operator tree whose root and leaf nodes are connected to other steps by data movement operators.
(4) The distributed plan is sent to the scheduler/executor in the distributed engine. At that point, steps are executed in sequence. The execution subplan of each step is transformed into SQL and sent to backend nodes, along with instructions on how to move the resulting data across those nodes (e.g., shuffling the result on a subset of columns).
(5) Each backend node receives a standard SQL statement over the data slice it owns, plus a data move directive on the result. Parsing, algebrization, and a new round of optimization are done on the received query, and the resulting execution plan is passed to the execution engine in the backend node.
(6) The plan is executed in the backend node as if it had been obtained from a regular input query.
(7) The result of execution is moved across backend nodes by using temporary landing tables as the backing storage. Metadata information about each result is sent back to the executor in the distributed engine, so that subsequent steps are executed correctly against temporary tables. For the last step, the actual results are sent back to the distributed engine.
(8) The final results from all backend nodes are aggregated and sent back to the client.
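To make the MEMO concrete, the following is a minimal sketch of such a search-space structure in Python. All names (Memo, MemoGroup, Alternative) are illustrative and do not correspond to actual PDW or SQL Server internals; the sketch only captures the essential idea of groups of logically equivalent alternatives, each carrying cost and cardinality estimates, serialized as a whole rather than reduced to a single best plan.

import json
from dataclasses import dataclass, field, asdict

# Hypothetical structures; names are illustrative, not PDW internals.

@dataclass
class Alternative:
    # One expression within a group. Children reference other groups
    # (not concrete subplans), which is what lets a MEMO encode an
    # exponential search space compactly.
    operator: str          # e.g. "Join", "Filter", "Scan"
    child_groups: list     # indices of input groups
    estimated_cost: float  # optimizer cost of this alternative
    estimated_rows: float  # statistical information kept per alternative

@dataclass
class MemoGroup:
    # A set of logically equivalent alternatives.
    alternatives: list = field(default_factory=list)

@dataclass
class Memo:
    groups: list = field(default_factory=list)
    root_group: int = 0

    def serialize(self) -> str:
        # The frontend serializes the entire search space, not just the
        # best plan, so the distributed optimizer can reconsider
        # alternatives once data movement costs are known.
        return json.dumps(asdict(self))

# Tiny example: groups for Scan(T), Scan(S), and their join.
memo = Memo(groups=[
    MemoGroup([Alternative("Scan(T)", [], 10.0, 1e6)]),
    MemoGroup([Alternative("Scan(S)", [], 12.0, 2e6)]),
    MemoGroup([Alternative("Join", [0, 1], 150.0, 5e5),
               Alternative("Join", [1, 0], 170.0, 5e5)]),
], root_group=2)
wire_format = memo.serialize()  # sent back to the distributed engine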
In short, the frontend engine maintains a shell database with
global statistical information on tables, and backend nodes maintain
a slice of the global database with actual data. Query compilation
involves three different optimization calls: the centralized optimization
in the frontend engine that produces the full logical search
space, the distributed optimization in the distributed engine that
minimizes data movement, and a third round of fragment optimizations
in the backend nodes for each distributed step.
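The interplay between the second and third optimization calls can be illustrated with a small sketch of the bottom-up enumeration performed by DQO. This is a deliberately simplified rendition, assuming only two distribution properties (hash-distributed on a column, or round-robin) and a flat per-row shuffle cost; all helper names are hypothetical, and the real cost model and property framework are considerably richer.

# Simplified sketch of DQO-style bottom-up enumeration. All names are
# hypothetical; the real DQO works over the deserialized MEMO and a
# full cost model. Here a "plan" is a tuple (expression, distribution,
# cost), and a distribution is ("hash", column) or ("roundrobin", None).

SHUFFLE_COST_PER_ROW = 1.0

def add_movement(plan, column, rows):
    # Wrap a plan with a Shuffle (data movement) operator so that it
    # becomes hash-distributed on the given column, unless it already
    # satisfies that property.
    expr, dist, cost = plan
    if dist == ("hash", column):
        return plan
    return (f"Shuffle({expr} -> {column})", ("hash", column),
            cost + rows * SHUFFLE_COST_PER_ROW)

def best_join_plan(left_plans, right_plans, left_col, right_col, rows):
    # Dynamic programming step: for every pair of input plans, insert
    # movement only where needed so both sides are co-located on the
    # join columns, then keep the cheapest resulting distributed plan.
    best = None
    for lp in left_plans:
        for rp in right_plans:
            l = add_movement(lp, left_col, rows)
            r = add_movement(rp, right_col, rows)
            joined = (f"Join({l[0]}, {r[0]})", ("hash", left_col),
                      l[2] + r[2])
            if best is None or joined[2] < best[2]:
                best = joined
    return best

# Both tables round-robin distributed, as in Example 2.1 below.
t_plans = [("Scan(T)", ("roundrobin", None), 10.0)]
s_plans = [("Scan(S)", ("roundrobin", None), 12.0)]
print(best_join_plan(t_plans, s_plans, "Ta", "Sa", rows=1e6))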
Example 2.1. Consider two tables T(Ta, Tb) and S(Sa, Sb), both
distributed using a round-robin scheme, and the following query:
SELECT DISTINCT T.Tb
FROM T INNER JOIN S ON T.Ta = S.Sa
WHERE S.Sb > 3
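To connect this example with the step-by-step execution described above, the following is a hedged sketch of one plausible linearized step sequence for this query, expressed as data: the SQL fragment each backend node would receive plus its data movement directive. The _tmp table names stand for hypothetical temporary landing tables; the plan DQO actually chooses for this query is the one shown in Figure 2.

# Illustration only: one plausible step sequence, assuming both
# round-robin tables are re-distributed on their join columns.
steps = [
    {   # Step 1: filter S and shuffle the result on the join column
        "sql": "SELECT S.Sa FROM S WHERE S.Sb > 3",
        "move": ("SHUFFLE", ["Sa"]),
    },
    {   # Step 2: shuffle T on its join column
        "sql": "SELECT T.Ta, T.Tb FROM T",
        "move": ("SHUFFLE", ["Ta"]),
    },
    {   # Step 3: co-located join plus distinct over the landing
        # tables; results are returned to the distributed engine
        "sql": "SELECT DISTINCT T.Tb FROM T_tmp AS T "
               "JOIN S_tmp AS S ON T.Ta = S.Sa",
        "move": ("RETURN", []),
    },
]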
The query eventually reaches the frontend SQL Server engine
and is optimized (e.g., considering different join orders, optionally
pushing a partial distinct operator below the join) and the search
space is sent back to the PDW distributed engine, where various
distribution strategies are compared and the best distributed plan
is generated (see Figure 2(a)). In turn, Figure 2(b) shows the graph
associated with the best distributed plan, where each node (or step)