
at the beginning of 2000 [43]. Many techniques have been proposed over these years in both academia [10, 28, 33, 42, 45, 55, 57] and industry [2, 3, 11, 12, 16, 18]. In particular, these system architectures can be classified into three categories: AI-centric, UDF-centric, and relation-centric. Zhou et al. [57] proposed a novel RDBMS that seamlessly integrates these three architectures. However, none of these existing solutions can natively process the analytical query in the above example. The core reason is that they ignore the result accuracy of the AI models, assuming that the models used are given and well-trained. For example, during the above query processing, none of them takes the accuracy of different dunking action recognition models into account.
The nsDB vision. To overcome the limitations of existing solutions, in this work we envision a novel type of neuro-symbolic database system, nsDB, to process these newly emerging queries. It integrates neural with symbolic systems to address the weaknesses of each, providing a strong database capable of data management, model learning, and complex, multi-model analytical query processing. Specifically, nsDB abstracts away the complexities of AI models and allows end users to build AI projects and use them for their individual upstream applications, even if they have no coding skills, AI expertise, or system development experience. To achieve that, the neuro system is abstracted as a natively supported module in nsDB, and result accuracy and processing latency are considered simultaneously during query optimization. However, achieving this goal is not trivial, as the neuro system inherently trades off accuracy against performance. For example, the more accurate the dunk action detection model, the higher its inference latency.
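The trade-off described above can be illustrated with a minimal sketch. It assumes each candidate model comes with a profiled accuracy and latency, and that the query states an accuracy floor; the chooser then picks the fastest model that meets it. The class `ModelProfile`, the function `pick_model`, the model names, and all numbers are hypothetical and serve only as an illustration, not as nsDB's actual optimizer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    name: str          # hypothetical model identifier
    accuracy: float    # profiled accuracy in [0, 1] (made-up value)
    latency_ms: float  # per-item inference latency (made-up value)


def pick_model(candidates: Sequence[ModelProfile],
               min_accuracy: float) -> Optional[ModelProfile]:
    """Return the lowest-latency model whose accuracy meets the floor."""
    feasible = [m for m in candidates if m.accuracy >= min_accuracy]
    if not feasible:
        return None  # no candidate satisfies the accuracy requirement
    return min(feasible, key=lambda m: m.latency_ms)


# Illustrative candidates: higher accuracy comes with higher latency.
CANDIDATES = [
    ModelProfile("dunk_small", accuracy=0.81, latency_ms=12.0),
    ModelProfile("dunk_medium", accuracy=0.88, latency_ms=35.0),
    ModelProfile("dunk_large", accuracy=0.93, latency_ms=90.0),
]

if __name__ == "__main__":
    chosen = pick_model(CANDIDATES, min_accuracy=0.85)
    print(chosen.name if chosen else "no feasible model")
```

A latency-only optimizer would always pick the fastest model regardless of the accuracy the query needs; adding the accuracy floor is the smallest change that makes both goals visible to the chooser.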
The rest of the paper is organized as follows. We briefly analyze the unique aspects of nsDB within the context of extensive ongoing work in Section 2. In Section 3, we first introduce the system architecture of nsDB, then highlight the research challenges of each component, and finally present our design paradigms and preliminary ideas to address them. We discuss the generality of nsDB in Section 4 and conclude this vision paper in Section 5.
2 RELATED WORK
In this section, we differentiate our proposal nsDB from the most relevant systems and techniques in the literature.
Neuro-Symbolic database system. Numerous studies [2, 3, 10–12, 16, 18, 28, 33, 39, 42, 44, 45, 55, 57] in both academia and industry have investigated integrating DB and AI workloads since 2000. The architecture of existing solutions can be classified into three representative categories: (i) AI-centric, (ii) UDF-centric, and (iii) relation-centric. To overcome the limitations of the solutions in each category, a mixed solution that integrates the above three architecture categories was proposed [57]. All these solutions (including our nsDB) provide AI model inference for various analytical tasks. However, none of the existing solutions has emerged as the de facto standard so far. The major reasons fall under three aspects: (i) model training, (ii) performance goal, and (iii) optimization strategy, as shown in Table 1.
Table 1: Comparison of AI and DB integration solutions

Architecture category   Model training   Performance goal    Optimization strategy
AI-centric [33]         Yes              Latency-only        Symbolic
UDF-centric [35]        No               Latency-only        Symbolic
Rel.-centric [55]       No               Latency-only        Symbolic
Mixed sol. [57]         No               Latency-only        Neuro-symbolic
Our nsDB                Yes              Latency-accuracy    Neuro-symbolic

(I) Model training: Almost all existing AI and DB integrated systems assume the underlying AI models are well-trained, and the integrated systems are designed for efficient model inference. However, model training cannot be ignored in real-world applications. To make matters worse, model training is not trivial to support in these integrated systems, as they are designed to provide excellent model inference performance. Existing systems in the AI-centric category (e.g., Google BigQuery [11], Amazon Redshift ML [3]) train these models by offloading to the underlying DL systems (e.g., PyTorch, TensorFlow). This is inefficient, as the training data must be prepared in advance and the training relies on external systems.
(II) Performance goal: The only performance goal of existing AI and DB integrated systems is query processing latency, because the underlying AI models are assumed to be well-trained, which means the accuracy of these AI models is fixed. However, the same task can be processed by multiple AI models. Moreover, different models yield different result accuracy for the same task. For example, ArcFace [29], FaceNet [49], and EigenFace [52] are typical models for the face recognition task, and their result accuracies differ.
(III) Optimization strategy: Existing systems [2, 3, 10–12, 16, 18] with an AI-centric architecture offload the inference computation to decoupled AI runtimes. Thus, their query optimizers use only symbolic rules to process the predicates in the analytical query and ignore the optimization of the complex neuro-based computations. The UDF-centric systems [28, 35, 39] use UDFs to model the neuro-based computations and apply symbolic optimization strategies to the UDF-based logical plan. The relation-centric systems [42, 55] employ relations to represent the model parameter tensors, extend the traditional relational algebra to a tensor relational algebra, and optimize them in a holistic symbolic manner. A simple co-optimization idea (i.e., devising novel query transformation rules) for symbolic and neuro operators in complex analytical queries has been proposed in [57]. However, it cannot achieve low latency and high accuracy simultaneously.
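To make the notion of a query transformation rule concrete, the following sketch shows one generic, well-known rewrite: evaluating a cheap symbolic predicate before an expensive neural one so that the model runs on fewer rows. This is not the rule set of [57] or of nsDB; the names `CountingPredicate`, `run_filters`, `is_team_a`, and the stand-in "neural" predicate are hypothetical, and the data is synthetic.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class CountingPredicate:
    """Wraps a predicate and counts how often it is evaluated."""

    def __init__(self, fn: Callable[[Row], bool]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, row: Row) -> bool:
        self.calls += 1
        return self.fn(row)


def run_filters(rows: Iterable[Row],
                predicates: List[Callable[[Row], bool]]) -> List[Row]:
    """Apply predicates in the given order, short-circuiting per row."""
    return [r for r in rows if all(p(r) for p in predicates)]


def is_team_a(row: Row) -> bool:
    """Cheap symbolic predicate on a relational attribute."""
    return row["team"] == "A"


def make_neural_predicate() -> CountingPredicate:
    # Stand-in for an expensive model call; a real system would run
    # inference on the clip here instead of this arithmetic check.
    return CountingPredicate(lambda row: row["clip_id"] % 2 == 0)


# Synthetic table: every fourth clip belongs to team "A".
ROWS = [{"clip_id": i, "team": "A" if i % 4 == 0 else "B"}
        for i in range(100)]

# Original plan: the neural predicate is evaluated on every row.
naive_neural = make_neural_predicate()
naive_result = run_filters(ROWS, [naive_neural, is_team_a])

# Rewritten plan: the symbolic predicate filters first.
rewritten_neural = make_neural_predicate()
rewritten_result = run_filters(ROWS, [is_team_a, rewritten_neural])

if __name__ == "__main__":
    print(naive_neural.calls, rewritten_neural.calls)
```

Both plans return the same rows, but the rewritten plan invokes the stand-in model on only the rows that survive the symbolic filter. A rule of this kind reduces latency only; it leaves the choice of model, and therefore the result accuracy, untouched.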
In this work, we envision the next-generation database system nsDB, which provides in-database model training and includes a novel neuro-symbolic query optimizer that co-optimizes the latency and result accuracy of complex analytical query processing. The last row of Table 1 shows the unique aspects of nsDB w.r.t. the existing DB and AI integrated systems.
Query processing over multi-modal data. Conducting complex queries over multi-modal data has been an active research topic [24, 30, 32] in the database community in recent years. The general idea is to decompose the query into several subqueries and dispatch them to different systems [24, 32]. Our nsDB differs from them in two ways: (i) it integrates both symbolic and neural operators to process the different tasks in a complex query over multi-modal data,