nsDB: Architecting the Next Generation Database by Integrating
Neural and Symbolic Systems
Ye Yuan
Beijing Institute of Technology
Beijing, China
yuan-ye@bit.edu.cn
Bo Tang
Southern Univ. of Sci. and Tech.
Shenzhen, China
tangb3@sustech.edu.cn
Tianfei Zhou
Beijing Institute of Technology
Beijing, China
ztfei.debug@gmail.com
Zhiwei Zhang
Beijing Institute of Technology
Beijing, China
zwzhang@bit.edu.cn
Jianbin Qin
Shenzhen University
Shenzhen, China
qinjianbin@szu.edu.cn
ABSTRACT
In this paper, we propose nsDB, a novel neuro-symbolic database
system that natively integrates neural and symbolic system
architectures to address the weaknesses of each, providing a strong
database capable of data management, model learning, and complex
analytical query processing over multi-modal data. We employ a
real-world NBA data analytical query as an example to illustrate
the functionality of each component in nsDB and highlight the
research challenges in building it. We then present the key design
principles and our preliminary attempts to address them.
In a nutshell, we envision that the next generation database
system nsDB integrates the complex neural system with the simple
symbolic system. Undoubtedly, nsDB will serve as a bridge between
databases and AI models, abstracting away the AI complexities
while allowing end users to enjoy their strong capabilities. We
are in the early stages of the journey to build nsDB, and many
open challenges remain, e.g., in-database model training,
multi-objective query optimization, and database agent development.
We hope that researchers from different communities (e.g., systems,
architecture, databases, artificial intelligence) can tackle them
together.
PVLDB Reference Format:
Ye Yuan, Bo Tang, Tianfei Zhou, Zhiwei Zhang, and Jianbin Qin. nsDB:
Architecting the Next Generation Database by Integrating Neural and
Symbolic Systems. PVLDB, 17(11): 3283 - 3289, 2024.
doi:10.14778/3681954.3682000
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 11 ISSN 2150-8097.
1 INTRODUCTION
On the one hand, both traditional relational database systems (e.g.,
PostgreSQL [17], MySQL [15]) and modern big data systems (e.g.,
Spark [7], Flink [5], Hive [6]) employ a symbolic system (a.k.a.
algebraic computation [19]) as the building block of their system
architecture. In particular, complex data processing in these systems
is translated into exact computation over expressions that contain
variables and are manipulated as symbols, i.e., relational algebra.
The major advantage of a symbolic system is that it provides exact
computation through an explicit, step-by-step procedure. On the other
hand, both machine learning and deep learning in the field of
artificial intelligence utilize mathematical models (a.k.a. neuro
systems [37]) to learn from data and generalize to unseen data, and
thus perform tasks without explicit instructions. The representative
mathematical models are statistical algorithms and artificial neural
networks. In recent years, the neuro system has attracted huge
attention owing to its success in natural language processing (NLP),
computer vision (CV), speech recognition, etc. The most important
properties of a neuro system are that it is intuitive and unconscious.
In recent years, many applications in various domains [25, 38, 57]
have emerged that cannot be efficiently processed by either a
symbolic-based data management system or a neuro-based artificial
intelligence system independently.
Figure 1: User query in video database
(User query: "Find clips of LeBron James dunking from the Los Angeles
Lakers' regular season videos where he scored at least 30 points in
those games." Inputs: NBA statistics table S and NBA game video V.
Output: query result.)
Example. Consider the example illustrated in Figure 1: the data
analysts in the NBA marketing team want to advertise the NBA All-Star
Game by promoting the NBA superstar “LeBron James” [1]. Hence, they
want to find clips in the NBA data repository in which LeBron James
is dunking, taken from games where his team is the “Los Angeles
Lakers” and he scored at least 30 points. Inherently, this query is
not trivial to answer with either symbolic-based databases or
neuro-based AI systems alone, as it includes two fundamental tasks:
(i) identifying the specific frames in the video database, i.e., the
frames in which LeBron James is dunking; and (ii) finding all these
frames in a large video database under attribute constraints, e.g.,
he scored at least 30 points and played for the Los Angeles Lakers.
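The two tasks in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not nsDB's interface: the record types `GameStat` and `Clip`, the function `find_dunk_clips`, and the stub `detect_dunk` (which stands in for a real action-recognition model by reading a tag planted in the toy data) are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameStat:          # one row of the statistics table S
    game_id: int
    player: str
    team: str
    season_type: str
    points: int


@dataclass(frozen=True)
class Clip:              # one clip of the game video V
    game_id: int
    clip_id: int
    tags: tuple          # toy stand-in for the raw frame content


def detect_dunk(clip: Clip, player: str) -> float:
    """Hypothetical stand-in for a neural action-recognition model.

    Returns a confidence in [0, 1]. It only inspects the planted tags,
    so the example runs without any ML library.
    """
    return 0.9 if f"dunk:{player}" in clip.tags else 0.1


def find_dunk_clips(stats, clips, player, team, min_points, threshold=0.5):
    # Symbolic step: exact predicate evaluation over the statistics table.
    game_ids = {
        s.game_id for s in stats
        if s.player == player and s.team == team
        and s.season_type == "regular" and s.points >= min_points
    }
    # Neural step: approximate predicate, applied only to qualifying games.
    return [
        c for c in clips
        if c.game_id in game_ids and detect_dunk(c, player) >= threshold
    ]


stats = [
    GameStat(1, "LeBron James", "Los Angeles Lakers", "regular", 34),
    GameStat(2, "LeBron James", "Los Angeles Lakers", "regular", 22),
    GameStat(3, "LeBron James", "Los Angeles Lakers", "playoff", 40),
]
clips = [
    Clip(1, 10, ("dunk:LeBron James",)),
    Clip(1, 11, ("layup:LeBron James",)),
    Clip(2, 20, ("dunk:LeBron James",)),
    Clip(3, 30, ("dunk:LeBron James",)),
]

result = find_dunk_clips(stats, clips, "LeBron James", "Los Angeles Lakers", 30)
print([(c.game_id, c.clip_id) for c in result])  # [(1, 10)]
```

The symbolic predicates return an exact set of games, whereas the neural predicate returns a confidence that must be thresholded, which is why the quality of the detection model directly affects the query result.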
A straightforward idea to address the above query is to combine
the abilities of both the symbolic system and the neuro system. In
the literature, integrating ML tasks into database systems has been
studied since the beginning of the 2000s [43]. Many techniques have
been proposed over these years in both academia [10, 28, 33, 42, 45,
55, 57] and industry [2, 3, 11, 12, 16, 18]. In particular, these
system architectures can be classified into three categories:
AI-centric, UDF-centric, and relation-centric. Zhou et al. [57]
proposed a novel RDBMS by seamlessly integrating these three
architectures. However, none of these existing solutions can natively
process the analytical query in the above example. The core reason is
that they ignore the result accuracy of the AI models, as they assume
that the AI models in use are given and well-trained. For example,
when processing the above query, none of them takes the accuracy of
different dunking action recognition models into account.
The nsDB vision. To overcome the limitations of existing solutions,
in this work, we envision a novel type of neuro-symbolic database
system, nsDB, to process these newly emerging queries. It integrates
neural with symbolic systems to address the weaknesses of each,
providing a strong database capable of data management, model
learning, and complex and multi-model analytical query processing.
Specifically, nsDB abstracts away the complexities of AI models, and
allows end users to build AI projects and use them for their
individual upstream applications, even if they have no coding skills,
AI expertise, or system development experience. To achieve that, the
neuro system is abstracted as a natively supported module in nsDB,
and the result accuracy and processing latency are considered
simultaneously during query optimization. However, it is not trivial
to achieve the above goal, as the implicit nature of the neuro system
inherently forces a trade-off between accuracy and performance. For
example, the more accurate the dunk action detection model, the
higher its inference latency.
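A minimal sketch of this trade-off, assuming each candidate model has been profiled for accuracy and per-clip latency: the function below picks the fastest model that still meets a required accuracy. The model names, the numbers, and the helper `fastest_model_meeting` are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Optional


class ModelProfile(NamedTuple):
    name: str
    accuracy: float      # fraction of correct predictions on a validation set
    latency_ms: float    # average inference time per clip


# Hypothetical profiles; the names and numbers are illustrative only.
CANDIDATES = [
    ModelProfile("dunk-small", 0.82, 4.0),
    ModelProfile("dunk-medium", 0.90, 11.0),
    ModelProfile("dunk-large", 0.95, 35.0),
]


def fastest_model_meeting(candidates, min_accuracy) -> Optional[ModelProfile]:
    """Return the lowest-latency model whose accuracy is at least min_accuracy."""
    eligible = [m for m in candidates if m.accuracy >= min_accuracy]
    return min(eligible, key=lambda m: m.latency_ms) if eligible else None


print(fastest_model_meeting(CANDIDATES, 0.85).name)  # dunk-medium
print(fastest_model_meeting(CANDIDATES, 0.99))       # None
```

Raising the accuracy requirement can only keep the chosen latency the same or increase it, and a requirement that no model satisfies leaves the query without a valid choice.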
The rest of the paper is organized as follows. We briefly analyze
the unique aspects of nsDB within the context of extensive ongoing
work in Section 2. In Section 3, we first introduce the system
architecture of nsDB, then highlight the research challenges of
each component, and finally present our design paradigms and
preliminary ideas to address them. We discuss the generality of nsDB
in Section 4 and conclude this vision paper in Section 5.
2 RELATED WORK
In this section, we dierentiate our proposal nsDB from the most
relevant systems and techniques in the literature.
Neuro-Symbolic database system. Numerous researches [
2
,
3
,
10
12
,
16
,
18
,
28
,
28
,
33
,
39
,
42
,
44
,
45
,
55
,
57
] have been studied to
integrate DB and AI workloads in both academia and industry since
2000. The architecture of existing solutions can be classied into
three representative categories: (i) AI-centric, (ii) UDF-centric, and
(iii) relation-centric. To overcome the limitation of the solutions in
each category, a mixed solution was proposed [
57
] which integrates
the above three architecture categories. All these solutions (includ-
ing of our nsDB) provide AI model inferences for various analytical
tasks. However, none of the existing solutions have emerged as the
de-facto standard until now. The major reasons can be summarized
by three aspects: (i) model training, (ii) performance goal, and (iii)
optimization strategy, as shown in Table 1.
Table 1: Comparison of AI and DB integration solutions
Architecture category   Model training   Performance goal   Optimization strategy
AI-centric [33]         Yes              Latency-only       Symbolic
UDF-centric [35]        No               Latency-only       Symbolic
Rel.-centric [55]       No               Latency-only       Symbolic
Mixed sol. [57]         No               Latency-only       Neuro-symbolic
Our nsDB                Yes              Latency-accuracy   Neuro-symbolic
(I) Model training: Almost all existing AI and DB integrated systems
assume that the underlying AI models are well-trained, and the
integrated systems are designed for efficient model inference.
However, model training cannot be ignored in real-world applications.
To make matters worse, model training is not trivial to support in
the above integrated systems, as they are designed to provide
excellent model inference performance. Existing systems in the
AI-centric category (e.g., Google BigQuery [11], Amazon Redshift
ML [3]) train these models by offloading the work to the underlying
DL systems (e.g., PyTorch, TensorFlow). This is not efficient, as the
training data must be prepared in advance and training relies on
other systems.
(II) Performance goal: The only performance goal of existing AI and
DB integrated systems is query processing latency, as the underlying
AI models are assumed to be well-trained, which means the accuracy of
these AI models is fixed. However, the same task can be processed by
multiple AI models, and different models have different result
accuracy for the same task. For example, ArcFace [29], FaceNet [49],
and EigenFace [52] are typical models for the face recognition task,
and their result accuracies differ.
(III) Optimization strategy: Existing systems [2, 3, 10–12, 16, 18]
with an AI-centric architecture offload the inference computation to
decoupled AI runtimes. Thus, their query optimizers only use symbolic
rules to process the predicates in the analytical query and ignore
the optimization of the complex neuro-based computations. The
UDF-centric systems [28, 35, 39] use UDFs to model the neuro-based
computations, and apply symbolic-based optimization strategies to the
UDF-based logical plan. The relation-centric systems [42, 55] employ
relations to represent the model parameter tensors, extend the
traditional relational algebra to tensor relational algebra, and
optimize them in a holistic symbolic manner. A simple co-optimization
idea (i.e., devising novel query transformation rules) for symbolic
and neuro operators in complex analytical queries has been proposed
in [57]. However, it cannot achieve the goals of low latency and high
accuracy simultaneously.
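To make the relation-centric idea more concrete, the sketch below stores a small weight matrix as a relation of (row, column, value) tuples and computes a matrix-vector product with an ordinary join and grouped sum. It uses Python's standard `sqlite3` module purely for illustration; the table names `layer_weights` and `x_vec` are hypothetical, and this is not the schema or API of any of the cited systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE layer_weights (row_id INTEGER, col_id INTEGER, val REAL);
    CREATE TABLE x_vec (col_id INTEGER, val REAL);
""")

# A 2x3 weight matrix W, one tuple per entry.
conn.executemany("INSERT INTO layer_weights VALUES (?, ?, ?)", [
    (0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 2, 6.0),
])
# An input vector x of length 3.
conn.executemany("INSERT INTO x_vec VALUES (?, ?)", [
    (0, 1.0), (1, 0.0), (2, -1.0),
])

# y = W x expressed as a join on the shared index followed by a grouped sum.
rows = conn.execute("""
    SELECT w.row_id, SUM(w.val * x.val) AS val
    FROM layer_weights AS w
    JOIN x_vec AS x ON w.col_id = x.col_id
    GROUP BY w.row_id
    ORDER BY w.row_id
""").fetchall()
print(rows)  # [(0, -2.0), (1, -2.0)]
conn.close()
```

Because the computation is expressed as a join and an aggregation, a conventional symbolic optimizer can treat it like any other relational plan, which is what optimizing in a holistic symbolic manner refers to.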
In this work, we envision the next generation database system
nsDB, which provides in-database model training, and a novel
neuro-symbolic query optimizer is devised in it to co-optimize the
performance latency and result accuracy of the complex analytical
query processing. The last row of Table 1 shows the unique aspects
of nsDB w.r.t. the existing DB and AI integrated systems.
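One common way to frame a two-objective choice such as latency versus accuracy is to enumerate candidate plans and keep only those that are not dominated on both objectives (the Pareto front). The sketch below does this for the running example under a deliberately simple, assumed cost model; the model names, clip counts, and costs are hypothetical, and this is not the optimizer design of nsDB.

```python
from itertools import product
from typing import NamedTuple


class Plan(NamedTuple):
    model: str
    filter_first: bool
    latency_ms: float
    accuracy: float


# Hypothetical inputs: (accuracy, per-clip latency in ms) for each model.
MODELS = {"dunk-small": (0.82, 4.0), "dunk-large": (0.95, 35.0)}
TOTAL_CLIPS = 10_000
QUALIFYING_CLIPS = 200   # clips that survive the symbolic predicates
FILTER_COST_MS = 50.0    # assumed cost of evaluating the symbolic predicates


def enumerate_plans():
    for (name, (acc, per_clip)), filter_first in product(MODELS.items(), (True, False)):
        # Running the symbolic filter first shrinks the neural operator's input.
        clips = QUALIFYING_CLIPS if filter_first else TOTAL_CLIPS
        yield Plan(name, filter_first, FILTER_COST_MS + clips * per_clip, acc)


def pareto_front(plans):
    """Keep the plans that no other plan beats on both latency and accuracy."""
    def dominated(p):
        return any(
            q.latency_ms <= p.latency_ms and q.accuracy >= p.accuracy
            and (q.latency_ms < p.latency_ms or q.accuracy > p.accuracy)
            for q in plans
        )
    return sorted((p for p in plans if not dominated(p)), key=lambda p: p.latency_ms)


plans = list(enumerate_plans())
front = pareto_front(plans)
for p in front:
    print(p.model, p.filter_first, p.latency_ms, p.accuracy)
```

In this toy setting both surviving plans apply the symbolic filter first, and they differ only in which model they run, so the remaining decision is exactly the latency-accuracy trade-off.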
Query processing over multi-modal data. Conducting complex queries
over multi-modal data has been an active research topic [24, 30, 32]
in the database community in recent years. The general idea of these
works is to decompose the query into several subqueries and dispatch
them to different systems [24, 32]. Our nsDB differs from them in two
ways: (i) it integrates both symbolic and neural operators to process
the different tasks in the complex query over multi-modal data,