
219
Secure Sampling for Approximate Multi-party ery
Processing
QIYAO LUO, Hong Kong University of Science and Technology, Hong Kong SAR, China
YILEI WANG, Alibaba Group, Hangzhou, China
KE YI, Hong Kong University of Science and Technology, Hong Kong SAR, China
SHENG WANG, Alibaba Group, Hangzhou, China
FEIFEI LI, Alibaba Group, Hangzhou, China
We study the problem of random sampling in the secure multi-party computation (MPC) model. In MPC,
taking a sample securely must have a cost
Ω(𝑛)
irrespective to the sample size
𝑠
. This is in stark contrast with
the plaintext setting, where a sample can be taken in
𝑂 (𝑠)
time trivially. Thus, the goal of approximate query
processing (AQP) with sublinear costs seems unachievable under MPC. To get around this inherent barrier, in
this paper we take a two-stage approach: In the oine stage, we generate a batch of
𝑛/𝑠
samples with
˜
𝑂 (𝑛)
total cost, which can then be consumed to answer queries as they arrive online. Such an approach allows us to
achieve an
˜
𝑂 (𝑠)
amortized cost per query, similar to the plaintext setting. Based on our secure batch sampling
algorithms, we build MASQUE, an MPC-AQP system that achieves sublinear online query costs by running
an MPC protocol to evaluate the queries on pre-generated samples. MASQUE achieves the strong security
guarantee of the MPC model, i.e., nothing is revealed beyond the query result, which itself can be further
protected by (amplied) dierential privacy.
CCS Concepts: • Security and privacy
→
Database and storage security; • Information systems
→
Data management systems; • Mathematics of computing → Probability and statistics.
Additional Key Words and Phrases: Secure multi-party computation; Sampling; Approximate query processing
ACM Reference Format:
Qiyao Luo, Yilei Wang, Ke Yi, Sheng Wang, and Feifei Li. 2023. Secure Sampling for Approximate Multi-
party Query Processing. Proc. ACM Manag. Data 1, 3 (SIGMOD), Article 219 (September 2023), 27 pages.
https://doi.org/10.1145/3617339
1 INTRODUCTION
Data analysis often needs to be conducted across multiple organizations. Meanwhile, as data gets
increasingly valuable, there is also a growing need to protect its security and privacy.
Example 1.1. An insurance company wishes to estimate the budget it should prepare for certain
types of diseases in 2023. This information could be obtained by running a query on the joint
data from the insurance company and the hospitals. For example, several hospitals jointly hold
R(person, year, disease)
and the insurance company has relation
S(person, cost, rate)
. A simple
SQL query on these relations can be the following:
Authors’ addresses: Qiyao Luo, qluoak@cse.ust.hk, Hong Kong University of Science and Technology, Hong Kong SAR,
China; Yilei Wang, fengmi.wyl@alibaba-inc.com, Alibaba Group, Hangzhou, China; Ke Yi, yike@cse.ust.hk, Hong Kong
University of Science and Technology, Hong Kong SAR, China; Sheng Wang, sh.wang@alibaba-inc.com, Alibaba Group,
Hangzhou, China; Feifei Li, lifeifei@alibaba-inc.com, Alibaba Group, Hangzhou, China.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the
full citation on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specic permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/9-ART219 $15.00
https://doi.org/10.1145/3617339
Proc. ACM Manag. Data, Vol. 1, No. 3 (SIGMOD), Article 219. Publication date: September 2023.
相关文档
评论