
Local Differential Privacy for Bayesian Optimization
Xingyu Zhou,¹* Jian Tan²
¹ECE, The Ohio State University
²Alibaba Group, Sunnyvale
zhou.2055@osu.edu, j.tan@alibaba-inc.com
*This work was done during the internship of the first author at Alibaba Group, Sunnyvale, USA.
Abstract
Motivated by the increasing concern about privacy in today's data-intensive online learning systems, we consider black-box optimization in the nonparametric Gaussian process setting with a local differential privacy (LDP) guarantee. Specifically, the rewards from each user are further corrupted to protect privacy, and the learner only has access to the corrupted rewards when minimizing the regret. We first derive regret lower bounds that hold for any LDP mechanism and any learning algorithm. Then, we present three almost optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. In this process, we also propose a new Bayesian optimization (BO) method (called MoMA-GP-UCB) based on median-of-means techniques and kernel approximations, which complements previous BO algorithms for heavy-tailed payoffs with reduced complexity. Furthermore, empirical comparisons of different algorithms on both synthetic and real-world datasets highlight the superior performance of MoMA-GP-UCB in both private and non-private scenarios.
Introduction
We consider the problem of maximizing an unknown function f over a set D by sequentially querying it and receiving only bandit feedback, i.e., when we query at x, we observe a possibly noisy evaluation of f(x). This model has been a main focus of machine learning research, e.g., the classic multi-armed bandit (MAB) setting (Lai and Robbins 1985), the linear bandit setting (Abbasi-Yadkori, Pál, and Szepesvári 2011), and general Bayesian optimization (Shahriari et al. 2015), with each one generalizing the previous one. It also finds broad applications in many real-world systems, including medical experiments, online shopping websites, and recommender systems (Li et al. 2010). These systems adaptively make decisions and receive rewards (feedback) from users to simultaneously learn insightful facts and maximize profit.
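To make the sequential query model concrete, the following minimal Python sketch runs a generic GP-UCB-style loop on a toy one-dimensional problem; the RBF kernel, the confidence width beta_t, and all function names are illustrative assumptions, not the algorithms developed in this paper.

import numpy as np

def rbf(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two 1-d arrays of query points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xs, noise=0.01):
    # Standard GP posterior mean and standard deviation at test points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, np.sqrt(var)

f = lambda x: np.sin(3 * x) + 0.5 * x      # unknown objective (toy example)
domain = np.linspace(0, 2, 200)            # discretized decision set D
rng = np.random.default_rng(0)

X, y = np.array([1.0]), np.array([f(1.0)])
for t in range(1, 31):
    mu, sigma = gp_posterior(X, y, domain)
    beta_t = 2.0 * np.log(len(domain) * t ** 2)             # confidence width
    x_t = domain[np.argmax(mu + np.sqrt(beta_t) * sigma)]   # GP-UCB query
    y_t = f(x_t) + 0.1 * rng.standard_normal()              # noisy bandit feedback
    X, y = np.append(X, x_t), np.append(y, y_t)

print("best observed query:", X[np.argmax(y)])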
Recently, privacy has become a key issue in the above-mentioned online learning systems. Users have become increasingly concerned about directly sharing their online information or activities with these systems, since these activities
may reveal their private information. For example, a customer of an online shopping website may be unwilling to tell the website that he or she has purchased medication for mental health issues. Another example is medical experiments, in which a patient may refuse to share the actual effects of a treatment due to privacy concerns. This motivates the need for a mechanism that further corrupts the feedback from each user to protect privacy, which exactly fits the local differential privacy (LDP) model (Kasiviswanathan et al. 2011; Duchi, Jordan, and Wainwright 2013).
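For reference, the standard definition from these works (stated here for completeness, not as a restatement of this paper's exact formulation) is that a randomized mechanism $M$ satisfies $\epsilon$-LDP if, for any two possible inputs $v, v'$ and any measurable set of outputs $S$,
$$\Pr[M(v) \in S] \le e^{\epsilon} \, \Pr[M(v') \in S],$$
so the corrupted feedback released by a user is almost equally likely regardless of the user's true reward.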
In contrast to the standard differential privacy model (Dwork, Roth et al. 2014), in which the learner collects the true data while releasing a private output to protect privacy, in the LDP model the learner only has access to corrupted input data from the users. Hence, LDP often provides much stronger privacy protection for the user and is more appealing in real applications, especially for the systems mentioned above (Cormode et al. 2018). To the best of our knowledge, in the setting of online learning with bandit feedback, the LDP model has only been studied theoretically very recently. For example, in (Gajane, Urvoy, and Kaufmann 2018; Ren et al. 2020), the authors investigate MAB with an LDP guarantee. (Zheng et al. 2020) studies LDP in the linear (contextual) bandit setting. However, LDP in the most general scenario, i.e., Bayesian optimization (BO), remains an important open problem.
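As a concrete illustration of such user-side corruption, the short Python sketch below applies the classical Laplace mechanism (the DP primitive mentioned in the abstract) to a clipped reward before it is sent to the learner; the clipping bound B and the function name privatize_reward are hypothetical and do not reproduce this paper's exact construction.

import numpy as np

def privatize_reward(reward, epsilon, B=1.0, rng=np.random.default_rng(0)):
    # Clip the true reward to [-B, B] so its sensitivity is at most 2B, then
    # add Laplace noise with scale 2B / epsilon; the released value satisfies
    # epsilon-LDP, and only this corrupted value reaches the learner.
    clipped = float(np.clip(reward, -B, B))
    return clipped + rng.laplace(loc=0.0, scale=2 * B / epsilon)

# The learner never observes the true reward 0.7, only the noisy release.
print(privatize_reward(0.7, epsilon=1.0))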
Motivated by this open problem, in this paper we investigate locally differentially private BO, in which the rewards are further corrupted to protect privacy. Specifically, we consider a Gaussian process (GP) model for BO (also called the Gaussian process bandit setting), which directly generalizes both the MAB and linear bandit settings. The main contributions of this paper can be summarized as follows.
Contributions. (i) We first derive regret lower bounds that hold for any LDP mechanism and any learning algorithm. (ii) Then, we present three almost optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. (iii) Our two new methods developed for handling LDP also contribute to BO under heavy-tailed payoffs in general. In particular, one is a new truncation method that can be applied to any sub-Weibull rewards. The other, called MoMA-GP-UCB, is based on median-of-means techniques and kernel approximations, and complements previous BO algorithms for general heavy-tailed payoffs (Chowdhury and