Local Differential Privacy for Bayesian Optimization
Xingyu Zhou,¹ Jian Tan²
¹ECE, The Ohio State University
²Alibaba Group, Sunnyvale
zhou.2055@osu.edu, j.tan@alibaba-inc.com
*This work was done during the internship of the first author at Alibaba Group, Sunnyvale, USA.
Abstract
Motivated by the increasing concern about privacy in today's data-intensive online learning systems, we consider black-box optimization in the nonparametric Gaussian process setting with a local differential privacy (LDP) guarantee. Specifically, the rewards from each user are further corrupted to protect privacy, and the learner only has access to the corrupted rewards to minimize the regret. We first derive the regret lower bounds for any LDP mechanism and any learning algorithm. Then, we present three almost optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. In this process, we also propose a new Bayesian optimization (BO) method (called MoMA-GP-UCB) based on median-of-means techniques and kernel approximations, which complements previous BO algorithms for heavy-tailed payoffs with a reduced complexity. Further, empirical comparisons of different algorithms on both synthetic and real-world datasets highlight the superior performance of MoMA-GP-UCB in both private and non-private scenarios.
Introduction
We consider the problem of maximizing an unknown function $f$ over a set $D$ via sequentially querying it and receiving only bandit feedback, i.e., when we query at $x$, we observe a possibly noisy evaluation of $f(x)$. This model has been a main focus in machine learning research, e.g., the classic multi-armed bandit (MAB) setting (Lai and Robbins 1985), the linear bandit setting (Abbasi-Yadkori, Pál, and Szepesvári 2011) and general Bayesian optimization (Shahriari et al. 2015), with each one generalizing the previous one. It also finds broad applications in many real-world systems, including medical experiments, online shopping websites and recommender systems (Li et al. 2010). These systems adaptively make decisions and receive rewards (feedback) from users to simultaneously learn insightful facts and maximize profit.
Recently, privacy has become a key issue in the above-mentioned online learning systems. Users have become increasingly concerned about directly sharing their online information or activities with these systems, since these activities may reveal their private information. For example, a customer of an online shopping website is not willing to tell the website that he or she has purchased medicines for mental issues. Another example is medical experiments, in which a patient could refuse to share the actual effects of a treatment due to privacy concerns. This stimulates the need for a mechanism that further corrupts the feedback from each user to protect privacy, which exactly fits the local differential privacy (LDP) model (Kasiviswanathan et al. 2011; Duchi, Jordan, and Wainwright 2013).
In contrast to the standard differential privacy model (Dwork, Roth et al. 2014), in which the learner collects the true data while releasing a private output to protect privacy, in the LDP model the learner only has access to corrupted input data from the users. Hence, LDP often provides a much stronger privacy protection for the user and is more appealing in real applications, especially for the systems mentioned above (Cormode et al. 2018). To the best of our knowledge, in the setting of online learning with bandit feedback, the LDP model has only been studied theoretically very recently. For example, in (Gajane, Urvoy, and Kaufmann 2018; Ren et al. 2020), the authors investigate MAB with an LDP guarantee. (Zheng et al. 2020) studies LDP in the linear (contextual) bandit setting. However, LDP in the most general scenario, i.e., Bayesian optimization (BO), remains an important open problem.
Motivated by this, in this paper we investigate locally differentially private BO, in which the rewards are further corrupted to protect privacy. Specifically, we consider a Gaussian process (GP) model for BO (also called the Gaussian process bandit setting), which directly generalizes both the MAB and linear bandit settings. The main contributions of this paper can be summarized as follows.
Contributions. (i) We first derive the regret lower bounds for any LDP mechanism and any learning algorithm. (ii) Then, we present three almost optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. (iii) Our two new methods developed for handling LDP also contribute to BO under heavy-tailed payoffs in general. In particular, one is a new truncation method that can be applied to any sub-Weibull rewards. The other one, called MoMA-GP-UCB, is based on median-of-means techniques and kernel approximations, which complements previous BO algorithms for general heavy-tailed payoffs (Chowdhury and Gopalan 2019) with a reduced complexity. (iv) We further conduct empirical comparisons of different algorithms over both synthetic and real-world datasets, which demonstrate the superior performance of our new MoMA-GP-UCB algorithm in both private and non-private settings.
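To illustrate the median-of-means component in (iii): the estimator partitions the observations into groups, averages within each group, and reports the median of the group means, which makes it robust to heavy-tailed noise. A minimal sketch (the group count below is a hypothetical choice, not the paper's calibrated value):

```python
import numpy as np

def median_of_means(samples, num_groups):
    """Median-of-means estimate of the mean of `samples`:
    split into `num_groups` groups, average each group, take the median."""
    groups = np.array_split(np.asarray(samples, dtype=float), num_groups)
    return float(np.median([g.mean() for g in groups if g.size > 0]))

# Heavy-tailed samples (Student-t, df=1.5) centered around 2.0:
# the plain mean is easily skewed by outliers, the MoM estimate much less so.
rng = np.random.default_rng(0)
data = rng.standard_t(df=1.5, size=300) + 2.0
print(np.mean(data), median_of_means(data, num_groups=9))
```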
Related Work
In the traditional non-private case, a line of BO methods based on Gaussian processes (GP) and upper confidence bounds (UCB) have been analyzed in both sub-Gaussian (Srinivas et al. 2009; Chowdhury and Gopalan 2017) and heavy-tailed scenarios (Chowdhury and Gopalan 2019). Kernel approximation has recently been proposed to reduce the complexity of GP-based BO algorithms (Mutny and Krause 2018; Calandriello et al. 2019). In the private BO case, (Kusner et al. 2015) studies how to privately release the BO outputs to protect privacy (e.g., hyper-parameters of a machine learning model), and hence it belongs to the traditional DP perspective rather than LDP.

The LDP model has been previously considered in the MAB setting (Gajane, Urvoy, and Kaufmann 2018; Basu, Dimitrakakis, and Tossou 2019; Ren et al. 2020). Recently, it has been generalized to linear contextual bandits in which both the rewards and the contexts are corrupted for privacy (Zheng et al. 2020). There are also some other types of DP considered in the MAB and linear bandit settings (not comparable to our work). Due to space limitations, we refer readers to (Ren et al. 2020; Zheng et al. 2020) and the references therein.
Problem Statement and Preliminaries
We consider a sequential decision-making problem over a set $D$. A learning policy is adopted to select an action $x_t \in D$ at each discrete time slot $t = 1, 2, \ldots$ with the corresponding reward observation $y_t = f(x_t) + \eta_t$, i.e., $y_t$ could be a noisy version of $f(x_t)$. Then, the reward $y_t$ will be further corrupted to protect privacy, and only the private response $\bar{y}_t$ is revealed to the learning agent. The action $x_t$ is chosen based on the arms played and the private rewards obtained until $t-1$, denoted by the history $H_{t-1} = \{(x_s, \bar{y}_s) : s \in [t-1]\}$.¹ The objective is to simultaneously preserve LDP and minimize the cumulative regret defined as
$$R_T = \sum_{t=1}^{T} \left[ f(x^*) - f(x_t) \right], \qquad (1)$$
where $x^* = \arg\max_{x \in D} f(x)$ (assuming the maximum is attained).
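As a concrete reading of (1), a short sketch that tallies the cumulative regret of a sequence of queried points against a known test function (the one-dimensional objective below is purely illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 1-D objective over D = [0, 1], used only to illustrate Eq. (1).
f = lambda x: np.sin(3.0 * np.asarray(x)) * np.exp(-np.asarray(x))
domain = np.linspace(0.0, 1.0, 1000)
f_star = f(domain).max()  # f(x*) over the discretized domain

def cumulative_regret(chosen_points):
    """R_T = sum over t of [ f(x*) - f(x_t) ] for queried points x_1, ..., x_T."""
    return float(np.sum(f_star - f(chosen_points)))

print(cumulative_regret([0.1, 0.3, 0.5]))
```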
Definition 1 ($(\epsilon, \delta)$-LDP). A randomized mechanism $M : D \to Z$ is said to protect $(\epsilon, \delta)$-LDP if for any $x, x' \in D$ and any measurable subset $E \subseteq Z$,
$$\mathbb{P}\{M(x) \in E\} \le e^{\epsilon}\, \mathbb{P}\{M(x') \in E\} + \delta,$$
for $\epsilon \ge 0$ and $\delta \ge 0$. Moreover, if $\delta = 0$, we say it protects $\epsilon$-LDP.

¹For any positive integer $m$, we define $[m] := \{1, 2, \ldots, m\}$.

Note that, if not explicitly stated, LDP in this paper means $\epsilon$-LDP (stronger than $(\epsilon, \delta)$-LDP).
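To make the $\epsilon$-LDP guarantee concrete, here is a minimal sketch of the Laplace mechanism that the paper's algorithms build on, under the assumption that each reward is bounded in absolute value by a known constant (so the sensitivity of a single reward is twice that bound); the constants are illustrative rather than the paper's calibration:

```python
import numpy as np

def laplace_ldp(reward, epsilon, bound, rng=np.random.default_rng()):
    """Release one reward under epsilon-LDP via the Laplace mechanism.

    Assumes |reward| <= bound, so the sensitivity of a single reward is
    2 * bound; adding Laplace noise with scale 2 * bound / epsilon then
    satisfies epsilon-LDP for that reward.
    """
    return reward + rng.laplace(loc=0.0, scale=2.0 * bound / epsilon)

# Each user privatizes their own reward before it reaches the learner.
print(laplace_ldp(reward=0.7, epsilon=1.0, bound=1.0))
```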
Noise Assumptions. We assume that the noise $\eta_t$ has zero mean conditioned on the history and is bounded by $R$ almost surely. We also address the case of unbounded noise at the end of the paper.
Regularity Assumptions. Attaining a sub-linear regret is in general infeasible for an arbitrary reward function $f$ over a very large space without any assumptions on the structure of $f$. In this paper, we assume that $D$ is compact and $f$ has a bounded norm in the RKHS of functions $D \to \mathbb{R}$ corresponding to a kernel function $k : D \times D \to \mathbb{R}$. This RKHS, denoted by $H_k(D)$, is completely determined by its kernel function with an inner product $\langle \cdot, \cdot \rangle_H$ that satisfies the reproducing property: $f(x) = \langle f, k(x, \cdot) \rangle_H$ for all $f \in H_k(D)$. The norm for the RKHS is given by $\|f\|_H = \sqrt{\langle f, f \rangle_H}$, which measures the smoothness of $f$. We assume $\|f\|_H \le B$, where $B < \infty$ is a known constant. Moreover, we assume a bounded variance by restricting $k(x, x) \le 1$. Note that two commonly used kernels, Squared Exponential and Matérn, satisfy the bounded variance assumption; they are defined as
$$k_{SE}(x, x') = \exp\!\left(-\frac{s^2}{2l^2}\right),$$
$$k_{\mathrm{Mat\acute{e}rn}}(x, x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{s\sqrt{2\nu}}{l}\right)^{\nu} B_{\nu}\!\left(\frac{s\sqrt{2\nu}}{l}\right),$$
where $l > 0$ and $\nu > 0$ are hyper-parameters, $s = \|x - x'\|_2$ is the distance between the two points, and $B_{\nu}(\cdot)$ is the modified Bessel function.
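For reference, a minimal sketch of the two kernels above (hyper-parameter values are illustrative; both satisfy $k(x, x) = 1$):

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def k_se(x, x_prime, l=1.0):
    """Squared Exponential kernel exp(-s^2 / (2 l^2)), with s = ||x - x'||_2."""
    s = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(x_prime))
    return float(np.exp(-s**2 / (2.0 * l**2)))

def k_matern(x, x_prime, l=1.0, nu=2.5):
    """Matern kernel with smoothness nu; equals 1 at s = 0."""
    s = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(x_prime))
    if s == 0.0:
        return 1.0
    z = np.sqrt(2.0 * nu) * s / l
    return float((2.0 ** (1.0 - nu) / gamma(nu)) * z**nu * kv(nu, z))

print(k_se(0.2, 0.5), k_matern(0.2, 0.5))
```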
Surrogate GP Model.² A Gaussian process, denoted by $GP(\mu(\cdot), k(\cdot, \cdot))$, is a collection of (possibly infinitely many) random variables $f(x)$, $x \in D$, such that every finite subset of random variables $\{f(x_i), i \in [m]\}$ is jointly Gaussian with mean $\mathbb{E}[f(x_i)] = \mu(x_i)$ and covariance $\mathbb{E}[(f(x_i) - \mu(x_i))(f(x_j) - \mu(x_j))] = k(x_i, x_j)$, where $i, j \in [m]$ and $m \in \mathbb{N}$. By conditioning GPs on available observations, one can obtain a non-parametric surrogate Bayesian model over the space of functions. In particular, we use $GP(0, k(\cdot, \cdot))$ as an initial prior on the unknown black-box function $f$, and a Gaussian likelihood with the noise variables $\eta_t$ drawn independently across $t$ from $N(0, \lambda)$. Conditioned on a set of past observations $H_t = \{(x_s, y_s), s \in [t]\}$, by the properties of GPs (Rasmussen 2003), the posterior distribution over $f$ is $GP(\mu_t(\cdot), k_t(\cdot, \cdot))$, where
$$\mu_t(x) = k_t(x)^T (K_t + \lambda I)^{-1} y_{1:t}, \qquad (2)$$
$$k_t(x, x') = k(x, x') - k_t(x)^T (K_t + \lambda I)^{-1} k_t(x'), \qquad \sigma_t^2(x) = k_t(x, x), \qquad (3)$$
and $k_t(x) = [k(x_1, x), \ldots, k(x_t, x)]^T$ and $K_t = [k(u, v)]_{u, v \in H_t}$. Therefore, for every $x \in D$, the posterior distribution of $f(x)$ given $H_t$ is $N(\mu_t(x), \sigma_t^2(x))$. The following term often plays a key role in the regret bounds of

²The surrogate GP model described above (i.e., a GP prior and a Gaussian likelihood) is only used for the algorithm design.
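The posterior updates (2)-(3) can be computed directly; below is a minimal sketch (function and parameter names are our own, and a practical implementation would use a Cholesky solve rather than an explicit matrix inverse):

```python
import numpy as np

def gp_posterior(X, y, X_query, kernel, lam=0.1):
    """Posterior mean and variance at X_query following Eqs. (2)-(3).

    X: (t, d) array of queried points, y: length-t observations,
    kernel(a, b): scalar kernel value, lam: the noise variance lambda.
    """
    X, y, X_query = np.atleast_2d(X), np.asarray(y, float), np.atleast_2d(X_query)
    K_t = np.array([[kernel(u, v) for v in X] for u in X])
    A = np.linalg.inv(K_t + lam * np.eye(len(X)))
    means, variances = [], []
    for x in X_query:
        k_t = np.array([kernel(xs, x) for xs in X])
        means.append(k_t @ A @ y)                       # Eq. (2)
        variances.append(kernel(x, x) - k_t @ A @ k_t)  # Eq. (3)
    return np.array(means), np.array(variances)

# Squared Exponential kernel with unit length-scale, for illustration.
se = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / 2.0)
mu, var = gp_posterior(X=[[0.1], [0.4], [0.8]], y=[0.2, 0.5, 0.1],
                       X_query=[[0.5]], kernel=se)
print(mu, var)
```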
11153
of 8
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜