Local Differential Privacy for Bayesian Optimization
Xingyu Zhou,¹ Jian Tan²
¹ECE, The Ohio State University
²Alibaba Group, Sunnyvale
zhou.2055@osu.edu, j.tan@alibaba-inc.com
*This work was done during the internship of the first author at Alibaba Group, Sunnyvale, USA.
Abstract
Motivated by the increasing concern about privacy in today's data-intensive online learning systems, we consider black-box optimization in the nonparametric Gaussian process setting with a local differential privacy (LDP) guarantee. Specifically, the rewards from each user are further corrupted to protect privacy, and the learner only has access to the corrupted rewards to minimize the regret. We first derive the regret lower bounds for any LDP mechanism and any learning algorithm. Then, we present three almost optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. In this process, we also propose a new Bayesian optimization (BO) method (called MoMA-GP-UCB) based on median-of-means techniques and kernel approximations, which complements previous BO algorithms for heavy-tailed payoffs with a reduced complexity. Further, empirical comparisons of different algorithms on both synthetic and real-world datasets highlight the superior performance of MoMA-GP-UCB in both private and non-private scenarios.
Introduction
We consider the problem of maximizing an unknown function $f$ over a set $D$ via sequentially querying it and receiving only bandit feedback, i.e., when we query at $x$, we observe a possibly noisy evaluation of $f(x)$. This model has been a main focus in machine learning research, e.g., the classic multi-armed bandit (MAB) setting (Lai and Robbins 1985), the linear bandit setting (Abbasi-Yadkori, Pál, and Szepesvári 2011) and general Bayesian optimization (Shahriari et al. 2015), with each one generalizing the previous one. It also finds broad applications in many real-world systems, including medical experiments, online shopping websites and recommender systems (Li et al. 2010). These systems adaptively make decisions and receive rewards (feedback) from users to simultaneously learn insightful facts and maximize profit.
Recently, privacy has become a key issue in the above-mentioned online learning systems. Users have become increasingly concerned about directly sharing their online information or activities with these systems, since these activities may reveal their private information. For example, a customer of an online shopping website is not willing to tell the website that he or she has purchased medicines for mental issues. Another example is medical experiments, in which a patient could refuse to share the actual effects of a treatment due to privacy concerns. This stimulates the need for a mechanism that further corrupts the feedback from each user to protect privacy, which exactly fits the local differential privacy (LDP) model (Kasiviswanathan et al. 2011; Duchi, Jordan, and Wainwright 2013).
In contrast to the standard differential privacy model (Dwork, Roth et al. 2014), in which the learner collects the true data while releasing a private output to protect privacy, in the LDP model the learner only has access to corrupted input data from the users. Hence, LDP often provides a much stronger privacy protection for the user and is more appealing in real applications, especially for the systems mentioned above (Cormode et al. 2018). To the best of our knowledge, in the setting of online learning with bandit feedback, the LDP model has only been studied theoretically very recently. For example, in (Gajane, Urvoy, and Kaufmann 2018; Ren et al. 2020), the authors investigate MAB with an LDP guarantee. (Zheng et al. 2020) studies LDP in the linear (contextual) bandit setting. However, LDP in the most general scenario, i.e., Bayesian optimization (BO), remains an important open problem.
Motivated by this, in this paper we investigate locally differentially private BO, in which the rewards are further corrupted to protect privacy. Specifically, we consider a Gaussian process (GP) model for BO (also called the Gaussian process bandit setting), which directly generalizes both the MAB and linear bandit settings. The main contributions of this paper can be summarized as follows.
Contributions. (i) We first derive the regret lower bounds for any LDP mechanism and any learning algorithm. (ii) Then, we present three almost optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. (iii) Our two new methods developed for handling LDP also contribute to BO under heavy-tailed payoffs in general. In particular, one is a new truncation method that can be applied to any sub-Weibull rewards. The other one, called MoMA-GP-UCB, is based on median-of-means techniques and kernel approximations, which complements previous BO algorithms for general heavy-tailed payoffs (Chowdhury and Gopalan 2019) with a reduced complexity. (iv) We further conduct empirical comparisons of different algorithms over both synthetic and real-world datasets, which demonstrate the superior performance of our new MoMA-GP-UCB algorithm in both private and non-private settings.
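To illustrate the median-of-means component in (iii): the estimator partitions the observations into groups, averages within each group, and reports the median of the group means, which makes it robust to heavy-tailed noise. A minimal sketch (the group count below is a hypothetical choice, not the paper's calibrated value):

```python
import numpy as np

def median_of_means(samples, num_groups):
    """Median-of-means estimate of the mean of `samples`:
    split into `num_groups` groups, average each group, take the median."""
    groups = np.array_split(np.asarray(samples, dtype=float), num_groups)
    return float(np.median([g.mean() for g in groups if g.size > 0]))

# Heavy-tailed samples (Student-t, df=1.5) centered around 2.0:
# the plain mean is easily skewed by outliers, the MoM estimate much less so.
rng = np.random.default_rng(0)
data = rng.standard_t(df=1.5, size=300) + 2.0
print(np.mean(data), median_of_means(data, num_groups=9))
```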
Related Work
In the traditional non-private case, a line of BO methods based on Gaussian processes (GP) and upper confidence bounds (UCB) have been analyzed in both sub-Gaussian (Srinivas et al. 2009; Chowdhury and Gopalan 2017) and heavy-tailed scenarios (Chowdhury and Gopalan 2019). Kernel approximation has recently been proposed to reduce the complexity of GP-based BO algorithms (Mutny and Krause 2018; Calandriello et al. 2019). In the private BO case, (Kusner et al. 2015) studies how to privately release the BO outputs to protect privacy (e.g., hyper-parameters of a machine learning model), and hence it belongs to the traditional DP perspective rather than LDP.

The LDP model has been previously considered in the MAB setting (Gajane, Urvoy, and Kaufmann 2018; Basu, Dimitrakakis, and Tossou 2019; Ren et al. 2020). Recently, it has been generalized to linear contextual bandits in which both the rewards and the contexts are corrupted for privacy (Zheng et al. 2020). There are also some other types of DP considered in the MAB and linear bandit settings (not comparable to our work). Due to space limitations, we refer readers to (Ren et al. 2020; Zheng et al. 2020) and the references therein.
Problem Statement and Preliminaries
We consider a sequential decision-making problem over a set $D$. A learning policy is adopted to select an action $x_t \in D$ at each discrete time slot $t = 1, 2, \ldots$ with the corresponding reward observation $y_t = f(x_t) + \eta_t$, i.e., $y_t$ could be a noisy version of $f(x_t)$. Then, the reward $y_t$ will be further corrupted to protect privacy, and only the private response $\bar{y}_t$ is revealed to the learning agent. The action $x_t$ is chosen based on the arms played and the private rewards obtained until $t-1$, denoted by the history $H_{t-1} = \{(x_s, \bar{y}_s) : s \in [t-1]\}$.¹ The objective is to simultaneously preserve LDP and minimize the cumulative regret defined as
$$R_T = \sum_{t=1}^{T} \left[ f(x^*) - f(x_t) \right], \qquad (1)$$
where $x^* = \arg\max_{x \in D} f(x)$ (assuming the maximum is attained).
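As a concrete reading of (1), a short sketch that tallies the cumulative regret of a sequence of queried points against a known test function (the one-dimensional objective below is purely illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 1-D objective over D = [0, 1], used only to illustrate Eq. (1).
f = lambda x: np.sin(3.0 * np.asarray(x)) * np.exp(-np.asarray(x))
domain = np.linspace(0.0, 1.0, 1000)
f_star = f(domain).max()  # f(x*) over the discretized domain

def cumulative_regret(chosen_points):
    """R_T = sum over t of [ f(x*) - f(x_t) ] for queried points x_1, ..., x_T."""
    return float(np.sum(f_star - f(chosen_points)))

print(cumulative_regret([0.1, 0.3, 0.5]))
```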
Definition 1 ($(\epsilon, \delta)$-LDP). A randomized mechanism $M : D \to Z$ is said to protect $(\epsilon, \delta)$-LDP if for any $x, x' \in D$ and any measurable subset $E \subseteq Z$,
$$\mathbb{P}\{M(x) \in E\} \le e^{\epsilon}\, \mathbb{P}\{M(x') \in E\} + \delta,$$
for $\epsilon \ge 0$ and $\delta \ge 0$. Moreover, if $\delta = 0$, we say it protects $\epsilon$-LDP.

¹For any positive integer $m$, we define $[m] := \{1, 2, \ldots, m\}$.

Note that, if not explicitly stated, LDP in this paper means $\epsilon$-LDP (stronger than $(\epsilon, \delta)$-LDP).
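To make the $\epsilon$-LDP guarantee concrete, here is a minimal sketch of the Laplace mechanism that the paper's algorithms build on, under the assumption that each reward is bounded in absolute value by a known constant (so the sensitivity of a single reward is twice that bound); the constants are illustrative rather than the paper's calibration:

```python
import numpy as np

def laplace_ldp(reward, epsilon, bound, rng=np.random.default_rng()):
    """Release one reward under epsilon-LDP via the Laplace mechanism.

    Assumes |reward| <= bound, so the sensitivity of a single reward is
    2 * bound; adding Laplace noise with scale 2 * bound / epsilon then
    satisfies epsilon-LDP for that reward.
    """
    return reward + rng.laplace(loc=0.0, scale=2.0 * bound / epsilon)

# Each user privatizes their own reward before it reaches the learner.
print(laplace_ldp(reward=0.7, epsilon=1.0, bound=1.0))
```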
Noise Assumptions. We assume that the noise $\eta_t$ has zero mean conditioned on the history and is bounded by $R$ almost surely. We also address the case of unbounded noise at the end of the paper.
Regularity Assumptions. Attaining a sub-linear regret is in general infeasible for an arbitrary reward function $f$ over a very large space without any assumptions on the structure of $f$. In this paper, we assume that $D$ is compact and $f$ has a bounded norm in the RKHS of functions $D \to \mathbb{R}$ corresponding to a kernel function $k : D \times D \to \mathbb{R}$. This RKHS, denoted by $H_k(D)$, is completely determined by its kernel function with an inner product $\langle \cdot, \cdot \rangle_H$ that satisfies the reproducing property: $f(x) = \langle f, k(x, \cdot) \rangle_H$ for all $f \in H_k(D)$. The norm for the RKHS is given by $\|f\|_H = \sqrt{\langle f, f \rangle_H}$, which measures the smoothness of $f$. We assume $\|f\|_H \le B$, where $B < \infty$ is a known constant. Moreover, we assume a bounded variance by restricting $k(x, x) \le 1$. Note that two commonly used kernels, Squared Exponential and Matérn, satisfy the bounded variance assumption; they are defined as
$$k_{SE}(x, x') = \exp\!\left(-\frac{s^2}{2l^2}\right),$$
$$k_{\mathrm{Mat\acute{e}rn}}(x, x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{s\sqrt{2\nu}}{l}\right)^{\nu} B_{\nu}\!\left(\frac{s\sqrt{2\nu}}{l}\right),$$
where $l > 0$ and $\nu > 0$ are hyper-parameters, $s = \|x - x'\|_2$ is the distance between the two points, and $B_{\nu}(\cdot)$ is the modified Bessel function.
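For reference, a minimal sketch of the two kernels above (hyper-parameter values are illustrative; both satisfy $k(x, x) = 1$):

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def k_se(x, x_prime, l=1.0):
    """Squared Exponential kernel exp(-s^2 / (2 l^2)), with s = ||x - x'||_2."""
    s = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(x_prime))
    return float(np.exp(-s**2 / (2.0 * l**2)))

def k_matern(x, x_prime, l=1.0, nu=2.5):
    """Matern kernel with smoothness nu; equals 1 at s = 0."""
    s = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(x_prime))
    if s == 0.0:
        return 1.0
    z = np.sqrt(2.0 * nu) * s / l
    return float((2.0 ** (1.0 - nu) / gamma(nu)) * z**nu * kv(nu, z))

print(k_se(0.2, 0.5), k_matern(0.2, 0.5))
```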
Surrogate GP Model.² A Gaussian process, denoted by $GP(\mu(\cdot), k(\cdot, \cdot))$, is a collection of (possibly infinitely many) random variables $f(x)$, $x \in D$, such that every finite subset of random variables $\{f(x_i), i \in [m]\}$ is jointly Gaussian with mean $\mathbb{E}[f(x_i)] = \mu(x_i)$ and covariance $\mathbb{E}[(f(x_i) - \mu(x_i))(f(x_j) - \mu(x_j))] = k(x_i, x_j)$, where $i, j \in [m]$ and $m \in \mathbb{N}$. By conditioning GPs on available observations, one can obtain a non-parametric surrogate Bayesian model over the space of functions. In particular, we use $GP(0, k(\cdot, \cdot))$ as an initial prior on the unknown black-box function $f$, and a Gaussian likelihood with the noise variables $\eta_t$ drawn independently across $t$ from $N(0, \lambda)$. Conditioned on a set of past observations $H_t = \{(x_s, y_s), s \in [t]\}$, by the properties of GPs (Rasmussen 2003), the posterior distribution over $f$ is $GP(\mu_t(\cdot), k_t(\cdot, \cdot))$, where
$$\mu_t(x) = k_t(x)^T (K_t + \lambda I)^{-1} y_{1:t}, \qquad (2)$$
$$k_t(x, x') = k(x, x') - k_t(x)^T (K_t + \lambda I)^{-1} k_t(x'), \qquad \sigma_t^2(x) = k_t(x, x), \qquad (3)$$
and $k_t(x) = [k(x_1, x), \ldots, k(x_t, x)]^T$ and $K_t = [k(u, v)]_{u, v \in H_t}$. Therefore, for every $x \in D$, the posterior distribution of $f(x)$ given $H_t$ is $N(\mu_t(x), \sigma_t^2(x))$. The following term often plays a key role in the regret bounds of

²The surrogate GP model described above (i.e., a GP prior and a Gaussian likelihood) is only used for the algorithm design.
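The posterior updates (2)-(3) can be computed directly; below is a minimal sketch (function and parameter names are our own, and a practical implementation would use a Cholesky solve rather than an explicit matrix inverse):

```python
import numpy as np

def gp_posterior(X, y, X_query, kernel, lam=0.1):
    """Posterior mean and variance at X_query following Eqs. (2)-(3).

    X: (t, d) array of queried points, y: length-t observations,
    kernel(a, b): scalar kernel value, lam: the noise variance lambda.
    """
    X, y, X_query = np.atleast_2d(X), np.asarray(y, float), np.atleast_2d(X_query)
    K_t = np.array([[kernel(u, v) for v in X] for u in X])
    A = np.linalg.inv(K_t + lam * np.eye(len(X)))
    means, variances = [], []
    for x in X_query:
        k_t = np.array([kernel(xs, x) for xs in X])
        means.append(k_t @ A @ y)                       # Eq. (2)
        variances.append(kernel(x, x) - k_t @ A @ k_t)  # Eq. (3)
    return np.array(means), np.array(variances)

# Squared Exponential kernel with unit length-scale, for illustration.
se = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / 2.0)
mu, var = gp_posterior(X=[[0.1], [0.4], [0.8]], y=[0.2, 0.5, 0.1],
                       X_query=[[0.5]], kernel=se)
print(mu, var)
```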
11153
of 8
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜