We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE), offering not only an improved privacy-utility tradeoff but also better efficiency over prior results. Specifically, we study the mathematical problem: given a similarity function $f$ (or DP KDE) and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query $y \in \mathbb{R}^d$, we approximate $\sum_{x \in X} f(x, y)$ in a differentially private fashion. The best previous algorithm for $f(x,y) = \| x - y \|_1$ is the node-contaminated balanced binary tree of [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. Their algorithm requires $O(nd)$ space and time for preprocessing, where $n = |X|$. For any query point, the query time is $d \log n$, with an error guarantee of $(1+\alpha)$-approximation and additive error $\epsilon^{-1} \alpha^{-0.5} d^{1.5} R \log^{1.5} n$.
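To make the problem concrete, here is a minimal sketch of the $f(x,y) = \|x-y\|_1$ case: an exact (non-private) sum, and a naive Laplace-mechanism release of it. This is a hypothetical baseline for illustration only, not the tree-based algorithm of either paper; it assumes the data and query lie in $[0, R]^d$, so swapping one data point changes the sum by at most $dR$, which bounds the sensitivity.

```python
import numpy as np

def kde_sum_l1(X, y):
    """Exact (non-private) KDE-style sum: sum over x in X of ||x - y||_1."""
    return float(np.abs(X - y).sum())

def dp_kde_sum_l1(X, y, eps, R, rng):
    """Naive eps-DP release of the L1 sum via the Laplace mechanism.

    Assumes every point of X and the query y lie in [0, R]^d, so the
    swap sensitivity of the sum is at most d * R.
    """
    d = X.shape[1]
    sensitivity = d * R
    return kde_sum_l1(X, y) + rng.laplace(scale=sensitivity / eps)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 4))  # n = 1000 points in [0, 1]^4
y = rng.uniform(0.0, 1.0, size=4)
exact = kde_sum_l1(X, y)
private = dp_kde_sum_l1(X, y, eps=1.0, R=1.0, rng=rng)
```

Note that this single-release baseline adds noise proportional to $dR/\epsilon$ per query; the tree-based constructions discussed above exist precisely to answer many queries with per-query error polylogarithmic in $n$.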
In this paper, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects:
- We reduce the query time by a factor of $\alpha^{-1} \log n$.
- We improve the approximation ratio from $\alpha$ to $1$.
- We reduce the error dependence by a factor of $\alpha^{-0.5}$.
From a technical perspective, our method of constructing the search tree differs from previous work [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. In prior work, for each query, the answer is split into $\alpha^{-1} \log n$ numbers, each derived from the summation of $\log n$ values in interval-tree counts. In contrast, we construct the tree differently, splitting the answer into $\log n$ numbers, where each is a smart combination of two distance values, two counting values, and the query $y$ itself. We believe our tree structure may be of independent interest.
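To illustrate the flavor of such a combination (our illustrative reading, not the paper's exact construction): in one dimension, the sum $\sum_x |x - y|$ over a sorted set collapses to exactly two subset sums, two counts, and $y$ itself, since $\sum_x |x-y| = (y \cdot c_{<} - s_{<}) + (s_{\geq} - y \cdot c_{\geq})$.

```python
import bisect
from itertools import accumulate

def l1_sum_via_counts(xs_sorted, prefix, y):
    """Compute sum_i |xs[i] - y| from two counts, two subset sums, and y.

    xs_sorted: sorted 1-D data; prefix[i] = sum of the first i points.
    """
    n = len(xs_sorted)
    c_lo = bisect.bisect_left(xs_sorted, y)  # count of points below y
    s_lo = prefix[c_lo]                      # sum of points below y
    c_hi, s_hi = n - c_lo, prefix[n] - s_lo  # count and sum of points >= y
    return (y * c_lo - s_lo) + (s_hi - y * c_hi)

xs = sorted([0.1, 0.4, 0.7, 0.9, 0.25])
prefix = [0.0] + list(accumulate(xs))
y = 0.5
val = l1_sum_via_counts(xs, prefix, y)
# matches the direct computation: sum(abs(x - y) for x in xs)
```

A balanced tree over the sorted coordinates can maintain a count and a subset sum per node, so each query decomposes into $O(\log n)$ terms of this form, and noise need only be added to the stored counts and sums.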