Stochastically Testing Privacy Mechanisms

How do you validate that a differential privacy implementation actually works?

One approach that can build confidence that an implementation satisfies differential privacy is stochastic testing. Run the mechanism many times on each of two neighboring databases, then check that for every output the ratio of its probabilities under the two databases is at most e^ϵ. Equivalently, the absolute log-ratio of those probabilities must be at most ϵ.
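For reference, the guarantee being tested is the standard definition of ϵ-differential privacy. For every pair of neighboring databases D and D′ and every set S of outputs, the mechanism M must satisfy the first inequality below. Take S to be a single histogram bin b, and write n_A(b) and n_B(b) (notation introduced here) for the number of trials from each database that land in b, with both databases run for the same number of trials. This gives the per-bin "realized ϵ" estimate. The absolute value appears because the definition also holds with D and D′ swapped.

```latex
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all output sets } S

\hat{\epsilon}_b = \left| \ln \frac{n_A(b)}{n_B(b)} \right| \approx \left| \ln \frac{\Pr[\mathcal{M}(D) \in b]}{\Pr[\mathcal{M}(D') \in b]} \right| \le \epsilon
```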

[1]:
# Preamble: imports and figure settings

from eeprivacy import PrivateClampedMean

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib as mpl
from scipy import stats

np.random.seed(1234) # Fix seed for deterministic documentation

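# "seaborn-white" was renamed "seaborn-v0_8-white" in Matplotlib 3.6
style = "seaborn-v0_8-white" if "seaborn-v0_8-white" in plt.style.available else "seaborn-white"
mpl.style.use(style)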

MD = 20
LG = 24
plt.rcParams.update({
    "figure.figsize": [25, 7],
    "legend.fontsize": MD,
    "axes.labelsize": LG,
    "axes.titlesize": LG,
    "xtick.labelsize": LG,
    "ytick.labelsize": LG,
})

In the test below, we run PrivateClampedMean for a large number of trials on two neighboring databases: an empty one and one containing the single element 1.

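Then we bin the outputs and compute the “realized ϵ” for each bin: the absolute log-ratio of the two databases' counts in that bin. Because the counts are random, the realized ϵ will occasionally exceed the target ϵ (0.1 here) slightly, purely by chance. The test fails if the realized ϵ greatly exceeds the target ϵ for any bin.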
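One way to turn "greatly exceeds" into an automated pass/fail rule is to allow each bin a margin proportional to its sampling error. The sketch below is self-contained and does not call eeprivacy. `laplace_sum` is a hypothetical stand-in mechanism (a Laplace-noised sum of values clamped to [0, 1], which has sensitivity 1). `realized_epsilon_check` and its `z` tolerance are likewise assumptions. The margin uses the delta-method standard error of a log ratio of approximately Poisson counts.

```python
import numpy as np


def laplace_sum(values, epsilon, rng, size):
    """Hypothetical stand-in mechanism, not the eeprivacy API: a noisy sum of
    values clamped to [0, 1]. Adding or removing one record changes the sum by
    at most 1, so Laplace noise with scale 1/epsilon gives epsilon-DP."""
    true_sum = np.clip(np.asarray(values, dtype=float), 0.0, 1.0).sum()
    return true_sum + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=size)


def realized_epsilon_check(samples_a, samples_b, epsilon, bins, z=5.0):
    """Bin both sample sets, estimate |ln(p_a / p_b)| per bin, and fail if any
    bin exceeds epsilon by more than z standard errors (assumed tolerance)."""
    counts_a, _ = np.histogram(samples_a, bins=bins)
    counts_b, _ = np.histogram(samples_b, bins=bins)
    # The log-ratio is undefined for empty bins, so skip them
    keep = (counts_a > 0) & (counts_b > 0)
    counts_a, counts_b = counts_a[keep], counts_b[keep]
    # Normalize by sample size so the ratio estimates p_a(bin) / p_b(bin)
    ratio = (counts_a / len(samples_a)) / (counts_b / len(samples_b))
    realized = np.abs(np.log(ratio))
    # Delta-method standard error of the log of a ratio of counts
    std_err = np.sqrt(1.0 / counts_a + 1.0 / counts_b)
    passed = bool(np.all(realized <= epsilon + z * std_err))
    return realized, passed


# Usage: neighboring databases [] and [1], outputs clipped to [0, 1]
rng = np.random.default_rng(1234)
bins = np.linspace(0, 1, num=50)
a = np.clip(laplace_sum([], 0.1, rng, 1_000_000), 0, 1)
b = np.clip(laplace_sum([1], 0.1, rng, 1_000_000), 0, 1)
realized, passed = realized_epsilon_check(a, b, epsilon=0.1, bins=bins)
print(f"max realized ε = {realized.max():.3f}, passed = {passed}")
```

As a sanity check on the harness itself, a mechanism that adds too little noise should fail. One example is noise calibrated for ϵ = 1 while the check uses ϵ = 0.1. A larger `z` lowers the chance of a false failure across many bins, at the cost of missing small violations.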

[4]:
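private_mean = PrivateClampedMean(lower_bound=0, upper_bound=1)

T = 1000000    # number of trials per database
epsilon = 0.1  # target privacy parameter

# Neighboring databases: A is empty, B holds the single element 1
A = [private_mean.execute(values=[], epsilon=epsilon) for t in range(T)]
B = [private_mean.execute(values=[1], epsilon=epsilon) for t in range(T)]

L = 0
U = 1

# Clip outputs to [L, U] so every output lands in a bin; clipping is
# post-processing, so it cannot weaken the privacy guarantee
A = np.clip(A, L, U)
B = np.clip(B, L, U)

bins = np.linspace(L, U, num=50)  # 50 edges, i.e. 49 bins

fig, ax = plt.subplots()
ax.set_yscale("log")

ax.hist(A, color="b", alpha=0.5, bins=bins)
ax.hist(B, color="r", alpha=0.5, bins=bins)
ax.set_title("Compare output likelihood for neighboring databases")
ax.set_xlabel("Output")
ax.set_ylabel("Count (log scale)")
plt.show()

counts_A, bin_edges = np.histogram(A, bins=bins)
counts_B, _ = np.histogram(B, bins=bins)

# Both databases ran T trials, so the count ratio estimates the probability
# ratio; skip bins where either count is zero, since log(0) is undefined
nonzero = (counts_A > 0) & (counts_B > 0)
realized_epsilon = np.abs(np.log(counts_A[nonzero] / counts_B[nonzero]))

plt.hist(realized_epsilon, color="k", bins=20)
plt.axvline(epsilon, color="r", linestyle="--")  # target ε for reference
plt.title("Realized ε")
plt.xlabel("ε")
plt.ylabel("Count")
plt.show()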
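Figure: Compare output likelihood for neighboring databases (output histograms for both databases, log-scale counts).
Figure: Realized ε (histogram of per-bin realized ε values).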