Gaussian Mechanism Basics¶
The Gaussian Mechanism adds noise drawn from a Gaussian (normal) distribution to realize \((\epsilon, \delta)\) differential privacy.
This mechanism has better performance for vector-valued queries than the Laplace Mechanism (queries that return many data points per individual at once).
This notebook walks through the basic eeprivacy
functions for working with the Gaussian Mechanism.
[1]:
# Preamble: imports and figure settings
from eeprivacy import (
GaussianMechanism,
)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib as mpl
from scipy import stats
np.random.seed(1234) # Fix seed for deterministic documentation
mpl.style.use("seaborn-white")
MD = 28
LG = 36
plt.rcParams.update({
"figure.figsize": [25, 10],
"legend.fontsize": MD,
"axes.labelsize": LG,
"axes.titlesize": LG,
"xtick.labelsize": LG,
"ytick.labelsize": LG,
})
Distribution of Gaussian Mechanism Outputs¶
For a given ε, noise is drawn from the normal distribution at \(\sigma^2 = \frac{2s^2 \log(1.25/\delta)}{\epsilon^2}\). The eeprivacy
function gaussian_mechanism
draws this noise and adds it to a private value:
[2]:
trials = []
for t in range(1000):
trials.append(GaussianMechanism.execute(
value=0,
epsilon=0.1,
delta=1e-12,
sensitivity=1
))
plt.hist(trials, bins=30, color="k")
plt.title("Distribution of outputs from Gaussian Mechanism")
plt.show()
Gaussian Mechanism Confidence Interval¶
With the eeprivacy
confidence interval functions, analysts can determine how far away the true value of a statistics is from the differentially private result.
To determine the confidence interval for a given choice of privacy parameters, employ eeprivacy.gaussian_mechanism_confidence_interval
.
To determine the privacy parameters for a desired confidence interval, employ eeprivacy.gaussian_mechanism_epsilon_for_confidence_interval
.
The confidence intervals reported below are two-sided. For example, for a 95% confidence interval of +/-10, 2.5% of results will be smaller than -10 and 2.5% of results will be larger than +10.
[3]:
trials = []
for t in range(100000):
trials.append(GaussianMechanism.execute(
value=0,
epsilon=0.1,
delta=1e-12,
sensitivity=1
))
plt.hist(trials, bins=30, color="k")
plt.title("Distribution of outputs from Gaussian Mechanism")
plt.show()
ci = np.quantile(trials, 0.975)
print(f"95% Confidence Interval (Stochastic): {ci}")
ci = GaussianMechanism.confidence_interval(
epsilon=0.1,
delta=1e-12,
sensitivity=1,
confidence=0.95
)
print(f"95% Confidence Interval (Exact): {ci}")
# Now in reverse:
epsilon = GaussianMechanism.epsilon_for_confidence_interval(
target_ci=146.288,
delta=1e-12,
sensitivity=1,
confidence=0.95
)
print(f"ε for confidence interval: {epsilon}")
95% Confidence Interval (Stochastic): 146.00710633910379
95% Confidence Interval (Exact): 146.28781668617955
ε for confidence interval: 0.09999987468977604