Central vs Local Differential Privacy for GIS: Calibration, Validation, and Incident Response

This guide drills into one decision inside the broader privacy model comparison workflow: when a spatial pipeline reaches for differential privacy, should noise be injected centrally by a trusted curator or locally at the point of collection? It assumes the ingestion/computation separation defined in Core Fundamentals & Architecture for Spatial Privacy and consumes the per-feature risk weights produced by spatial sensitivity scoring models to set every parameter below. The choice is not stylistic — it fixes the trust boundary, the noise scale for a given ε, and the spatial utility that survives into routing, zoning, or epidemiological analysis.

The two models share the same formal definition but place the curator on opposite sides of the trust boundary. In the central model, raw coordinate streams, spatial event logs, and mobility traces are aggregated inside a secure enclave and noise is added once to the aggregate, so the Laplace scale is sensitivity / ε and a small ε still yields high-resolution output. In the local model the curator is untrusted: every record is perturbed on-device before transmission, so meeting the same ε takes far more total noise, which is the price of never trusting a single aggregation point. Local perturbation is mandatory when raw telemetry cannot leave the client, which is exactly the regime that federated learning workflows and secure multi-party computation in spatial analytics target.
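To make the asymmetry concrete, here is a minimal sketch of the Laplace scale under each placement. The helper name `laplace_scale` and the 0.5-degree domain are illustrative assumptions, not values taken from a library or a real deployment.

```python
def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


# Illustrative values only (assumptions for this sketch).
EPSILON = 1.0
COUNT_SENSITIVITY = 1.0      # central: one person moves one bin count by 1
DOMAIN_DIAMETER_DEG = 0.5    # local: L1 diameter of the clipped coordinate domain

central_scale = laplace_scale(COUNT_SENSITIVITY, EPSILON)   # one draw per aggregate
local_scale = laplace_scale(DOMAIN_DIAMETER_DEG, EPSILON)   # one draw per record, per axis

print(f"central: scale {central_scale:.2f} (people), added once to each bin count")
print(f"local:   scale {local_scale:.2f} (degrees), added to every coordinate")
```

The two scales are in different units, so they are not directly comparable as numbers. The point is that the central scale is paid once per released count, while the local scale is paid again for every record that leaves a device.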

Parameter Configuration & Calibration

Every knob below ties a vague “add some noise” instruction to a concrete, auditable value. Calibrate against the composite risk score from the parent privacy model comparison inventory rather than guessing. For disjoint spatial bins where each person contributes at most one record, adding or removing that person changes a single count by at most 1, so the count-query L1 sensitivity is exactly 1.0, and the lever you actually turn is ε, not sensitivity.
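The sensitivity claim can be checked directly on a toy histogram. This is a sketch under stated assumptions: bins are plain integer ids standing in for H3 cells, and each person appears exactly once.

```python
import numpy as np


def bin_counts(bin_ids: np.ndarray, n_bins: int) -> np.ndarray:
    """Histogram over disjoint bins; each entry of bin_ids is one person's bin."""
    return np.bincount(bin_ids, minlength=n_bins)


# Hypothetical data: six people, each assigned to exactly one of four bins.
people = np.array([0, 0, 1, 2, 2, 2])
full = bin_counts(people, n_bins=4)

# Neighbouring dataset under add/remove adjacency: drop the last person.
without_one = bin_counts(people[:-1], n_bins=4)
l1_add_remove = int(np.abs(full - without_one).sum())

# Under replace-one adjacency the same person moves to another bin instead,
# which changes two counts by one each.
moved = people.copy()
moved[-1] = 3
l1_replace = int(np.abs(full - bin_counts(moved, n_bins=4)).sum())

print("L1 change, add/remove one person:", l1_add_remove)
print("L1 change, replace one person:   ", l1_replace)
```

The sensitivity of 1.0 therefore depends on the adjacency definition and on the one-record-per-person assumption. A mobility trace that places one person in several bins needs a contribution bound before the same figure applies.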

Knob	Central model	Local model	Rationale
Privacy budget `ε` (per query)	`0.1`–`1.0`	`1.0`–`3.0`	Local noise scales with the full coordinate-domain diameter, so sub-`1.0` `ε` collapses utility; central aggregation tolerates a tighter budget.
`δ` (Gaussian only)	`≤ 1e-5`	`0` (use pure-`ε` Laplace)	Local DP keeps a pure-`ε` guarantee; a `δ > 0` term is hard to audit per-device.
Grid resolution	H3 `8`–`9`	H3 `6`–`7`	Coarser bins in the local model recover bin cardinality (and `k`) against the larger noise.
Coordinate sensitivity	`L1 = 1` per disjoint bin	domain diameter (clip first)	Unclipped coordinates make the guarantee vacuous; clip into a declared bounded domain before perturbing.
Drift tolerance (`p95`)	`< 100 m` (clinical catchment)	`< 500 m` (urban mobility)	Drift thresholds are workload-specific; exceed them and the release is noise, not data.
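The drift tolerances in the table are stated in metres, while perturbed coordinates usually arrive in degrees. The sketch below converts drift to metres with the haversine formula before comparing against those tolerances. It assumes rows of (latitude, longitude) in degrees and a spherical Earth; the names `haversine_m`, `within_tolerance`, and `P95_TOLERANCE_M` are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation

# p95 tolerances copied from the table, in metres.
P95_TOLERANCE_M = {"central": 100.0, "local": 500.0}


def haversine_m(a_deg, b_deg) -> np.ndarray:
    """Great-circle distance in metres between rows of (lat, lon) in degrees."""
    a = np.radians(np.atleast_2d(a_deg))
    b = np.radians(np.atleast_2d(b_deg))
    dlat = b[:, 0] - a[:, 0]
    dlon = b[:, 1] - a[:, 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[:, 0]) * np.cos(b[:, 0]) * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding pushing h slightly outside [0, 1].
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


def within_tolerance(original_deg, perturbed_deg, model: str) -> bool:
    """True if the 95th-percentile drift is under the table's tolerance."""
    p95 = float(np.percentile(haversine_m(original_deg, perturbed_deg), 95))
    return bool(p95 < P95_TOLERANCE_M[model])


# Illustrative points shifted 0.001 degrees north (a little over 100 m).
original = np.array([[52.5200, 13.4050], [52.5300, 13.4100]])
shifted = original + np.array([0.001, 0.0])
print("central tolerance met:", within_tolerance(original, shifted, "central"))
print("local tolerance met:  ", within_tolerance(original, shifted, "local"))
```

Measuring drift as a Euclidean norm in degrees understates east-west distances away from the equator, which is why the conversion is done per point rather than with a single degrees-to-metres factor.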

Two calibration rules matter more than the table. First, the two models are not interchangeable at equal ε: the central model injects a single Laplace draw into the aggregate, so a count query’s error has a standard deviation of sqrt(2) / ε regardless of how many records contribute, whereas the local model perturbs every record independently and the aggregate error grows like sqrt(n) / ε. Naively porting a central budget to the edge therefore destroys utility as n rises, which is why local deployments need both a larger ε and coarser bins. Second, allocate the global ε across hierarchical bins and track cumulative spend under sequential composition; Rényi Differential Privacy (RDP) accounting via the OpenDP primitives keeps ε exhaustion from silently degrading downstream convergence in iterative spatial joins. The k-anonymity floor that the noised grid must still satisfy is derived in how to calculate spatial k-anonymity thresholds.
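Both rules can be sketched in closed form. The first two functions compare the standard deviation of a central Laplace count with that of a de-biased randomized-response count, and the third splits a global ε across hierarchy levels under basic sequential composition. All function names and the level weights are assumptions made for illustration; this is plain ε summation, not RDP accounting.

```python
import math
from typing import Dict


def central_count_std(epsilon: float) -> float:
    """Std of one Laplace(1/epsilon) draw: sqrt(2) / epsilon, independent of n."""
    return math.sqrt(2.0) / epsilon


def local_rr_count_std(n: int, epsilon: float) -> float:
    """Std of the de-biased randomized-response count over n users.

    Each reported bit has variance p * (1 - p), and de-biasing divides the sum
    by (2p - 1). For small epsilon this is approximately sqrt(n) / epsilon.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))  # truthful probability
    return math.sqrt(n * p * (1.0 - p)) / (2.0 * p - 1.0)


def split_budget(epsilon_total: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a global epsilon across levels in proportion to weights.

    Levels query the same individuals, so their epsilons add up (sequential
    composition) and must sum to epsilon_total.
    """
    total = sum(weights.values())
    return {level: epsilon_total * w / total for level, w in weights.items()}


EPS = 0.5
for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}  central std={central_count_std(EPS):.2f}  "
          f"local std={local_rr_count_std(n, EPS):.1f}")

# Hypothetical weights that give finer H3 levels a larger share of the budget.
levels = split_budget(1.0, {"h3_7": 1.0, "h3_8": 2.0, "h3_9": 3.0})
print(levels, "sum =", round(sum(levels.values()), 6))
```

The central figure stays flat as n grows while the local figure doubles each time n quadruples, which is the gap the first rule describes.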

Reference Implementation

The engine below exposes both mechanisms behind one budget tracker so the central/local asymmetry is explicit rather than hidden behind a single noise call. It uses type annotations, documents the privacy implication of each step inline, and is designed to drop into a batch or streaming GIS pipeline inside the enclave (central) or behind an on-device SDK (local).

python

import numpy as np
from dataclasses import dataclass
from typing import Dict


class BudgetExhaustedError(Exception):
    """Raised when a query would exceed the allocated epsilon budget."""


@dataclass
class PrivacyBudgetTracker:
    """Tracks cumulative epsilon/delta consumption across spatial queries.

    Sequential composition is assumed: every query that touches the same
    individuals adds its epsilon to the running total. Crossing the cap is a
    hard failure, never a silent clamp -- silent clamping is how pipelines
    leak their formal guarantee without anyone noticing.
    """
    epsilon_total: float
    delta: float = 1e-5
    epsilon_spent: float = 0.0

    def allocate(self, eps: float) -> float:
        if eps <= 0:
            raise ValueError("epsilon per query must be positive")
        if self.epsilon_spent + eps > self.epsilon_total + 1e-12:
            raise BudgetExhaustedError(
                f"Budget exhausted: spent {self.epsilon_spent:.4f}, "
                f"requested {eps:.4f}, cap {self.epsilon_total:.4f}"
            )
        self.epsilon_spent += eps
        return self.epsilon_total - self.epsilon_spent


class SpatialDPEngine:
    """Central- and local-model differential privacy for geospatial workloads.

    The same epsilon buys very different protection in the two models. In the
    central model noise is added once to a trusted aggregate, so the Laplace
    scale is sensitivity / epsilon. In the local model every record is perturbed
    independently before it leaves the device, so the scale is bounded by the
    whole coordinate-domain diameter and is far larger for the same epsilon.
    """

    def __init__(self, budget: PrivacyBudgetTracker, grid_resolution: int = 8,
                 domain_diameter_deg: float = 0.5) -> None:
        # grid_resolution: H3 resolution used to bin coordinates (6-9 typical).
        # domain_diameter_deg: L1 extent of the bounded coordinate domain; this
        # IS the local-DP coordinate sensitivity, so it must be clipped to, not
        # guessed at -- an unbounded domain makes the guarantee meaningless.
        self.budget = budget
        self.grid_res = grid_resolution
        self.domain_diameter = domain_diameter_deg

    @staticmethod
    def _count_sensitivity() -> float:
        # One person changes a disjoint spatial-bin count by at most 1 (L1 = 1).
        return 1.0

    def noise_scale(self, model: str, eps: float) -> float:
        """Expose the Laplace scale so callers can reason about utility first."""
        if model == "central":
            return self._count_sensitivity() / eps
        if model == "local":
            return self.domain_diameter / eps
        raise ValueError("model must be 'central' or 'local'")

    def central_dp_counts(self, bin_counts: np.ndarray, eps: float) -> np.ndarray:
        """Laplace mechanism on PRE-AGGREGATED bins (trusted-curator model)."""
        self.budget.allocate(eps)
        scale = self.noise_scale("central", eps)
        noisy = bin_counts + np.random.laplace(0.0, scale, size=bin_counts.shape)
        # Counts are non-negative; clamp AFTER noise. Post-processing a DP output
        # is free -- it consumes no additional budget.
        return np.maximum(0.0, noisy)

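    def local_dp_coords(self, coords: np.ndarray, eps: float) -> np.ndarray:
        """Per-record Laplace perturbation at the edge (untrusted-curator model)."""
        self.budget.allocate(eps)
        coords = np.atleast_2d(np.asarray(coords, dtype=float))
        # Clip into the declared domain BEFORE perturbing so sensitivity is truly
        # bounded; clipping is part of the mechanism, not optional hardening.
        # The domain is assumed centred on the origin. Each of the d axes gets a
        # width of domain_diameter / d, so the L1 distance between any two clipped
        # records is at most domain_diameter, matching the scale used below.
        # (Clipping to +/- domain_diameter per axis would understate sensitivity.)
        half_width = self.domain_diameter / (2.0 * coords.shape[1])
        clipped = np.clip(coords, -half_width, half_width)
        scale = self.noise_scale("local", eps)  # pure epsilon-LDP, no delta term
        return clipped + np.random.laplace(0.0, scale, size=clipped.shape)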

    def local_dp_count(self, membership: np.ndarray, eps: float) -> float:
        """Estimate a bin count from per-user randomized response (untrusted model).

        Each user reports a noisy membership bit and the aggregate is de-biased.
        Because noise is injected n times instead of once, this estimator's error
        grows like sqrt(n) / epsilon, versus the central model's constant
        sqrt(2) / epsilon -- that gap IS the formal price of an untrusted curator.
        """
        self.budget.allocate(eps)
        p = 1.0 / (1.0 + np.exp(-eps))  # truthful-response probability under RR
        truthful = np.random.random(membership.shape) < p
        reported = np.where(truthful, membership, 1 - membership)
        # Unbiased de-bias: E[reported] = p*x + (1-p)*(1-x).
        return float((reported.sum() - (1 - p) * membership.size) / (2 * p - 1))

    def displacement_report(self, original: np.ndarray,
                            perturbed: np.ndarray) -> Dict[str, float]:
        """Deterministic utility report: spatial drift in coordinate units."""
        original = np.atleast_2d(np.asarray(original, dtype=float))
        perturbed = np.atleast_2d(np.asarray(perturbed, dtype=float))
        drift = np.linalg.norm(original - perturbed, axis=1)
        return {
            "mean_drift": float(np.mean(drift)),
            "p95_drift": float(np.percentile(drift, 95)),
            "max_drift": float(np.max(drift)),
            "budget_remaining": float(
                self.budget.epsilon_total - self.budget.epsilon_spent),
        }

Validation Checkpoint

A mechanism is only trustworthy if its guarantees are asserted, not assumed. The harness below runs a Monte Carlo sweep to confirm the invariants that distinguish a correct spatial DP engine: noised counts never go negative, a smaller ε produces more coordinate drift, the budget tracker fails loudly on over-spend, and, as the property the whole page turns on, the local model’s aggregate error exceeds the central model’s for the same bin-count query at equal ε.

python

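def _run_validation() -> None:
    rng = np.random.default_rng(7)
    # The engine draws from NumPy's legacy global RNG, so seed that as well;
    # otherwise the mechanism noise differs between runs of the same check.
    np.random.seed(7)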

    # 1. Same query, equal epsilon: local aggregate error must exceed central.
    #    Central injects one Laplace(1/eps) draw; local injects n randomized
    #    responses, so its error grows ~ sqrt(n)/eps -- the untrusted-curator tax.
    members = (rng.random(5000) < 0.3).astype(float)
    true_count = members.sum()
    eps = 1.0
    central_err = float(np.mean([
        abs((true_count + np.random.laplace(0.0, 1.0 / eps)) - true_count)
        for _ in range(200)]))
    edge = SpatialDPEngine(PrivacyBudgetTracker(epsilon_total=1e6))
    local_err = float(np.mean([
        abs(edge.local_dp_count(members, eps) - true_count) for _ in range(200)]))
    assert local_err > central_err, (local_err, central_err)

    # 2. Central counts stay non-negative and spend exactly the requested budget.
    engine = SpatialDPEngine(PrivacyBudgetTracker(epsilon_total=10.0),
                             domain_diameter_deg=0.5)
    counts = rng.integers(0, 50, size=64).astype(float)
    noisy = engine.central_dp_counts(counts, eps=0.5)
    assert (noisy >= 0).all(), "negative count leaked past the clamp"
    assert abs(engine.budget.epsilon_spent - 0.5) < 1e-9

    # 3. Smaller epsilon must increase expected drift (more privacy, less utility).
    coords = rng.normal(0.0, 0.05, size=(2000, 2))
    loose = SpatialDPEngine(PrivacyBudgetTracker(20.0), domain_diameter_deg=0.5)
    tight = SpatialDPEngine(PrivacyBudgetTracker(20.0), domain_diameter_deg=0.5)
    drift_loose = loose.displacement_report(coords, loose.local_dp_coords(coords, 3.0))
    drift_tight = tight.displacement_report(coords, tight.local_dp_coords(coords, 1.0))
    assert drift_tight["mean_drift"] > drift_loose["mean_drift"]

    # 4. Budget exhaustion is a hard failure, never a silent pass.
    small = SpatialDPEngine(PrivacyBudgetTracker(epsilon_total=1.0))
    small.central_dp_counts(counts, eps=0.9)
    try:
        small.central_dp_counts(counts, eps=0.5)
        raise AssertionError("over-spend was not rejected")
    except BudgetExhaustedError:
        pass

    print("spatial DP engine: all assertions passed")


if __name__ == "__main__":
    _run_validation()

If assertion 1 ever flips, a refactor has quietly weakened or bypassed the per-record perturbation, which is the most common way a “local” deployment silently degrades into central-strength noise while still claiming an untrusted-curator guarantee. Treat that assertion as a regression gate in CI, not a one-off check.

Incident Response & Edge Cases

Sub-meter precision detected at ingest. A device emits coordinates at 0.000001° (~0.1 m) and the local clip domain is wide enough to pass them through almost untouched, so the per-record noise no longer hides the original cell. Remediation: tighten domain_diameter_deg to the actual operating area, reject records whose source precision exceeds the configured floor (logging the rejection without the raw coordinate), and re-bin at a coarser H3 resolution before release.
Budget exhausted mid-query. An iterative spatial join requests more ε than the tracker holds and allocate raises BudgetExhaustedError. Do not catch-and-continue. Halt the join, switch to a pre-computed anonymized aggregate or a synthetic fallback, and resume only after the accounting window resets — the same posture used when a secure-computation handshake stalls.
Utility collapse in the local model. A sparse rural grid noised at local ε = 1.0 produces p95 drift far past the 500 m tolerance, so downstream routing becomes unusable. Remediation: coarsen grid resolution (H3 7 → 6) to raise bin cardinality, or escalate to the central model if a trusted enclave is in fact available — re-running the privacy model comparison decision with the observed density.
k-anonymity violation after noising. Boundary artifacts and sparse cells let a noised release still expose individuals, a leakage vector catalogued in threat mapping for GIS data. Run the Python implementation of spatial threat modeling as a pre-release gate, suppress any cell below the floor, and log budget spend, drift metrics, and suppression triggers to an immutable audit trail.
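The suppression step in the last item can be sketched as a post-processing gate. This is a minimal illustration, assuming the floor is applied to the already-noised counts; the function name, the NaN marker for suppressed cells, and the audit fields are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np


def suppress_small_cells(noisy_counts, k: int = 5) -> Tuple[np.ndarray, Dict[str, int]]:
    """Blank out cells whose NOISY count is below the floor k.

    Thresholding counts that are already noised is post-processing, so it
    spends no extra budget. Thresholding on the true counts would not be
    post-processing and would need its own privacy analysis.
    """
    noisy_counts = np.asarray(noisy_counts, dtype=float)
    suppressed = noisy_counts < k
    released = np.where(suppressed, np.nan, noisy_counts)
    # Audit record holds aggregate figures only, never cell contents.
    audit = {
        "k_floor": int(k),
        "cells_total": int(noisy_counts.size),
        "cells_suppressed": int(suppressed.sum()),
    }
    return released, audit


# Illustrative noised counts for four cells.
released, audit = suppress_small_cells([12.3, 4.1, 0.0, 7.8], k=5)
print(released)
print(audit)
```

Because the gate sees only noisy values, a released cell is not proof that the true cell holds at least k people. It is a release filter that should sit alongside the threat-model check, not replace it.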

These triggers feed directly into compliance framework mapping. HIPAA Safe Harbor strips geographic units smaller than a state (apart from qualifying three-digit ZIP prefixes), so granular location data is instead de-identified under Expert Determination, typically argued with central DP at ε ≤ 1.0 plus an H3 resolution that holds k ≥ 5. GDPR Article 25 data-protection-by-design favours the local model precisely because raw coordinates never leave the device. The sector-specific translation of those constraints is worked end to end in mapping HIPAA requirements to geospatial datasets. The choice between central and local DP is therefore workload-dependent, not binary: central excels at high-fidelity analytics behind a trusted curator, local is indispensable for edge-native and multi-party spatial computation, and only continuous calibration plus the assertions above keep either honest under live query load.