Privacy Model Comparison for Spatial Analytics

Positioned under: Core Fundamentals & Architecture for Spatial Privacy

Choosing a privacy mechanism for a geospatial pipeline is a routing decision, not a preference: the same trajectory stream that tolerates a Gaussian-mechanism release for an epidemiological heat map must be handled by secure aggregation when the coordinates belong to financial customers and may not legally cross a silo boundary. This guide gives privacy engineers and GIS data scientists a deterministic, stepwise framework for comparing differential privacy (DP), federated learning (FL), secure multi-party computation (MPC), and homomorphic encryption (HE) for location-aware workloads: quantify sensitivity first, then match the trust model and utility budget to a mechanism, parameterize it, and gate every release behind measurable validation. The procedure binds each choice to a concrete parameter (an epsilon ceiling, an L2 clip norm, a secret-sharing threshold) rather than leaving the selection to intuition.

The worked scenario throughout is a cross-silo mobility analytics platform: three hospital networks and one transit authority want a joint origin-destination matrix without any party revealing raw GPS pings. That single scenario exercises every mechanism — central DP for the trusted-curator layer, local DP at the device, FL for model training that never moves raw data, and MPC/HE for the joins that cannot tolerate even perturbed leakage.

Prerequisites

Provision the following so each comparison is reproducible and auditable rather than anecdotal:

Python 3.11+ with numpy and scipy for noise calibration and distortion metrics, geopandas and shapely for geometry handling, and cryptography for the key material referenced in the MPC path.
A privacy-budget accounting method. This page assumes a Gaussian-mechanism (ε, δ) accountant with δ ≤ 1e-5; use Rényi differential privacy (RDP) composition when the same dataset is queried repeatedly. The accountant must persist spent budget to durable storage so a model release and a tabular release cannot each spend the full budget in isolation.
A common coordinate reference system (CRS). Reproject every input to a single projected CRS (for example the local UTM zone) before any sensitivity or distortion calculation — distance-based error in degrees is meaningless and silently corrupts the utility comparison.
A scored sensitivity baseline. Mechanism selection is driven by the composite risk weights produced by spatial sensitivity scoring models; have that scoring routine importable, because its per-feature score is the primary input to budget allocation below.
A documented threat surface. The inference vectors enumerated by threat mapping for GIS data determine which mechanisms are even admissible — a pipeline exposed to cross-silo linkage cannot be solved by local noise alone.
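
The CRS prerequisite can be enforced mechanically. Below is a minimal sketch, assuming WGS 84 longitude/latitude input and the standard 6-degree UTM zones. `utm_epsg_for`, `require_crs_tag`, and `metres_per_degree_lon` are hypothetical helpers, not a library API; in a geopandas pipeline the reprojection itself would be `GeoDataFrame.to_crs(epsg=...)`.

```python
import math
from typing import Optional


def utm_epsg_for(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone covering (lon, lat).

    Standard 6-degree zones only; the Norway and Svalbard exceptions are
    ignored, so treat this as a sketch rather than a complete zone resolver.
    """
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("coordinate outside UTM coverage")
    zone = min(int((lon + 180.0) // 6.0) + 1, 60)
    return (32600 if lat >= 0 else 32700) + zone


def require_crs_tag(crs_tag: Optional[str]) -> str:
    """Fail closed on any input that arrives without an explicit CRS tag."""
    if not crs_tag:
        raise ValueError("input lacks an explicit CRS tag; refusing to score it")
    return crs_tag


def metres_per_degree_lon(lat: float) -> float:
    """Approximate ground length of one degree of longitude (spherical Earth)."""
    return 111_320.0 * math.cos(math.radians(lat))


# 0.01 degrees of longitude is ~1113 m at the equator but ~557 m at 60 N,
# so degree-based distortion metrics are not comparable across latitudes.
print(utm_epsg_for(-73.98, 40.75))  # 32618 (WGS 84 / UTM zone 18N)
```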

Decision Criteria: Trust Model versus Utility

The comparison turns on three axes — who is trusted to see raw data, how much spatial error the downstream task tolerates, and how much compute and coordination cost is acceptable. The matrix below is the contract the rest of the procedure enforces; every row terminates in a concrete parameter, not a vague preference.

Mechanism	Trust assumption	Spatial utility	Cost / coordination	Governing parameter
Central DP	Trusted curator aggregates raw data	High (low variance)	Low	Global `ε` small (e.g. 0.5–1.0), `δ ≤ 1e-5`
Local DP	No trusted party; noise at source	Low (high variance)	Low	Per-report `ε` large; variance ∝ 1/`ε`²
Federated learning + DP-SGD	Raw data never leaves the silo	Medium–high for models	Medium (round coordination)	Clip norm `L2`, noise multiplier, participant quorum
MPC / HE	No party sees any other’s raw input	Exact (no perturbation)	High (crypto + bandwidth)	Secret-sharing threshold `t`-of-`n`, key rotation interval

For the device-versus-curator decision specifically (coordinate perturbation, radius inflation, and grid-based generalization under each trust model), work through the trade-off in comparing central versus local differential privacy for GIS before committing an epsilon budget. When a row resolves to MPC or HE, the concrete protocols live in secret sharing for coordinates and homomorphic encryption basics.

Step 1: Quantify Baseline Sensitivity and Map Attack Surfaces

Before evaluating any mechanism, establish a quantitative baseline for location exposure. Ingest raw coordinate streams, trajectory logs, and polygon boundaries into a staging environment and classify them with a standardized scoring matrix that weighs re-identification risk, temporal granularity, and contextual linkage potential. The composite weights from the sensitivity scoring routine dictate downstream privacy budgets: high-risk geometries (dense quasi-identifiers, sub-meter resolution, daily sampling) demand a tighter ε or outright cryptographic routing, while coarse, infrequently sampled layers tolerate lightweight perturbation.

python

import numpy as np
from typing import Dict

def derive_budget_from_sensitivity(
    sensitivity_score: float,
    base_epsilon: float = 1.0,
    mpc_threshold: float = 0.8,
) -> Dict[str, float | str]:
    """Map a composite sensitivity score (0..1) to a privacy budget and route.

    High scores tighten epsilon and, past a threshold, divert the feature to a
    cryptographic path where no perturbed raw value is emitted at all.
    """
    score = float(np.clip(sensitivity_score, 0.0, 1.0))
    if score >= mpc_threshold:
        return {"route": "mpc", "epsilon": 0.0}
    # Inverse mapping: more sensitive -> smaller epsilon -> more noise.
    epsilon = base_epsilon * (1.0 - score) + 0.05
    return {"route": "dp", "epsilon": round(epsilon, 4)}
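
The composite score that feeds `derive_budget_from_sensitivity` can be sketched as a weighted mean of the three factors named above. The factor names and weights here are hypothetical placeholders, not the output of the spatial sensitivity scoring models; a factor missing from the input defaults to maximum risk, so incomplete metadata fails closed toward the more protective route.

```python
from typing import Dict, Optional

# Hypothetical factor weights for illustration only.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "reid_risk": 0.5,             # re-identification risk
    "temporal_granularity": 0.3,  # sampling frequency
    "linkage": 0.2,               # contextual linkage potential
}


def composite_sensitivity_score(factors: Dict[str, float],
                                weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-factor scores, each clipped to [0, 1].

    A factor absent from `factors` is scored 1.0 (fail closed).
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * min(max(factors.get(name, 1.0), 0.0), 1.0)
                   for name, w in weights.items())
    return weighted / total


# Sub-meter, daily-sampled trajectories near POIs...
high = composite_sensitivity_score(
    {"reid_risk": 0.95, "temporal_granularity": 0.9, "linkage": 0.8})
# ...versus a coarse, monthly grid layer.
low = composite_sensitivity_score(
    {"reid_risk": 0.2, "temporal_granularity": 0.1, "linkage": 0.3})
```

With these placeholder weights the first layer scores 0.905, which crosses the default `mpc_threshold` of 0.8; the second scores 0.19 and would receive ε = 0.86 from `derive_budget_from_sensitivity`.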

Concurrently, overlay these weights against the inference vectors documented during threat mapping for GIS data — trajectory reconstruction, centroid triangulation, and spatiotemporal correlation. This ensures mechanism selection mitigates high-probability attack paths instead of applying uniform noise across low-risk geometry. Catalog each linkage risk (for example combining GPS pings with public POI databases) and assign a threat-severity tier before parameterizing.
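
A minimal sketch of the threat-severity catalog, assuming analysts supply likelihood and impact estimates in [0, 1]. `LinkageRisk`, `Tier`, and the 0.5 / 0.2 cut-offs are hypothetical; sorting by tier puts the high-probability attack paths first in the parameterization queue.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class LinkageRisk:
    vector: str        # description of the inference path
    likelihood: float  # analyst estimate in [0, 1]
    impact: float      # analyst estimate in [0, 1]

    @property
    def tier(self) -> Tier:
        severity = self.likelihood * self.impact
        if severity >= 0.5:   # hypothetical cut-offs
            return Tier.HIGH
        if severity >= 0.2:
            return Tier.MEDIUM
        return Tier.LOW


catalog = [
    LinkageRisk("GPS pings joined to a public POI database", 0.8, 0.9),
    LinkageRisk("trajectory reconstruction from sequential pings", 0.7, 0.8),
    LinkageRisk("centroid triangulation on a coarse grid", 0.3, 0.4),
]
# Highest tiers first: these are parameterized before anything else.
ordered = sorted(catalog, key=lambda risk: risk.tier, reverse=True)
```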

Step 2: Select and Parameterize the Mechanism

With sensitivity baselines and threat surfaces documented, select the architecture. Central DP yields higher spatial utility but requires a trusted aggregation node; local DP moves the guarantee to the data origin at the cost of variance that grows with the square of the inverse epsilon. When regulation prohibits raw coordinate transmission at all — HIPAA Safe Harbor for clinical mobility, GLBA routing for financial location — the decision moves to federated training or an MPC/HE join. Parameterize the chosen model explicitly:

Privacy budgets: (ε, δ) for DP, or a t-of-n secret-sharing threshold for MPC.
Clipping thresholds: an L2 norm that bounds the spatial sensitivity each record can contribute.
Aggregation windows: temporal buckets that respect autocorrelation in movement data, so consecutive pings are not treated as independent draws.
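
The aggregation-window parameter can be sketched as contribution bounding per time bucket. This is a minimal illustration, assuming fixed-length windows and pings given as `(user_id, unix_ts, x_m, y_m)` tuples in a projected CRS; `bucket_pings` and its `max_per_window` cap are hypothetical. Keeping at most one ping per user per window means a device that reports every few seconds contributes one record, not dozens of correlated ones, which keeps per-window sensitivity bounded.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Ping = Tuple[str, float, float, float]  # (user_id, unix_ts, x_m, y_m)


def bucket_pings(pings: Iterable[Ping], window_s: float = 3600.0,
                 max_per_window: int = 1) -> Dict[int, List[Tuple[float, float]]]:
    """Group pings into fixed windows, keeping each user's earliest
    `max_per_window` pings per window and dropping the rest."""
    per_user: Dict[Tuple[int, str], List[Ping]] = defaultdict(list)
    for ping in sorted(pings, key=lambda p: p[1]):
        user, ts = ping[0], ping[1]
        per_user[(int(ts // window_s), user)].append(ping)

    buckets: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for (window, _user), user_pings in per_user.items():
        for _, _, x, y in user_pings[:max_per_window]:
            buckets[window].append((x, y))
    return dict(buckets)
```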

python

from typing import Literal

def select_mechanism(
    *,
    trusted_curator: bool,
    raw_may_leave_silo: bool,
    utility_floor_high: bool,
) -> Literal["central_dp", "local_dp", "federated", "mpc"]:
    """Deterministic mechanism selection from trust and utility constraints."""
    if not raw_may_leave_silo:
        # Cannot move raw data: train in place, or compute under encryption.
        return "federated" if utility_floor_high else "mpc"
    if trusted_curator and utility_floor_high:
        return "central_dp"
    return "local_dp"


assert select_mechanism(trusted_curator=True, raw_may_leave_silo=True,
                        utility_floor_high=True) == "central_dp"
assert select_mechanism(trusted_curator=False, raw_may_leave_silo=False,
                        utility_floor_high=False) == "mpc"

When the selection lands on federated training, the round mechanics — participant sampling, clipping, and noise addition — are governed by the gradient aggregation techniques used across the federated learning workflows for geospatial data, and the same shared accountant must bound that model release.
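
The round mechanics named above can be sketched as a single DP-FedAvg-style step. This is a minimal illustration, assuming each client submits one flat model-update vector and the cohort has a fixed size. `sample_participants` and `dp_fedavg_round` are hypothetical names, the quorum check mirrors the participant-quorum parameter in the decision matrix, and converting the noise multiplier into (ε, δ) is left to the shared accountant.

```python
import numpy as np
from typing import List


def sample_participants(n_clients: int, sample_size: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Uniformly sample a fixed-size cohort of client indices without replacement."""
    return rng.choice(n_clients, size=sample_size, replace=False)


def dp_fedavg_round(global_weights: np.ndarray, client_updates: List[np.ndarray],
                    clip_norm: float, noise_multiplier: float, min_quorum: int,
                    rng: np.random.Generator) -> np.ndarray:
    """One round: client-level clipping, summation, Gaussian noise, averaging.

    Each client's whole update is clipped to L2 norm `clip_norm`, so adding or
    removing one client moves the sum by at most `clip_norm`. Noise with
    standard deviation noise_multiplier * clip_norm is added to the sum.
    """
    if len(client_updates) < min_quorum:
        raise RuntimeError("quorum not met; refusing to aggregate this round")
    clipped = []
    for update in client_updates:
        norm = float(np.linalg.norm(update))
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append(update * factor)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, global_weights.shape)
    noisy_sum = np.sum(clipped, axis=0) + noise
    return global_weights + noisy_sum / len(client_updates)
```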

Step 3: Implement Calibration and Secure Aggregation

Production deployments require deterministic noise calibration and a secure aggregation path. The evaluator below parameterizes, applies, and validates the three perturbation-based mechanisms (central DP, local DP, federated averaging) so they can be compared on identical input. The Gaussian scale uses the classical calibration σ = Δ · sqrt(2 · ln(1.25 / δ)) / ε, where Δ = 2 · clip_norm is the L2 sensitivity of one clipped record. That bound is only proven for ε < 1, so larger budgets need an analytic or RDP-based calibration. Reducing ε widens the noise proportionally, while tightening δ raises it only with sqrt(ln(1/δ)).
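
A standalone version of the calibration with a worked value, enforcing the classical bound's validity range strictly; `gaussian_sigma` is a hypothetical helper. At ε = 0.5, δ = 1e-5, and clip_norm = 10 m, the sensitivity is 20 m and σ = 20 · sqrt(2 · ln 125000) / 0.5 ≈ 193.8 m per coordinate; halving ε doubles σ.

```python
import math


def gaussian_sigma(epsilon: float, delta: float, clip_norm: float) -> float:
    """Classical Gaussian-mechanism scale for L2 sensitivity 2 * clip_norm.

    The calibration is only proven for 0 < epsilon < 1; outside that range
    an analytic or RDP-based calibration is needed instead.
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical bound requires 0 < epsilon < 1 and 0 < delta < 1")
    l2_sensitivity = 2.0 * clip_norm
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
```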

python

import numpy as np
from typing import Dict, List, Optional
from scipy.spatial.distance import cdist


class SpatialPrivacyEvaluator:
    """Compare DP and federated spatial mechanisms on identical input, with utility validation.

    Coordinates are assumed to be projected offsets (metres) from a local
    reference point: clipping is about the origin, so raw UTM eastings would
    all be clipped.
    """

    def __init__(self, epsilon: float = 1.0, delta: float = 1e-5,
                 clip_norm: float = 10.0, grid_resolution: float = 0.01,
                 rng: Optional[np.random.Generator] = None) -> None:
        self.epsilon = epsilon
        self.delta = delta
        self.clip_norm = clip_norm
        self.grid_res = grid_resolution  # kept for grid fallbacks; unused below
        # An injectable generator makes validation runs reproducible.
        self.rng = rng if rng is not None else np.random.default_rng()

    def _clip_coordinates(self, coords: np.ndarray) -> np.ndarray:
        """L2-norm clipping to bound each record's spatial sensitivity."""
        norms = np.linalg.norm(coords, axis=1, keepdims=True)
        # Guard the division so zero-norm rows pass through without warnings.
        scale = np.minimum(1.0, self.clip_norm / np.maximum(norms, 1e-12))
        return coords * scale

    def _gaussian_sigma(self, l2_sensitivity: float) -> float:
        """Classical Gaussian calibration; formally proven only for epsilon < 1."""
        return float(np.sqrt(2 * np.log(1.25 / self.delta)) * l2_sensitivity / self.epsilon)

    def apply_central_dp(self, coords: np.ndarray) -> np.ndarray:
        """Gaussian mechanism (trusted curator), applied per record with L2 sensitivity 2 * clip_norm."""
        sigma = self._gaussian_sigma(2 * self.clip_norm)
        return self._clip_coordinates(coords) + self.rng.normal(0, sigma, coords.shape)

    def apply_local_dp(self, coords: np.ndarray) -> np.ndarray:
        """Laplace mechanism applied at the data origin (no trusted party).

        Laplace noise is calibrated to L1 sensitivity: two d-dimensional records
        clipped to L2 norm clip_norm differ by at most 2 * clip_norm * sqrt(d) in L1.
        """
        l1_sensitivity = 2 * self.clip_norm * np.sqrt(coords.shape[1])
        b = l1_sensitivity / self.epsilon
        return self._clip_coordinates(coords) + self.rng.laplace(0, b, coords.shape)

    def simulate_federated_aggregation(self, local_coords: List[np.ndarray]) -> np.ndarray:
        """Secure-aggregation-style mean with calibrated post-aggregation noise.

        Sensitivity is record-level: replacing one clipped row in one client moves
        the mean by at most 2 * clip_norm / n_clients. Client-level DP would clip
        each client's whole flattened update instead. In production, replace the
        plain mean with a SecAgg protocol so the server never sees an individual
        client's clipped vector.
        """
        if not local_coords or len({c.shape for c in local_coords}) != 1:
            raise ValueError("need at least one client, all with the same array shape")
        clipped = np.stack([self._clip_coordinates(c) for c in local_coords])
        aggregated = clipped.mean(axis=0)
        sigma = self._gaussian_sigma(2 * self.clip_norm / len(local_coords))
        return aggregated + self.rng.normal(0, sigma, aggregated.shape)

    def validate_spatial_utility(self, original: np.ndarray,
                                 protected: np.ndarray) -> Dict[str, float]:
        """Spatial distortion metrics for compliance reporting."""
        dists = cdist(original, protected)
        # Symmetric Hausdorff distance: the worst nearest-neighbour gap in either direction.
        hausdorff = max(dists.min(axis=1).max(), dists.min(axis=0).max())
        mean_euclidean = np.mean(np.linalg.norm(original - protected, axis=1))
        var_original = np.var(original)
        variance_ratio = np.var(protected) / var_original if var_original > 0 else 0.0
        return {
            "hausdorff_dist": float(hausdorff),
            "mean_euclidean_error": float(mean_euclidean),
            "variance_preservation_ratio": float(variance_ratio),
        }


# --- Validation harness -----------------------------------------------------
if __name__ == "__main__":
    rng = np.random.default_rng(42)
    pts = rng.normal(0, 5, size=(500, 2))

    # epsilon=5.0 lies outside the classical bound's validity range (epsilon < 1);
    # it is used here only to check that error shrinks as epsilon grows.
    tight = SpatialPrivacyEvaluator(epsilon=0.5, rng=rng)
    loose = SpatialPrivacyEvaluator(epsilon=5.0, rng=rng)

    # A smaller epsilon must inject more error than a larger one.
    err_tight = tight.validate_spatial_utility(pts, tight.apply_central_dp(pts))
    err_loose = loose.validate_spatial_utility(pts, loose.apply_central_dp(pts))
    assert err_tight["mean_euclidean_error"] > err_loose["mean_euclidean_error"]

    # Both per-point mechanisms must inject error. This per-point simulation does
    # not show central DP's utility advantage, which appears only when the curator
    # noises an aggregate once instead of every record.
    e = SpatialPrivacyEvaluator(epsilon=1.0, rng=rng)
    central = e.validate_spatial_utility(pts, e.apply_central_dp(pts))
    local = e.validate_spatial_utility(pts, e.apply_local_dp(pts))
    assert local["mean_euclidean_error"] > 0 and central["mean_euclidean_error"] > 0

    # Federated averaging returns one noised aggregate with the per-client shape.
    clients = [rng.normal(0, 5, size=(50, 2)) for _ in range(20)]
    agg = e.simulate_federated_aggregation(clients)
    assert agg.shape == (50, 2)
    print("ok", round(err_tight["mean_euclidean_error"], 2),
          round(err_loose["mean_euclidean_error"], 2))

For the workloads that resolve to the cryptographic path rather than perturbation, the coordinates are masked or secret-shared instead of noised — the implementation patterns are covered in coordinate masking protocols and the broader secure multi-party computation in spatial analytics section.
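
To make the secret-sharing path concrete, here is a minimal Shamir t-of-n sketch over a prime field, assuming coordinates are projected meters encoded as fixed-point centimeters. Because Shamir shares are linear, holders can add shares pointwise to obtain shares of a sum, so a joint total is reconstructed while no coalition smaller than t learns either input. `PRIME`, `SCALE`, and the helper names are illustrative assumptions; a real deployment would add authenticated channels, verifiable sharing, and a vetted MPC framework.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime field modulus (illustrative choice)
SCALE = 100         # fixed-point: centimetres from projected metres


def encode(value_m: float) -> int:
    """Map a signed coordinate in metres to a field element."""
    return round(value_m * SCALE) % PRIME


def decode(element: int) -> float:
    """Inverse of encode; the upper half of the field represents negatives."""
    signed = element if element <= PRIME // 2 else element - PRIME
    return signed / SCALE


def share(secret: int, t: int, n: int) -> List[Tuple[int, int]]:
    """Shamir t-of-n: evaluate a random degree-(t-1) polynomial with f(0) = secret."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 from any t distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Two silos share their eastings 3-of-5; adding shares pointwise yields shares
# of the sum, and any 3 holders reconstruct only that sum.
a = share(encode(583_210.25), t=3, n=5)
b = share(encode(584_002.75), t=3, n=5)
summed = [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]
total = decode(reconstruct(summed[1:4]))
```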

Step 4: Validate Spatial Utility and Enforce Compliance Boundaries

After applying any transformation, validate the result against operational thresholds using the metrics from validate_spatial_utility. Healthcare telemetry tolerates higher distortion — mean_euclidean_error < 50 m is typically acceptable for epidemiological clustering — while financial routing and emergency dispatch require sub-10 m precision. Bind the distortion budget to a regulatory clause rather than leaving it as a note:

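HIPAA: Safe Harbor requires removing geographic subdivisions smaller than a state, except that the first three ZIP digits may be kept when all ZIPs sharing them cover more than 20,000 people (otherwise those digits become 000). Granular location released under DP (for example ε ≤ 1.0) does not satisfy Safe Harbor and must instead be justified through Expert Determination.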
GLBA / FFIEC: protect customer routing with secure aggregation or MPC so no raw coordinate crosses a trust boundary; the comparison must reject any DP-only answer for these rows.
GDPR Article 25 (data protection by design and by default): local DP or a federated architecture minimizes transfer scope; pair it with grid snapping at ingestion (for example a cell edge ≥ 250 m in residential zones).
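
The bindings above can be expressed as data so the comparison rejects inadmissible answers mechanically, including any DP-only answer for a GLBA row. A minimal sketch; `ClauseBinding`, `BINDINGS`, `admissible`, and the regime keys are hypothetical, and the numbers are placeholders drawn from the list rather than statutory values.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

DP_MECHANISMS = frozenset({"central_dp", "local_dp", "federated"})


@dataclass(frozen=True)
class ClauseBinding:
    allowed_mechanisms: FrozenSet[str]
    max_epsilon: Optional[float] = None  # None: this clause sets no epsilon ceiling
    min_grid_edge_m: float = 0.0         # minimum generalization cell edge


# Placeholder bindings for illustration only.
BINDINGS: Dict[str, ClauseBinding] = {
    "hipaa_expert_determination": ClauseBinding(
        frozenset({"central_dp", "local_dp", "federated", "mpc", "he"}), max_epsilon=1.0),
    "glba": ClauseBinding(frozenset({"mpc", "he", "secure_aggregation"})),  # no DP-only
    "gdpr_art25": ClauseBinding(frozenset({"local_dp", "federated"}), min_grid_edge_m=250.0),
}


def admissible(regime: str, mechanism: str, epsilon: Optional[float] = None,
               grid_edge_m: float = 0.0) -> bool:
    """True only if the release meets every bound attached to the regime.

    Unknown regimes or mechanisms fail closed.
    """
    binding = BINDINGS.get(regime)
    if binding is None or mechanism not in binding.allowed_mechanisms:
        return False
    if mechanism in DP_MECHANISMS and binding.max_epsilon is not None:
        if epsilon is None or epsilon > binding.max_epsilon:
            return False
    return grid_edge_m >= binding.min_grid_edge_m
```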

The full clause-to-parameter binding, including the precise grid resolutions and retention windows each statute implies, lives in the compliance framework mapping. If validation metrics exceed the acceptable distortion limit, do not raise ε to buy accuracy. Instead, trigger a fallback that trades resolution for accuracy (aggregate point coordinates into hexagonal grid cells, whose counts need far less relative noise) or route the query through an anonymized proxy until the utility threshold is restored: degrade resolution, never confidentiality.
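
A minimal sketch of the hexagonal fallback, assuming projected coordinates in meters and pointy-top hexagons indexed by axial (q, r) coordinates with cube rounding, a standard hex-grid technique; a library such as H3 would be the production alternative. `hex_cell`, `hex_center`, and `hex_counts` are hypothetical helpers. Snapping is generalization only: the per-cell counts still need DP noise or another mechanism before release.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

SQRT3 = math.sqrt(3.0)


def hex_cell(x: float, y: float, size_m: float) -> Tuple[int, int]:
    """Axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size_m` is the hexagon circumradius (centre to corner) in metres.
    """
    q = (SQRT3 / 3.0 * x - y / 3.0) / size_m
    r = (2.0 / 3.0 * y) / size_m
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    # Cube rounding: recompute the coordinate with the largest rounding error
    # so that q + r + s == 0 still holds.
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hex_center(q: int, r: int, size_m: float) -> Tuple[float, float]:
    """Centre of axial hexagon (q, r) in the same projected units."""
    return size_m * (SQRT3 * q + SQRT3 / 2.0 * r), size_m * 1.5 * r


def hex_counts(points: Iterable[Tuple[float, float]], size_m: float) -> Counter:
    """Per-cell point counts: the aggregate a DP mechanism would then perturb."""
    return Counter(hex_cell(x, y, size_m) for x, y in points)
```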

Step 5: Stress-Test Against Adaptive Adversaries

Static budgets degrade under adaptive adversaries, so the comparison is not finished until the chosen mechanism survives simulated attack. Replay model inversion, membership inference, and side-channel leakage against the perturbed or aggregated output, then recalibrate ε and δ based on observed query patterns. When a threat score crosses its tolerance, degrade gracefully: isolate high-sensitivity queries, tighten the clip norm, or divert the workload to an air-gapped MPC enclave. When node latency or dropout threatens a federated round, defer to the async execution patterns that handle staleness without stalling aggregation.
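
A minimal sketch of one replay check, assuming the released statistic is a count that shifts by Δ when the target participates and carries Gaussian noise σ. In that setting the best possible single-release advantage (TPR − FPR) has the closed form 2Φ(Δ / 2σ) − 1, so the simulated attack can be checked against theory and σ recalibrated to an agreed advantage bound. The function names are hypothetical, and repeated adaptive queries compose, so the bound applies to one release only.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_mia_advantage(sensitivity: float, sigma: float) -> float:
    """Best single-release advantage for telling N(0, s^2) from N(delta, s^2)."""
    return 2.0 * STD_NORMAL.cdf(sensitivity / (2.0 * sigma)) - 1.0


def empirical_mia_advantage(sensitivity: float, sigma: float, trials: int,
                            rng: np.random.Generator) -> float:
    """Replay the optimal threshold attack: guess 'member' above sensitivity / 2."""
    threshold = sensitivity / 2.0
    with_target = sensitivity + rng.normal(0.0, sigma, trials)
    without_target = rng.normal(0.0, sigma, trials)
    return float(np.mean(with_target > threshold) - np.mean(without_target > threshold))


def sigma_for_advantage(sensitivity: float, max_advantage: float) -> float:
    """Smallest sigma whose theoretical advantage does not exceed max_advantage."""
    z = STD_NORMAL.inv_cdf((1.0 + max_advantage) / 2.0)
    return sensitivity / (2.0 * z)
```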

Threat Model Considerations

The comparison is only credible against a named adversary. Evaluate every candidate mechanism for resistance to at least the following:

Temporal linkage. Correlating perturbed coordinates across sequential timestamps reconstructs home/work anchors even after per-point noise. Defense: apply correlated noise or route trajectories to secure aggregation; central DP on independent points does not address this.
Auxiliary data fusion. An attacker joins noisy outputs against transit schedules, satellite imagery, or POI databases to narrow uncertainty. Defense: enforce the generalization floor from the mapping table and validate empirical re-identification, not nominal k-anonymity.
Membership inference on models. When location data feeds a federated model, an adversary probes whether a subject participated. Defense: the shared (ε, δ) accountant must bound the model release, not only the tabular one.
Aggregation-node compromise. A curator or MPC participant turns malicious. Defense: prefer a t-of-n secret-sharing threshold that tolerates partial collusion, and assess FL poisoning resilience under a minority of adversarial clients.
Budget-exhaustion abuse. Repeated queries average away the noise. Defense: persist spent budget durably and reject queries once the per-subject ledger crosses its ceiling.
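
The temporal-linkage and budget-exhaustion entries share one mechanism: independent noise averages away. A minimal sketch, assuming a fixed anchor (for example a home location) reported n times with fresh per-axis Gaussian noise; `averaging_attack_error` is a hypothetical helper. The attacker's error shrinks roughly as σ / sqrt(n), so 400 reports at σ = 200 m leave an expected error of only about 12.5 m.

```python
import numpy as np
from typing import Tuple


def averaging_attack_error(anchor_xy: Tuple[float, float], sigma_m: float,
                           n_reports: int, rng: np.random.Generator) -> float:
    """Distance from the true anchor to the mean of n independently noised reports."""
    anchor = np.asarray(anchor_xy, dtype=float)
    reports = anchor + rng.normal(0.0, sigma_m, size=(n_reports, 2))
    return float(np.linalg.norm(reports.mean(axis=0) - anchor))


rng = np.random.default_rng(3)
single = np.mean([averaging_attack_error((0.0, 0.0), 200.0, 1, rng) for _ in range(200)])
pooled = np.mean([averaging_attack_error((0.0, 0.0), 200.0, 400, rng) for _ in range(200)])
```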

Validation & Compliance Checklist

Gate every comparison decision behind measurable controls; each must produce an explicit pass/fail:

Mechanism justified. The selected mechanism traces to a row in the decision matrix and to the sensitivity score that drove it.
Budget ledger. Cumulative spent ε ≤ the per-dataset ceiling and per-subject ε ≤ the documented bound, read from durable storage; fail closed if unreachable.
Delta bound. Effective δ ≤ 1e-5 for every mechanism in the composition graph.
Utility floor. mean_euclidean_error and Hausdorff distance fall within the downstream task’s tolerance (e.g. < 50 m clinical, < 10 m dispatch).
Trust-boundary check. No raw coordinate crosses a silo boundary for any GLBA/CCPA-flagged subject; such joins ran via MPC or HE.
Empirical re-identification. A scripted linkage + trajectory-reconstruction attack yields a re-identification probability below the agreed threshold (e.g. < 0.09, a benchmark commonly cited for HIPAA Expert Determination; the rule itself sets no numeric cut-off).
Adversarial replay. Membership-inference and inversion simulations against the released output stay below the agreed advantage bound.
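
The checklist can be run as a fail-closed gate. A minimal sketch: each control is a zero-argument callable returning pass/fail, and any exception (for example an unreachable ledger) counts as a failure. `run_release_gate` and the sample checks are hypothetical; the literal values stand in for real ledger and metric reads.

```python
from typing import Callable, Dict, Mapping


def run_release_gate(checks: Mapping[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Evaluate every control; an exception counts as a failure (fail closed)."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def read_ledger() -> bool:
    # Hypothetical stand-in for a ledger read that cannot reach storage.
    raise ConnectionError("budget ledger unreachable")


results = run_release_gate({
    "delta_bound": lambda: 1e-6 <= 1e-5,   # illustrative effective delta
    "utility_floor": lambda: 42.0 < 50.0,  # illustrative mean error (m) vs clinical floor
    "budget_ledger": read_ledger,
})
release_allowed = all(results.values())
```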

Failure Modes & Remediation

Even a correct selection fails in production. Plan the recovery path for each:

Privacy budget exhaustion. The ledger hits the ε ceiling mid-period and queries return useless noise. Remediation: serve pre-aggregated cached layers, coarsen the grid (larger cells hold more records, so a smaller per-release ε still clears the utility floor), and roll to the next accounting period only on a documented schedule; never silently reset the ledger.
CRS mismatch. An asset arrives in a geographic CRS while the pipeline assumes a projected one, so distortion metrics and grid snapping are wrong. Remediation: reject any input lacking an explicit CRS tag; reproject and re-score before comparing mechanisms.
Node dropout during a federated round. A participant disappears mid-aggregation, biasing the SecAgg result. Remediation: require a minimum quorum before accepting an aggregate and re-weight contributions; defer to the federated workflows for staleness handling.
Cryptographic latency spike. The MPC/HE path slows under load and analysts are tempted to bypass it. Remediation: a circuit breaker routes overflow to elevated-noise pre-aggregated grids, never to the raw path.
Variance collapse from over-clipping. An aggressive L2 clip norm destroys the spatial signal you meant to preserve. Remediation: tune the clip norm against the variance_preservation_ratio metric, not by default.
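
A minimal sketch of tuning the clip norm against variance preservation rather than by default, assuming coordinates are offsets from a local reference point (clipping is about the origin) and using the same flattened-variance ratio as `variance_preservation_ratio` in the evaluator. `clip_rows` and `tune_clip_norm` are hypothetical helpers, and the 0.9 target is an illustrative choice. Because DP noise scales with the clip norm, the smallest norm that clears the target is preferred.

```python
import numpy as np
from typing import Iterable


def clip_rows(coords: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale each row down to L2 norm at most clip_norm."""
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    return coords * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))


def tune_clip_norm(coords: np.ndarray, candidates: Iterable[float],
                   min_ratio: float = 0.9) -> float:
    """Smallest candidate whose clipped variance keeps >= min_ratio of the original."""
    base = np.var(coords)
    for c in sorted(candidates):
        if np.var(clip_rows(coords, c)) / base >= min_ratio:
            return float(c)
    raise ValueError("no candidate preserves enough variance; widen the search")
```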

Frequently Asked Questions

When should I choose MPC or homomorphic encryption over differential privacy?

Choose the cryptographic path whenever the trust model forbids any party from seeing another’s raw input and the task needs an exact answer — typically cross-silo joins under GLBA or contractual data-sharing limits. DP trades a bounded leakage for low cost and high utility; MPC and HE trade compute and bandwidth for zero perturbation. If even a perturbed raw value crossing the boundary is unacceptable, DP alone does not qualify.

Why is local differential privacy so much noisier than central DP at the same epsilon?

Under local DP every record is perturbed independently at its source, so when n reports are summed their noise variances add: each report's variance scales with 1/ε², and the error of the total grows as √n / ε. Central DP perturbs once, after a trusted curator has aggregated, so the error of the same total is on the order of 1/ε regardless of n. The detailed error analysis and the grid-resolution trade-offs are worked through in comparing central versus local differential privacy for GIS.
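
The √n gap can be checked numerically. A minimal sketch, assuming per-record values bounded in [0, 1] (for example a visit indicator for one grid cell) and Laplace noise in both models: the curator adds one draw of scale 1/(nε) to the mean, while local DP adds scale 1/ε to every record before averaging. `mean_estimate_errors` is a hypothetical helper; expect the local RMSE to exceed the central one by roughly a factor of √n.

```python
import numpy as np
from typing import Tuple


def mean_estimate_errors(values: np.ndarray, epsilon: float,
                         rng: np.random.Generator,
                         trials: int = 2000) -> Tuple[float, float]:
    """RMSE of an epsilon-DP mean under central versus local Laplace noise.

    Values are clipped to [0, 1], so one record moves the mean by at most 1/n.
    """
    values = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    n = values.size
    true_mean = values.mean()
    # Central: one draw per trial, scaled to the mean's sensitivity 1/n.
    central = true_mean + rng.laplace(0.0, 1.0 / (n * epsilon), size=trials)
    # Local: every record is noised with scale 1/epsilon, then averaged.
    local = (values + rng.laplace(0.0, 1.0 / epsilon, size=(trials, n))).mean(axis=1)

    def rmse(estimates: np.ndarray) -> float:
        return float(np.sqrt(np.mean((estimates - true_mean) ** 2)))

    return rmse(central), rmse(local)
```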

How do I keep a federated model and a tabular release from double-spending the budget?

Share a single durable (ε, δ) accountant across every job that touches the dataset. Both the federated learning workflows that train models and the tabular DP releases must debit the same ledger, so concurrent jobs cannot each spend the full budget. Persist spent budget to storage, not process memory, and fail closed when the ledger is unreachable.
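
A minimal sketch of such a shared ledger, assuming SQLite as the durable store and plain additive (basic) composition of ε; an RDP accountant would debit tighter amounts. `BudgetLedger` and its table layout are hypothetical. `BEGIN IMMEDIATE` takes the write lock before the balance is read, so two jobs cannot both observe the same remaining budget, and any database error refuses the spend.

```python
import sqlite3


class BudgetLedger:
    """Durable, shared epsilon ledger using basic (additive) composition."""

    def __init__(self, path: str, ceiling: float) -> None:
        self.ceiling = ceiling
        # isolation_level=None: autocommit mode, so transactions are explicit below.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spend (dataset TEXT, job TEXT, epsilon REAL)")

    def try_spend(self, dataset: str, job: str, epsilon: float) -> bool:
        """Atomically debit `epsilon`; return False if it would exceed the ceiling."""
        try:
            self.conn.execute("BEGIN IMMEDIATE")  # write lock before reading the balance
            (spent,) = self.conn.execute(
                "SELECT COALESCE(SUM(epsilon), 0.0) FROM spend WHERE dataset = ?",
                (dataset,)).fetchone()
            if spent + epsilon > self.ceiling:
                self.conn.execute("ROLLBACK")
                return False
            self.conn.execute("INSERT INTO spend VALUES (?, ?, ?)", (dataset, job, epsilon))
            self.conn.execute("COMMIT")
            return True
        except sqlite3.Error:
            # Fail closed: a locked or unreachable ledger never authorizes spend.
            if self.conn.in_transaction:
                self.conn.execute("ROLLBACK")
            return False
```

Both the federated job and the tabular job call `try_spend` against the same `dataset` key, so once a model release has spent 0.6 of a 1.0 ceiling, a second 0.6 request is refused.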

Where do I set the epsilon ceiling for spatial data?

Derive it from the strictest applicable row in the compliance framework mapping, then divide across expected query volume with your accountant. A common start is a per-subject ε of 0.5 to 1.0 per release period, tightened until the empirical re-identification check passes against your simulated linkage attack.
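
Dividing a ceiling across an expected query count depends on the composition rule. A minimal sketch, swapping in the classical advanced composition theorem in place of the RDP accountant assumed in the prerequisites: k ε-DP releases are jointly (ε′, δ′)-DP with ε′ = sqrt(2k · ln(1/δ′)) · ε + k · ε · (e^ε − 1), and δ′ must itself fit inside the δ budget. `per_query_epsilon` is a hypothetical helper that returns the better of this bound and plain ceiling / k.

```python
import math


def advanced_composition_epsilon(eps: float, k: int, delta_prime: float) -> float:
    """Total epsilon of k adaptive eps-DP releases under advanced composition."""
    return math.sqrt(2.0 * k * math.log(1.0 / delta_prime)) * eps + k * eps * math.expm1(eps)


def per_query_epsilon(ceiling: float, k: int, delta_prime: float = 1e-6) -> float:
    """Largest per-query epsilon keeping k releases within `ceiling`.

    Returns the better of basic composition (ceiling / k) and advanced
    composition; the latter is increasing in eps, so bisection finds it.
    """
    if ceiling <= 0 or k < 1:
        raise ValueError("ceiling must be positive and k at least 1")
    lo, hi = 0.0, ceiling
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if advanced_composition_epsilon(mid, k, delta_prime) <= ceiling:
            lo = mid
        else:
            hi = mid
    return max(ceiling / k, lo)
```

With a ceiling of 1.0 and δ′ = 1e-6, ten queries are best served by basic composition (0.1 each), while for 100 queries advanced composition allows about 0.018 each versus 0.01.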

Related Guides

Comparing central versus local differential privacy for GIS: the device-versus-curator trade-off in depth.
Spatial sensitivity scoring models: the risk scoring that feeds mechanism selection.
Threat mapping for GIS data: the attack surfaces a mechanism must mitigate.
Compliance framework mapping: clause-to-parameter bindings for the validation stage.
Secret sharing for coordinates: the secure path for cross-silo joins that cannot perturb raw data.
