Python Implementation of Spatial Threat Modeling

This page turns the adversarial surface catalogued in threat mapping for GIS data into a single, runnable Python component that a privacy engineer can drop in front of a release boundary. It assumes the foundations established in Core Fundamentals & Architecture for Spatial Privacy — that geospatial telemetry is a re-identification vector, that sensitivity must be quantified before noise is chosen, and that every release is gated, not trusted. The component does four things in order: calibrate a per-zone privacy budget against density, clamp coordinate precision, securely aggregate spatial gradients across nodes, and prove that the perturbed output still preserves cluster structure before it leaves the process. Where the broader privacy model comparison decides which mechanism to apply, this implementation is the concrete enforcement layer once differential privacy plus secure aggregation has been selected.

Parameter Configuration & Calibration

Every tunable below maps to a concrete failure if it is left at a guessed default. Calibrate them against the same spatial sensitivity score that drives routing, not against intuition.

Base epsilon and density scaling. A static epsilon across heterogeneous zones under-protects sparse rural cells, where re-identification risk is highest, while over-perturbing dense urban cores and destroying utility there. Scale ε by log1p(density), normalised against the densest cell, so crowded cells (which already supply natural k-anonymity) keep a larger budget and sparse cells receive the smallest. Bound the result to [0.1, 5.0] so no zone is ever effectively un-noised.
Coordinate precision floor. Round to 6 decimal places of a degree (~0.11 m at the equator) at ingest; anything finer is a precision leak that survives downstream aggregation. The fallback grid of 0.01° (~1.1 km) is the coarsening target when sub-meter precision is detected in non-critical telemetry.
L2 clip norm. Gradient inversion can reconstruct approximate coordinates from un-clipped model updates. A clip norm of 1.0 bounds each contributor’s influence and caps the Gaussian-mechanism L2 sensitivity of the averaged update at clip_norm / n for n disjoint nodes, treating n as fixed.
k-anonymity floor. k ≥ 50 for healthcare telemetry and k ≥ 20 for financial telemetry are the suppression thresholds enforced after noising — these bind HIPAA Safe Harbor and sector data-minimisation duties to a measurable cell cardinality, as worked out in the compliance framework mapping.
Autocorrelation tolerance. A Moran’s I deviation budget of 0.15 between original and perturbed data ensures noise injection does not artificially fragment a clinically or financially significant cluster.
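
The k-anonymity floor is the one tunable in this list that is enforced by suppression rather than by noise. A minimal sketch of that post-noising step follows, assuming cells are squares on the 0.01° fallback grid (an assumption; the cell geometry is not fixed here) and using a hypothetical helper name, suppress_small_cells:

```python
import numpy as np


def suppress_small_cells(coords, k, cell_deg=0.01):
    """Return a boolean keep-mask: True where a point's grid cell holds >= k points.

    Hypothetical helper. `cell_deg` reuses the 0.01 degree fallback grid as
    the cell size, which is an illustrative assumption.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 2:
        raise ValueError("coords must be an (N, 2) lon/lat array")
    # Integer cell index per point; floor keeps negative coordinates consistent.
    cells = np.floor(coords / cell_deg).astype(np.int64)
    # Count how many points share each distinct (lon_cell, lat_cell) pair.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    return counts[inverse.ravel()] >= k


# Illustrative data: one cell with 50 points, one with only 3.
points = np.array([[10.001, 20.001]] * 50 + [[10.501, 20.501]] * 3)
keep = suppress_small_cells(points, k=50)  # healthcare floor from the text
released = points[keep]                    # the 3-point cell is suppressed
```

Points are dropped rather than merged into a neighbouring cell; whether to merge or suppress is a policy choice this sketch does not settle.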

Reference Implementation

The class below consolidates calibration, precision enforcement, secure aggregation, and the autocorrelation gate into one auditable unit. Every method carries type annotations and inline notes on its privacy implication, and the block ends with a runnable validation harness so the contract can be wired into CI as a regression gate.

python

from __future__ import annotations

from typing import Dict, Iterable

import numpy as np


class SpatialThreatModel:
    """Release-boundary guard for privacy-preserving GIS pipelines.

    Combines density-aware epsilon calibration, coordinate-precision
    clamping, L2-clipped secure gradient aggregation, and a spatial
    autocorrelation stability gate. Every output that crosses a
    compliance boundary is expected to pass `is_release_safe`.
    """

    def __init__(
        self,
        base_epsilon: float = 1.0,
        min_epsilon: float = 0.1,
        max_epsilon: float = 5.0,
        max_precision: int = 6,
        clip_norm: float = 1.0,
        autocorr_tolerance: float = 0.15,
    ) -> None:
        self.base_epsilon = base_epsilon
        self.min_epsilon = min_epsilon
        self.max_epsilon = max_epsilon
        self.max_precision = max_precision  # ~0.11 m at 6 dp
        self.clip_norm = clip_norm
        self.autocorr_tolerance = autocorr_tolerance

    def dynamic_epsilon(self, population_density: np.ndarray) -> np.ndarray:
        """Scale epsilon by density.

        Dense cells get a larger epsilon (less noise — crowds already
        give k-anonymity); sparse cells get a smaller epsilon (more
        noise — re-identification risk is highest there).
        """
        density = np.asarray(population_density, dtype=float)
        # log1p(0) == 0, so 1 is added to the max density below to keep the
        # denominator strictly positive even when every cell is empty.
        scaling = np.log1p(density) / np.log1p(float(np.max(density)) + 1.0)
        eps = self.base_epsilon * scaling
        return np.clip(eps, self.min_epsilon, self.max_epsilon)

    def clamp_precision(self, coords: np.ndarray) -> np.ndarray:
        """Round lon/lat to the configured precision floor.

        Sub-meter precision in non-critical telemetry is a leak that
        survives aggregation, so it is rounded away before any threat
        logic runs.
        """
        coords = np.asarray(coords, dtype=float)
        if coords.ndim != 2 or coords.shape[1] != 2:
            raise ValueError("coords must be an (N, 2) lon/lat array")
        return np.round(coords, self.max_precision)

    def secure_gradient_aggregate(
        self,
        local_gradients: Dict[str, np.ndarray],
        epsilon: float,
        delta: float = 1e-5,
    ) -> np.ndarray:
        """L2-clip, average, then add Gaussian noise for (epsilon, delta)-DP.

        Clipping bounds any single node's pull on the mean — the core
        defence against gradient inversion and aggregation poisoning in
        a federated round.
        """
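        clipped = []
        for grad in local_gradients.values():
            # Coerce first so list inputs survive the scaling below.
            grad = np.asarray(grad, dtype=float)
            norm = float(np.linalg.norm(grad))
            if norm > self.clip_norm:
                grad = grad * (self.clip_norm / norm)
            clipped.append(grad)

        if not clipped:
            raise ValueError("local_gradients must contain at least one update")
        n = len(clipped)
        aggregated = np.sum(clipped, axis=0) / n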

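        if epsilon <= 0.0 or not 0.0 < delta < 1.0:
            raise ValueError("epsilon must be > 0 and delta must lie in (0, 1)")

        # L2 sensitivity of the clipped mean when one contributor is added or
        # removed and n is treated as fixed; replace-one adjacency doubles it.
        sensitivity = self.clip_norm / n
        # Classical Gaussian-mechanism calibration. The bound is only proven
        # for 0 < epsilon < 1, so the guarantee is not established above that.
        sigma = (sensitivity * np.sqrt(2.0 * np.log(1.25 / delta))) / epsilon
        return aggregated + np.random.normal(0.0, sigma, size=aggregated.shape)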

    @staticmethod
    def _morans_i_proxy(coords: np.ndarray, values: np.ndarray) -> float:
        """Cheap spatial autocorrelation proxy in roughly [-1, 1].

        For production gating substitute `esda.Moran(values, weights)`;
        this closed form compares each value to its 3 nearest
        neighbours and avoids a heavy dependency in the hot path.
        """
        from scipy.spatial import cKDTree

        values = np.asarray(values, dtype=float)
        if len(values) < 4:
            return 0.0
        tree = cKDTree(coords)
        _, idx = tree.query(coords, k=4)  # self + 3 neighbours
        neighbour_means = values[idx[:, 1:]].mean(axis=1)
        centered = values - values.mean()
        neighbour_centered = neighbour_means - values.mean()
        denom = float((centered ** 2).sum())
        if denom == 0.0:
            return 0.0
        return float((centered * neighbour_centered).sum() / denom)

    def is_release_safe(
        self,
        original_coords: np.ndarray,
        perturbed_coords: np.ndarray,
        values: np.ndarray,
    ) -> bool:
        """Gate a release on cluster-structure preservation.

        Returns True only if perturbation kept Moran's I within the
        tolerance band — i.e. the noise did not shred clinically or
        financially meaningful clusters.
        """
        before = self._morans_i_proxy(self.clamp_precision(original_coords), values)
        after = self._morans_i_proxy(self.clamp_precision(perturbed_coords), values)
        return abs(before - after) <= self.autocorr_tolerance


def _run_validation() -> None:
    rng = np.random.default_rng(7)
    model = SpatialThreatModel()

    # 1. Sparse zones must never receive more budget than dense zones.
    density = np.array([1.0, 10.0, 1000.0])
    eps = model.dynamic_epsilon(density)
    assert eps[0] <= eps[1] <= eps[2]
    assert np.all(eps >= model.min_epsilon) and np.all(eps <= model.max_epsilon)

    # 2. Precision is clamped to the configured floor.
    fine = np.array([[12.3456789, -1.2345678]])
    clamped = model.clamp_precision(fine)
    assert np.allclose(clamped, np.round(fine, 6))

    # 3. Tighter epsilon must inject strictly more noise.
    grads = {f"n{i}": rng.normal(size=8) for i in range(5)}
    spread = lambda e: float(np.std(model.secure_gradient_aggregate(grads, epsilon=e)))
    assert spread(0.5) > spread(5.0)

    # 4. A faithful (near-identity) perturbation passes the gate;
    #    destroying structure fails it.
    coords = rng.uniform(0, 1, size=(60, 2))
    values = coords[:, 0] + coords[:, 1]  # smooth, highly autocorrelated
    gentle = coords + rng.normal(0, 1e-4, size=coords.shape)
    assert model.is_release_safe(coords, gentle, values) is True
    assert model.is_release_safe(coords, rng.uniform(0, 1, size=coords.shape), values) is False

    print("SpatialThreatModel: all assertions passed")


if __name__ == "__main__":
    _run_validation()

Validation Checkpoint

Treat the harness above as a release gate, not a one-off smoke test. Each assertion encodes a privacy invariant that a refactor can silently break:

Budget monotonicity. dynamic_epsilon must keep sparse cells at or below the budget of denser cells; if this flips, the zones that need the most noise become the least protected.
Precision flooring. Coordinates round to the 6-decimal floor; a regression that keeps more decimal places re-introduces a finer-grained location fingerprint.
Noise/epsilon coupling. Lowering epsilon (the harness compares 0.5 against 5.0) must measurably increase output variance; if it does not, the Gaussian mechanism has been miswired and the stated (ε, δ) guarantee is fictional.
Structure-preservation gate. A faithful perturbation passes and a structure-destroying one fails, so is_release_safe cannot rubber-stamp a release that has fragmented its clusters.

Run this in CI on every change to the noise, clipping, or calibration paths. For statistically rigorous gating in staging, swap _morans_i_proxy for esda.Moran over an explicit libpysal weights matrix and confirm perturbed Moran’s I stays within the confidence interval of the baseline.
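
For reference when comparing the two, global Moran's I is conventionally written as below. The symbols $w_{ij}$ (spatial weight between observations $i$ and $j$), $S_0$, and $N_3(i)$ (the 3 nearest neighbours of $i$) are standard notation introduced here, not names from the code. With row-standardised 3-nearest-neighbour weights the general form reduces to the quantity _morans_i_proxy computes, which is why the proxy should track a k-nearest-neighbour Moran's I closely but need not match a statistic built on a different weights matrix.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}
         {\sum_{i=1}^{n}(x_i-\bar{x})^{2}},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}.

% Row-standardised 3-NN weights: w_{ij} = 1/3 if j \in N_3(i), else 0,
% so S_0 = n and, with m_i = \tfrac{1}{3}\sum_{j \in N_3(i)} x_j,
I_{\text{proxy}} =
    \frac{\sum_{i=1}^{n} (x_i-\bar{x})\,(m_i-\bar{x})}
         {\sum_{i=1}^{n}(x_i-\bar{x})^{2}}.
```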

Incident Response & Edge Cases

Sub-meter precision detected at ingest. A device emits coordinates finer than the 0.000001° floor and they slip past validation into a threat score. Remediation: reject the record (logging the rejection without the raw coordinate), fall back to the 0.01° grid, and re-baseline the affected zone before re-admitting its feed.
Privacy budget exhausted mid-query. Cumulative spend crosses 80% of the quarterly allocation during an iterative spatial join. Do not catch-and-continue: halt non-essential joins, route reads to pre-computed anonymized aggregates, and resume only after the accounting window resets — the same posture used when a secure-computation handshake stalls. This failure mode is treated in depth across the secure multi-party computation workflows.
Cross-node correlation spike. Federated gradient similarity exceeds 0.95 across heterogeneous zones, signalling possible poisoning or a colluding node. Quarantine the suspect update, fall back to a trusted execution environment or a synthetic spatial proxy for that round, and trigger manual review. The aggregation-side defences for this live in the federated learning workflows — specifically handling non-IID geospatial data.
k-anonymity violation after noising. Boundary artifacts leave a noised cell below the k ≥ 50 (healthcare) or k ≥ 20 (financial) floor. Suppress the offending cell, recompute the spatial k-anonymity threshold, and escalate to the data governance board if suppression alone cannot restore the floor. Log threat scores, epsilon consumption, and suppression triggers to an immutable, salted-hash audit trail — never the raw coordinates.
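
Two of the triggers above reduce to simple predicates. The sketch below is illustrative only: the 80% and 0.95 thresholds come from the text, while the function names (budget_exhausted, suspect_pairs) are hypothetical and cosine similarity is an assumed reading of "gradient similarity".

```python
import numpy as np

BUDGET_HALT_FRACTION = 0.80  # from the text: halt at 80% of the allocation
SIMILARITY_LIMIT = 0.95      # from the text: cross-node similarity ceiling


def budget_exhausted(spent_epsilon, allocated_epsilon):
    """True once cumulative spend reaches the halt fraction of the allocation."""
    return spent_epsilon >= BUDGET_HALT_FRACTION * allocated_epsilon


def suspect_pairs(updates):
    """Return node-id pairs whose updates exceed the similarity limit.

    `updates` maps node id -> gradient array (all the same size). Cosine
    similarity is an assumption; the metric is not specified in the text.
    """
    ids = sorted(updates)
    if len(ids) < 2:
        return []
    mat = np.stack([np.asarray(updates[i], dtype=float).ravel() for i in ids])
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    unit = mat / np.where(norms == 0.0, 1.0, norms)  # zero vectors stay zero
    sim = unit @ unit.T
    rows, cols = np.triu_indices(len(ids), k=1)      # each unordered pair once
    return [
        (ids[r], ids[c])
        for r, c in zip(rows, cols)
        if sim[r, c] > SIMILARITY_LIMIT
    ]


# Illustrative round: nodes "a" and "b" submit near-parallel updates.
round_updates = {
    "a": np.array([1.0, 0.0, 0.0]),
    "b": np.array([2.0, 0.01, 0.0]),
    "c": np.array([0.0, 1.0, 0.0]),
}
flagged = suspect_pairs(round_updates)
```

A flagged pair is a prompt for quarantine and manual review, not proof of collusion; honest nodes covering similar zones can also produce near-parallel updates.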

These triggers feed directly into compliance framework mapping: HIPAA Safe Harbor for granular location data is typically met by retaining only 3-digit ZIP prefixes (and only where a prefix covers more than 20,000 people) plus central DP at ε ≤ 1.0, while GDPR Article 5(1)(c) minimisation is enforced through the precision floor and density-scaled budget above. The sector-specific translation is worked end to end in mapping HIPAA requirements to geospatial datasets.
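
As a small illustration of the ZIP-prefix rule, Safe Harbor replaces a 3-digit prefix with 000 when the area it covers holds 20,000 or fewer people. The sketch below assumes a caller-supplied lookup, zip3_population, which is hypothetical and would have to be built from current Census figures; the function name and the populations shown are illustrative, not real data.

```python
SAFE_HARBOR_MIN_POPULATION = 20_000


def safe_harbor_zip(zip_code, zip3_population):
    """Reduce a ZIP code to its 3-digit Safe Harbor form.

    `zip_code` must be a string so leading zeros survive.
    `zip3_population` is a hypothetical mapping: 3-digit prefix -> residents.
    Unknown prefixes are treated as population 0 and therefore masked.
    """
    prefix = str(zip_code).strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        raise ValueError("zip_code must start with three digits")
    # Prefixes covering 20,000 or fewer people are replaced with 000.
    if zip3_population.get(prefix, 0) <= SAFE_HARBOR_MIN_POPULATION:
        return "000"
    return prefix


# Illustrative (made-up) populations, not Census figures.
example_population = {"941": 800_000, "036": 15_000}
dense = safe_harbor_zip("94110", example_population)
sparse = safe_harbor_zip("03601", example_population)
```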