Threat Mapping for GIS Data: A Privacy-First Engineering Guide

Consider a concrete engineering scenario. A regional health system runs a spatial analytics pipeline that ingests patient mobility traces, clinic geofences, and pharmacy fulfilment events, then publishes neighbourhood-level utilisation dashboards to public-health partners. Every coordinate that flows through that pipeline is a potential re-identification surface: a single rural residence polygon joined against a voter roll can unmask an individual even after the obvious identifiers are stripped. Threat mapping for geospatial information systems is the discipline of enumerating those surfaces before data moves, binding each one to a concrete control, and proving — with an audit trail — that the control held. This guide walks privacy engineers, GIS data scientists, and regulated healthcare and finance teams through that workflow as a repeatable, instrumented pipeline rather than a one-off review.

The map you build here is the input to the rest of the architecture. The attack surface enumerated below sets the weights used by the spatial sensitivity scoring models; the resulting risk tier selects a protection mechanism through the privacy model comparison; and each control is bound to a regulatory parameter via the compliance framework mapping. Threat mapping, scoring, and routing form a single ingestion-to-control loop.

Prerequisites

Before building the threat map, provision a reproducible environment and agree on the accounting conventions every stage will reference:

Python libraries. geopandas and shapely for geometry handling, numpy for noise mechanisms, h3 (or pygeohash) for discrete global grid indexing, and pandas for the asset ledger. Pin versions in a lockfile so sensitivity scores are reproducible across runs.
Cryptographic dependencies. A secure-aggregation or homomorphic-encryption backend for high-risk routing — for example a Paillier or CKKS library, or an MPC runtime. The implementation here treats these as a routing target; the protocol details live in secure multi-party computation in spatial analytics and the secret sharing for coordinates guide.
Privacy budget accounting. Decide the composition method up front. This guide assumes a per-asset $(\varepsilon, \delta)$ ledger under basic sequential composition, where the total spend $\varepsilon_{\text{total}} = \sum_i \varepsilon_i$ must stay below a published cap. If you run many adaptive queries, switch the ledger to Rényi differential privacy accounting before scaling out — the threat map does not change, only the bookkeeping does.
A canonical CRS. Fix one projected coordinate reference system (for example a national grid in metres) for all distance- and density-based scoring, so sensitivity is computed in real-world units rather than degrees.
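
The basic sequential composition convention above can be sketched as a small ledger. This is a minimal illustration under stated assumptions, not the production accounting: the class name `EpsilonLedger`, its `charge` and `total` methods, and the cap value of 3.0 are hypothetical names and values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EpsilonLedger:
    """Hypothetical per-asset ledger under basic sequential composition."""

    cap: float = 3.0  # assumed published cap on total epsilon
    spends: dict[str, list[float]] = field(default_factory=dict)

    def total(self, asset_id: str) -> float:
        # Basic sequential composition: total spend is the plain sum.
        return sum(self.spends.get(asset_id, []))

    def charge(self, asset_id: str, epsilon: float) -> bool:
        """Record the spend and return True, or reject it and return False."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.total(asset_id) + epsilon > self.cap:
            return False  # fail closed: the query is rejected, nothing is recorded
        self.spends.setdefault(asset_id, []).append(epsilon)
        return True


ledger = EpsilonLedger(cap=3.0)
# Four queries at epsilon = 1.0 each: the fourth would exceed the cap.
accepted = [ledger.charge("clinic_geo_01", 1.0) for _ in range(4)]
```

Rejected charges leave the ledger unchanged, so a denied query never consumes budget. Moving to Rényi accounting would replace the `sum` in `total` with a different composition rule while the `charge` interface could stay the same.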

Step-by-Step Procedure

Step 1: Asset inventory and spatial sensitivity classification

Catalogue every geospatial asset entering the analytical environment — vector layers, raster tiles, trajectory datasets, and derived spatial indices — and assign each a baseline risk tier. Document the coordinate reference system, temporal resolution, attribute cardinality, and provenance lineage, because downstream controls reference exact spatial boundaries and temporal windows. High-sensitivity geometries such as patient residence polygons or financial branch geofences are flagged for cryptographic isolation before they touch any shared computation. The composite score that drives this tiering is produced by the spatial sensitivity scoring models; threat mapping supplies the weights by declaring which attack surfaces are live for each asset.

python

from __future__ import annotations
import math
from dataclasses import dataclass
from shapely.geometry import Point, Polygon

@dataclass
class SpatialAsset:
    asset_id: str
    geometry: Point | Polygon
    crs: str                      # canonical projected CRS, e.g. "EPSG:3035"
    temporal_resolution: float    # seconds between successive samples
    attribute_cardinality: int    # count of quasi-identifying attributes
    context_exposure: float       # 0.0 - 1.0, adjacency to sensitive POIs

def baseline_tier(asset: SpatialAsset) -> str:
    """Coarse triage before full scoring; finer geometries score higher."""
    fine_temporal = asset.temporal_resolution < 60.0
    point_geometry = isinstance(asset.geometry, Point)
    if asset.context_exposure > 0.7 and (fine_temporal or point_geometry):
        return "high"
    if asset.context_exposure > 0.4 or fine_temporal:
        return "medium"
    return "low"

assert baseline_tier(
    SpatialAsset("clinic_geo_01", Point(0, 0), "EPSG:3035", 15.0, 120, 0.85)
) == "high"

Step 2: Threat vector enumeration per asset

Map specific attack surfaces to each classified asset rather than applying a uniform threat list. Linkage attacks exploit auxiliary geographic datasets (points of interest, parcel records, voter rolls); trajectory reconstruction stitches timestamp sequences back into a movement graph; proximity inference reads a sensitive facility from a generalised coordinate. In federated deployments the surface widens to gradient inversion on model updates — covered in depth under gradient aggregation techniques — and map-matching that snaps perturbed points back onto a road network. Enumerate these per asset so the protection mechanism is sized to the real threat, not a worst-case default.

python

def enumerate_threat_vectors(asset: SpatialAsset) -> list[str]:
    """Return the live attack surfaces implied by an asset's properties."""
    vectors: list[str] = []
    if asset.temporal_resolution < 60.0:
        vectors.append("trajectory_reconstruction")
    if asset.attribute_cardinality > 50:
        vectors.append("linkage_via_auxiliary")
    if isinstance(asset.geometry, Point):
        vectors.append("proximity_inference")
    if asset.context_exposure > 0.6:
        vectors.append("map_matching")
    return vectors

Step 3: Spatially-calibrated differential privacy and secure routing

Calibrate noise to the spatial operation, not a global constant. A differential privacy mechanism that respects spatial autocorrelation injects less noise where crowd density already supplies k-anonymity and more where sparsity raises re-identification risk. For a Laplace mechanism the scale follows $b = \Delta f / \varepsilon$, where $\Delta f$ is the sensitivity of the query (the code below fixes $\Delta f = 1$, as for a counting query), so a smaller per-cell $\varepsilon$ in sparse regions widens the noise distribution exactly where it is needed. Assets that score above the high-risk threshold bypass perturbation entirely and route into secure aggregation, where raw coordinates never leave an enclave or secret-shared form.

python

def calibrate_dp_noise(
    sensitivity: float,
    base_epsilon: float = 1.0,
    floor: float = 0.05,
) -> tuple[float, float]:
    """Map a sensitivity score in [0, 1] to (epsilon, Laplace scale b).

    The scale is b = delta_f / epsilon with delta_f fixed at 1 (a counting
    query). Higher sensitivity spends less epsilon (more noise). The floor
    preserves a minimum analytical utility so we never publish pure noise.
    """
    epsilon = max(base_epsilon * (1.0 - sensitivity), floor)
    noise_scale = 1.0 / epsilon
    return round(epsilon, 4), round(noise_scale, 4)

def select_route(sensitivity: float, threshold: float = 0.6) -> str:
    return "secure_aggregation" if sensitivity > threshold else "federated_dp"

eps_low, b_low = calibrate_dp_noise(0.2)
eps_high, b_high = calibrate_dp_noise(0.9)
assert b_high > b_low            # sparse/high-risk cell gets wider noise
assert select_route(0.9) == "secure_aggregation"
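
To make the scale formula concrete: with $\Delta f = 1$, a dense cell at $\varepsilon = 0.8$ gets $b = 1.25$, while a sparse cell at $\varepsilon = 0.1$ gets $b = 10$. The sketch below draws Laplace noise with only the standard library, using the fact that the difference of two independent exponential draws is Laplace-distributed. The function names `laplace_scale` and `laplace_noise`, the seed, and the sample size are illustrative assumptions, and the `random` module is not a cryptographically secure source, so treat this as a demonstration of the calibration rather than a release-grade mechanism.

```python
import random


def laplace_scale(delta_f: float, epsilon: float) -> float:
    """Laplace scale b = delta_f / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return delta_f / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent Exponential(rate = 1 / scale) draws
    follows a Laplace distribution with the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


rng = random.Random(42)             # fixed seed, for a repeatable demo only
b_dense = laplace_scale(1.0, 0.8)   # about 1.25
b_sparse = laplace_scale(1.0, 0.1)  # about 10.0

# For Laplace noise the mean absolute value converges to the scale b,
# so the sparse cell's noise is visibly wider than the dense cell's.
n = 20_000
mad_dense = sum(abs(laplace_noise(b_dense, rng)) for _ in range(n)) / n
mad_sparse = sum(abs(laplace_noise(b_sparse, rng)) for _ in range(n)) / n
```

Adding `laplace_noise(b, rng)` to a per-cell count is the perturbation step; the empirical check at the end is only there to show that the wider scale really does produce wider noise.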

Step 4: Assemble and validate the threat assessment

Compose the steps into a single assessment object that records the sensitivity score, the spent privacy budget, the live threat vectors, and the routing decision for every asset. The assertions at the end are the runnable validation harness: they reject any assessment that drops below the utility floor or emits an undefined route. For extended cryptographic routing logic, homomorphic aggregation templates, and production deployment patterns, hand the assessment object to the Python implementation of spatial threat modeling.

python

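# Continues the module from Steps 1 to 3: SpatialAsset, Point, math,
# enumerate_threat_vectors, calibrate_dp_noise, and select_route are defined
# above. A second `from __future__` import at this point would be a SyntaxError.
import logging
from dataclasses import dataclass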

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

@dataclass
class ThreatAssessment:
    asset_id: str
    sensitivity_score: float
    epsilon_budget: float
    noise_scale: float
    threat_vectors: list[str]
    routing_status: str

def compute_sensitivity(asset: SpatialAsset, cap: float = 1.0) -> float:
    """Composite score from temporal granularity, QI cardinality, and context."""
    granularity = min(1.0, 3600.0 / max(asset.temporal_resolution, 1.0))
    cardinality = min(1.0, math.log2(asset.attribute_cardinality + 1) / 8.0)
    score = 0.4 * asset.context_exposure + 0.3 * granularity + 0.3 * cardinality
    return min(score, cap)

def assess(asset: SpatialAsset) -> ThreatAssessment:
    sensitivity = compute_sensitivity(asset)
    epsilon, noise_scale = calibrate_dp_noise(sensitivity)
    return ThreatAssessment(
        asset_id=asset.asset_id,
        sensitivity_score=round(sensitivity, 4),
        epsilon_budget=epsilon,
        noise_scale=noise_scale,
        threat_vectors=enumerate_threat_vectors(asset),
        routing_status=select_route(sensitivity),
    )

if __name__ == "__main__":
    test_asset = SpatialAsset(
        asset_id="clinic_geo_01",
        geometry=Point(4_546_000, 3_205_000),  # projected metres (EPSG:3035)
        crs="EPSG:3035",
        temporal_resolution=15.0,
        attribute_cardinality=120,
        context_exposure=0.85,
    )
    result = assess(test_asset)
    logging.info("Assessment: %s", result)

    assert result.epsilon_budget >= 0.05, "DP budget below minimum utility floor"
    assert result.routing_status in {"secure_aggregation", "federated_dp"}
    assert "proximity_inference" in result.threat_vectors
    logging.info("Validation passed; asset routed to %s.", result.routing_status)

Threat Model Considerations

The adversary capabilities this map must cover are specific to geospatial pipelines and differ from tabular threat models:

Auxiliary linkage. The attacker holds a side dataset (parcels, business listings, voter rolls) and joins it spatially to re-identify generalised records. Mitigated by per-cell k-anonymity gates and grid snapping coarse enough to defeat the sharpest auxiliary layer you can foresee.
Trajectory reconstruction. Frequent, fine-grained timestamps let an adversary stitch sparse points into a continuous path whose home/work anchors are uniquely identifying. Mitigated by temporal binning and rolling suppression before any spatial release.
Map-matching on perturbed points. Noise that ignores the road or parcel topology is reversible: the attacker snaps each noisy point to the nearest plausible feature. Mitigated by calibrating noise to local feature density and validating that snapped outputs do not recover the original cell.
Gradient inversion. In federated training, raw model updates can be inverted to reconstruct the contributing coordinates. Mitigated by routing high-risk contributions through secure aggregation rather than clear-text gradients — see the federated learning workflows for geospatial data.
Metadata and side-channel leakage. Query response times, result-set sizes, and CRS metadata can leak neighbourhood-level structure even when records are masked. Mitigated by constant-time aggregation paths and by stripping provenance metadata from published artefacts.
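
Two of the mitigations above, grid snapping and temporal binning, can be sketched in a few lines. The cell size of 1,000 m, the one-hour window, and the function names are assumptions for illustration; coordinates are taken to be in a projected CRS in metres. Coarsening alone does not guarantee k-anonymity, so it would still sit in front of a per-cell gate.

```python
import math


def snap_to_grid(x: float, y: float, cell_size_m: float) -> tuple[float, float]:
    """Return the lower-left corner of the grid cell containing (x, y)."""
    return (
        math.floor(x / cell_size_m) * cell_size_m,
        math.floor(y / cell_size_m) * cell_size_m,
    )


def bin_timestamp(epoch_seconds: int, window_s: int) -> int:
    """Truncate a timestamp to the start of its fixed-width window."""
    return (epoch_seconds // window_s) * window_s


def generalise(trace, cell_size_m: float = 1000.0, window_s: int = 3600):
    """Coarsen a trajectory and keep one record per (cell, window) pair."""
    seen = set()
    coarse = []
    for x, y, t in trace:
        key = (*snap_to_grid(x, y, cell_size_m), bin_timestamp(t, window_s))
        if key not in seen:  # repeated samples in the same cell and window collapse
            seen.add(key)
            coarse.append(key)
    return coarse


# Three samples 15 seconds apart, in projected metres (hypothetical values).
trace = [
    (4_546_120.0, 3_205_480.0, 1_700_000_000),
    (4_546_410.0, 3_205_730.0, 1_700_000_015),
    (4_547_050.0, 3_205_900.0, 1_700_000_030),
]
coarse = generalise(trace)
```

The first two samples fall in the same cell and hour, so they collapse into one record, which removes the fine-grained sequence a trajectory-reconstruction attack relies on.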

Validation and Compliance Checklist

Each control has a measurable pass/fail criterion; treat any failure as a release blocker.

CRS and topology validation. Every input is reprojected to the canonical projected CRS before scoring or noise injection. Pass: zero records remain in a geographic (degree-based) CRS at the noise stage.
Epsilon ledger tracking. An immutable per-asset ledger records every $\varepsilon$ spend. Pass: cumulative $\varepsilon_{\text{total}}$ for any subject stays below the published cap (for example $\varepsilon_{\text{total}} \le 3.0$ per 24-hour window); queries that would exceed it are rejected.
k-anonymity gate. Every published spatial cell satisfies $k \ge k_{\min}$ (commonly $k_{\min} = 5$ for location data). Pass: no released cell aggregates fewer than $k_{\min}$ distinct subjects.
Regulatory parameter binding. Each control is tied to a concrete parameter through the HIPAA requirements mapping for geospatial datasets. For example, Safe Harbor permits the first three digits of a ZIP code only when the area formed by all ZIP codes sharing those digits holds more than 20,000 people, which becomes a minimum grid-cell population. Pass: every published unit meets the bound.
Adversarial simulation. Linkage and trajectory-reconstruction attacks are run against sanitised outputs each release, with auxiliary layers loaded through current open standards such as the Open Geospatial Consortium (OGC) Standards. Pass: measured re-identification probability stays below the agreed threshold.
Routing attestation. High-risk assets are logged as entering secure aggregation, never clear-text DP. Pass: no asset with sensitivity > threshold appears in the federated_dp path.
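
The k-anonymity gate in the checklist reduces to counting distinct subjects per cell. The following is a minimal sketch, assuming records arrive as `(cell_id, subject_id)` pairs; the function name `k_anonymity_gate` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def k_anonymity_gate(records, k_min: int = 5):
    """Split cells into releasable counts and suppressed cell ids.

    `records` is an iterable of (cell_id, subject_id) pairs. A cell passes
    only if it contains at least k_min distinct subjects.
    """
    subjects_by_cell = defaultdict(set)
    for cell_id, subject_id in records:
        subjects_by_cell[cell_id].add(subject_id)

    released = {}
    suppressed = []
    for cell_id, subjects in subjects_by_cell.items():
        if len(subjects) >= k_min:
            released[cell_id] = len(subjects)
        else:
            suppressed.append(cell_id)
    return released, sorted(suppressed)


records = [("cell_a", f"s{i}") for i in range(6)]  # six distinct subjects
records += [("cell_b", "s1"), ("cell_b", "s2")]    # only two subjects
records += [("cell_a", "s0")]                      # repeat visit, same subject
released, suppressed = k_anonymity_gate(records, k_min=5)
```

Counting distinct subjects rather than rows matters: the repeat visit by `s0` does not inflate the count for `cell_a`, and `cell_b` is suppressed. A release check could then treat any cell missing from `released` as a blocker.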

Failure Modes and Remediation

Production threat-mapping pipelines fail in predictable ways; design the recovery path before launch.

Privacy budget exhaustion. A burst of adaptive queries drains the per-subject $\varepsilon$ ledger mid-session. Remediation: a circuit breaker halts further release, degrades to pre-computed grid-level aggregates, and resets only on the next budget window — never by silently raising the cap. Detection and recovery patterns are covered in the compliance framework mapping.
CRS mismatch. An upstream feed arrives in WGS84 degrees while scoring assumes projected metres, silently inflating or collapsing distances and corrupting every sensitivity score. Remediation: fail closed on a CRS assertion at ingestion; quarantine the batch rather than scoring it.
Secure-aggregation node dropout. A high-risk asset is routed to secure aggregation but a participating node becomes unavailable past its SLA. Remediation: fall back to a stricter all-or-nothing rule — hold the asset, do not downgrade it to clear-text DP. Asynchronous handling is described under async routing for MPC.
Over-perturbation. Global noise applied to sparse cells destroys topological integrity, and the dashboard becomes useless, tempting an operator to disable noise. Remediation: density-aware calibration plus a utility floor ($\varepsilon \ge 0.05$) keeps sparse cells usable without removing protection.
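
The fail-closed CRS check described above can be sketched as an ingestion guard. This is an illustration under assumptions: the canonical CRS string, the exception and function names, and the batch tuple layout are hypothetical, and the degree-range test is only a heuristic backstop for mislabelled feeds, not a substitute for validating CRS metadata with a geospatial library.

```python
CANONICAL_CRS = "EPSG:3035"  # assumed canonical projected CRS for this example


class CRSMismatchError(ValueError):
    """Raised when a batch is not in the canonical projected CRS."""


def looks_like_degrees(coords) -> bool:
    """Heuristic: every coordinate fits inside longitude/latitude bounds."""
    return all(-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0 for x, y in coords)


def assert_canonical_crs(declared_crs: str, coords) -> None:
    """Fail closed on a wrong CRS label or degree-like coordinate values."""
    if declared_crs != CANONICAL_CRS:
        raise CRSMismatchError(f"expected {CANONICAL_CRS}, got {declared_crs}")
    if coords and looks_like_degrees(coords):
        raise CRSMismatchError("coordinates look like degrees despite CRS label")


def ingest(batches):
    """Route each batch to accepted or quarantined; never score a bad batch."""
    accepted, quarantined = [], []
    for batch_id, declared_crs, coords in batches:
        try:
            assert_canonical_crs(declared_crs, coords)
        except CRSMismatchError as exc:
            quarantined.append((batch_id, str(exc)))
        else:
            accepted.append(batch_id)
    return accepted, quarantined


batches = [
    ("b1", "EPSG:3035", [(4_546_000.0, 3_205_000.0)]),  # projected metres
    ("b2", "EPSG:4326", [(13.4, 52.5)]),                # declared WGS84 degrees
    ("b3", "EPSG:3035", [(13.4, 52.5)]),                # mislabelled degrees
]
accepted, quarantined = ingest(batches)
```

Only `b1` reaches scoring. The other two batches are held with a reason string, which gives the audit trail a record of why each batch was quarantined.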

Frequently Asked Questions

How is geospatial threat mapping different from tabular threat modeling?

Tabular models treat each column as an independent identifier. Spatial data is a joint distribution: a coordinate’s risk depends on resolution, temporal frequency, and contextual adjacency rather than on whether a name field is present. Threat mapping therefore enumerates surfaces — linkage, trajectory, proximity, map-matching — that have no tabular analogue, and sizes controls to local density rather than to the row schema.

How often should the threat map be regenerated?

Treat it as a living artefact, not a one-time review. Re-run enumeration whenever a new asset class, CRS, or auxiliary dataset enters scope, and at minimum on every release so the adversarial simulation in the checklist runs against the current output. The sensitivity weights it feeds into scoring shift as new attack surfaces appear.

Where does differential privacy stop and secure computation begin?

The routing threshold decides. Assets scoring below it are protected with spatially-calibrated DP noise, which preserves utility for aggregate dashboards. Assets above it carry too much re-identification risk to expose even in perturbed form, so they route into secure aggregation where raw coordinates stay secret-shared or enclave-bound. The privacy model comparison details the trade-offs that set that threshold.

Spatial sensitivity scoring models — the composite score this map supplies weights for.
Privacy model comparison — choosing DP, MPC, HE, or TEE per risk tier.
Compliance framework mapping — binding each control to a regulatory parameter.
Python implementation of spatial threat modeling — production routing and homomorphic aggregation templates.
Secure multi-party computation in spatial analytics — the protocols behind high-risk routing.
