Unsupervised Alignment of Geo-Embeddings for Rapid Disaster Mapping
Learning to align pre-computed geospatial embeddings with post-event satellite features for unsupervised disaster delineation without bi-temporal inference.
Requirements
- M.Sc. in Data Science, Computer Science, Artificial Intelligence, Mathematics, or similar
- Strong knowledge of Python and deep learning frameworks (PyTorch, Lightning)
- Basic knowledge of remote sensing and satellite imagery
- Familiarity with the basic concepts of self-supervised learning
- Understanding of foundation models and transfer learning
Description
Disaster event delineation, i.e., identifying the spatial footprint of floods, wildfires, landslides, and similar phenomena, typically relies on bi-temporal analysis that compares satellite imagery acquired before and after the event. Running large foundation models (FMs) on both timestamps is computationally expensive and often impractical in rapid-response scenarios, where timely damage assessment is critical.
Recent geospatial foundation models such as TESSERA, AlphaEarth, and Clay now provide pre-computed annual embeddings that encode the “normal” state of geographic areas at high spatial resolution. These static embeddings aggregate multi-modal satellite observations (optical, radar, elevation) into compact per-pixel representations, capturing land cover, urban morphology, vegetation patterns, and other baseline characteristics without requiring on-the-fly inference.
This thesis proposes using these pre-computed embeddings as a proxy for pre-event information, which removes the need to process a pre-event image at inference time. Only a single post-event image is passed through a lightweight encoder (e.g., an encoder from the same foundation model family or a small ViT) to extract event-specific features. The core challenge is that the static embeddings and the post-event features live in different latent spaces: they differ in dimensionality, channel semantics, and the information encoded in each dimension. Naive operations such as direct differencing are therefore not applicable. When the dimensionalities differ, the difference is undefined. Even when they match, channel i of one space has no counterpart in channel i of the other.
A further complication is that annual embeddings represent a temporal aggregate rather than a snapshot of the pre-event state. Seasonal variation (e.g., bare fields in winter vs. green cover in summer) may therefore introduce spurious change signals unrelated to the disaster. The thesis will investigate strategies to mitigate this temporal mismatch. Candidates include injecting temporal metadata (e.g., the acquisition date of the post-event image) into the alignment module, leveraging seasonal composites where available, and learning season-invariant representations.
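One way to inject temporal metadata is to turn the acquisition date into a cyclic day-of-year encoding. With this encoding, dates that are close in the seasonal cycle (e.g., 31 December and 1 January) also stay close in feature space. The following is a minimal, stdlib-only sketch. The helper names are hypothetical. Concatenating the encoding to each per-pixel feature vector before the alignment module is one possible design, not a method fixed by this proposal.

```python
from __future__ import annotations

import math
from datetime import date


def day_of_year_encoding(acquired: date) -> tuple[float, float]:
    """Map an acquisition date to a point on the unit circle (hypothetical helper).

    The (sin, cos) form is cyclic, so dates near the turn of the year
    end up close together instead of at opposite ends of a linear scale.
    """
    doy = acquired.timetuple().tm_yday  # 1..366
    # Divide by 365.25 to approximately account for leap years.
    angle = 2.0 * math.pi * (doy - 1) / 365.25
    return (math.sin(angle), math.cos(angle))


def append_temporal_metadata(features: list[float], acquired: date) -> list[float]:
    """Concatenate the seasonal encoding to one per-pixel feature vector (assumption)."""
    return list(features) + list(day_of_year_encoding(acquired))


if __name__ == "__main__":
    winter = day_of_year_encoding(date(2023, 12, 31))
    new_year = day_of_year_encoding(date(2024, 1, 1))
    summer = day_of_year_encoding(date(2024, 7, 1))
    # Late December is much closer to early January than to July.
    print(math.dist(winter, new_year) < math.dist(winter, summer))  # True
```

Conditioning on this two-dimensional code could, in principle, let the alignment module learn that low vegetation signal in a winter acquisition is consistent with a cropland embedding, rather than treating it as change.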
To bridge the gap between these heterogeneous representations, the thesis will investigate unsupervised alignment strategies. The alignment module will be trained on geographically co-located pairs from unchanged areas. Each pair consists of a static embedding and features extracted from an image of the same location acquired under normal (non-event) conditions. This yields a self-supervised training signal: no event labels are required, only geographic co-location. Candidate alignment techniques include contrastive projection heads (InfoNCE), non-contrastive objectives (Barlow Twins, VICReg), and lightweight learned projections. Once both representations are mapped into a shared space, change can be detected with simple operations such as cosine distance or learned differencing. Locations whose post-event features diverge strongly from their aligned static embedding are candidate event pixels.
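The following is a minimal NumPy sketch of the contrastive option. It implements a symmetric (CLIP-style) InfoNCE objective over a batch of co-located pairs and the cosine-distance change score used after alignment. Several assumptions apply:

- The projection heads are plain linear maps (`w_static`, `w_post`; hypothetical names).
- The temperature of 0.07 is a common default, not a value from this proposal.
- Only the forward pass is computed. Actual training would use an autograd framework such as PyTorch.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(static_emb, post_feat, w_static, w_post, temperature=0.07):
    """Symmetric InfoNCE over a batch of N co-located (static, normal-time) pairs.

    static_emb: (N, D_s) pre-computed embeddings, e.g. TESSERA vectors.
    post_feat:  (N, D_p) encoder features from normal-time images at the same pixels.
    w_static:   (D_s, k) and w_post: (D_p, k) hypothetical linear projection heads.
    Row i of both inputs is a positive pair; the other rows act as negatives.
    """
    z_s = l2_normalize(static_emb @ w_static)  # (N, k)
    z_p = l2_normalize(post_feat @ w_post)     # (N, k)
    logits = z_s @ z_p.T / temperature         # (N, N) scaled cosine similarities
    idx = np.arange(len(logits))
    loss_s2p = -log_softmax(logits, axis=1)[idx, idx].mean()  # static -> post
    loss_p2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # post -> static
    return 0.5 * (loss_s2p + loss_p2s)


def cosine_change_score(static_emb, post_feat, w_static, w_post):
    """Per-pixel change score in the aligned space: 1 - cosine similarity (range 0..2)."""
    z_s = l2_normalize(static_emb @ w_static)
    z_p = l2_normalize(post_feat @ w_post)
    return 1.0 - (z_s * z_p).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.normal(size=(8, 16))
    eye = np.eye(16)
    # Correctly paired rows give a lower loss than rows shifted by one position.
    print(info_nce(s, s, eye, eye) < info_nce(s, np.roll(s, 1, axis=0), eye, eye))  # True
```

A non-contrastive objective such as VICReg or Barlow Twins would replace `info_nce`. It would not need in-batch negatives, which matters when neighbouring pixels in a batch are near-duplicates.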
The primary investigation will focus on TESSERA embeddings and flood delineation, where benchmark data is most readily available. Extension to other embedding sources and hazard types (wildfire, landslide) will serve as secondary validation of the approach’s generalizability.
The research will address the following objectives:
- Geo-Embedding Analysis: Evaluating candidate pre-computed embedding sources (TESSERA, AlphaEarth, Clay) in terms of spatial resolution, dimensionality, temporal coverage, and information content.
- Alignment Module Design: Designing and comparing unsupervised alignment approaches (contrastive, non-contrastive, projection-based) to bridge the heterogeneous latent spaces.
- Pipeline Implementation: Building an end-to-end pipeline that takes a static geo-embedding and a single post-event image, produces aligned representations, and generates a change map.
- Evaluation: Benchmarking on flood delineation datasets, with extension to wildfire and landslide as secondary validation.
- Baseline Comparison: Comparing against bi-temporal FM baselines and traditional change detection methods in terms of accuracy, computational cost, and inference speed.
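The pipeline objective can be sketched end to end on a single co-registered tile. The sketch projects both inputs into a shared space, scores each pixel by cosine distance, and thresholds the scores into a binary change map. This is a minimal NumPy sketch under several assumptions:

- `w_static` and `w_post` are hypothetical stand-ins for a trained alignment module.
- Both inputs are already resampled to the same pixel grid.
- Otsu's method is used only as one common unsupervised way to pick a threshold. The proposal does not prescribe it.

```python
import numpy as np


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Return the histogram bin edge that maximizes between-class variance (Otsu)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                   # pixel count at or below bin i
    w1 = w0[-1] - w0                       # pixel count above bin i
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    # Split after bin i: values above the bin's upper edge are "changed".
    return float(edges[1:][np.argmax(between)])


def delineate(static_grid, post_grid, w_static, w_post):
    """Single-image change map from a static embedding tile and a post-event feature tile.

    static_grid: (H, W, D_s) pre-computed embeddings, read from storage, not recomputed.
    post_grid:   (H, W, D_p) encoder features of the single post-event image.
    Returns the (H, W) cosine-distance map and a boolean change mask.
    """
    height, width, _ = static_grid.shape
    if post_grid.shape[:2] != (height, width):
        raise ValueError("static and post-event grids must be co-registered")
    s = static_grid.reshape(height * width, -1) @ w_static
    p = post_grid.reshape(height * width, -1) @ w_post
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    score = 1.0 - (s * p).sum(axis=1)     # 0 = unchanged, up to 2 = opposite
    threshold = otsu_threshold(score)
    score_map = score.reshape(height, width)
    return score_map, score_map > threshold
```

Because the static embeddings are read rather than computed, the per-event work in this sketch is one encoder pass over the post-event image, two matrix products, and a histogram. The baseline comparison would then measure this cost against running an FM on both timestamps.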
Main Activities
- Review of the literature on geospatial foundation models, pre-computed embeddings, and unsupervised representation alignment
- Selection and analysis of pre-computed embedding sources and post-event encoders
- Design and implementation of the alignment module and training pipeline
- Construction of training pairs from co-located unchanged areas
- Evaluation on flood delineation benchmarks
- Extension to wildfire and landslide delineation as secondary validation
- Analysis and comparison of computational cost against bi-temporal baselines
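Constructing training pairs from co-located unchanged areas reduces to indexing both rasters at the same pixel positions. The following is a minimal NumPy sketch under two assumptions. First, the normal-time encoder features have already been resampled to the embedding grid. Second, a boolean "unchanged" mask (e.g., excluding known event footprints, clouds, and nodata) is available; how that mask is built is outside the sketch. The function name is hypothetical.

```python
import numpy as np


def sample_colocated_pairs(static_grid, normal_grid, unchanged_mask, n_pairs, rng):
    """Draw n_pairs (static embedding, normal-time feature) vectors from the same pixels.

    static_grid:    (H, W, D_s) pre-computed annual embeddings for one tile.
    normal_grid:    (H, W, D_p) encoder features of a normal-time image of that tile.
    unchanged_mask: (H, W) bool, True where a pixel is assumed stable and valid.
    rng:            a numpy.random.Generator, e.g. np.random.default_rng(0).
    """
    if not (static_grid.shape[:2] == normal_grid.shape[:2] == unchanged_mask.shape):
        raise ValueError("all inputs must share the same (H, W) grid")
    rows, cols = np.nonzero(unchanged_mask)
    if len(rows) < n_pairs:
        raise ValueError(f"only {len(rows)} unchanged pixels, requested {n_pairs}")
    # Sample without replacement so each pixel contributes at most one pair per call.
    pick = rng.choice(len(rows), size=n_pairs, replace=False)
    r, c = rows[pick], cols[pick]
    # Using identical (r, c) indices for both rasters is what makes each pair co-located.
    return static_grid[r, c], normal_grid[r, c]
```

The two returned arrays form one training batch in which row i of each array is a positive pair. This is the only supervision the alignment objective needs.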