3D Urban Scene Synthesis from Multi-View Satellite Imagery

Synthesizing navigable 3D urban environments that can be rendered in real time from multi-view satellite imagery, using 3D Gaussian Splatting and generative refinement, with a case study on Turin.

Requirements

  • M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or a related field
  • Good knowledge of Python
  • Software development skills
  • Basic concepts of image processing
  • Basic concepts of data science (data analysis and data processing) and deep learning

Description

Synthesizing large-scale, immersive, and geometrically accurate 3D urban scenes is a challenging task with important applications in urban planning, gaming, and robotics. Traditional 3D scanning is labor-intensive, whereas satellite imagery offers extensive geographic coverage and is collected automatically. However, satellite images are captured from high altitude over a limited range of mostly top-down viewing angles, so they provide little parallax and few direct observations of building facades and street-level details, which are therefore hard to reconstruct accurately.

Inspired by the SkyFall-GS framework, this thesis proposes a two-stage pipeline for virtual city creation. The first stage reconstructs coarse geometry from multi-view satellite imagery using 3D Gaussian Splatting (3DGS). The second stage leverages open-domain text-to-image diffusion models to synthesize ("hallucinate") realistic appearance in occluded or unobserved regions, with the goal of keeping the scene 3D-consistent from satellite viewpoints down to ground-level viewpoints. The research will focus on a case study in Turin, using satellite imagery of the city to create a navigable and immersive 3D environment.
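
The first stage relies on the splatting renderer of 3DGS: each Gaussian, once projected to the image plane, contributes an opacity alpha_i = o_i * exp(-0.5 * d^T Sigma'^-1 d) at a pixel (o_i is its learned opacity, d the offset from its projected center, Sigma' its projected 2D covariance), and the pixel color is obtained by front-to-back blending C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). The following is a minimal NumPy sketch for a single pixel, assuming the Gaussians are already projected to screen space and using fixed RGB colors instead of the view-dependent spherical-harmonics colors of full 3DGS. The function names and the threshold values are illustrative assumptions, not taken from a specific implementation.

```python
import numpy as np

# Illustrative thresholds (assumed values, not taken from a specific rasterizer).
ALPHA_MAX = 0.99          # clamp so a single Gaussian never becomes fully opaque
ALPHA_MIN = 1.0 / 255.0   # skip Gaussians with negligible contribution
T_MIN = 1e-4              # stop once the pixel is effectively opaque


def gaussian_alpha(pixel, mean2d, cov2d, opacity):
    """Opacity contributed by one projected (2D) Gaussian at a pixel location."""
    d = np.asarray(pixel, dtype=float) - np.asarray(mean2d, dtype=float)
    # Mahalanobis term d^T Sigma^-1 d, computed without forming the inverse.
    power = -0.5 * d @ np.linalg.solve(np.asarray(cov2d, dtype=float), d)
    return min(ALPHA_MAX, opacity * np.exp(power))


def composite_pixel(colors, alphas, depths):
    """Front-to-back alpha blending: C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    color = np.zeros(3)
    transmittance = 1.0
    for i in np.argsort(depths):  # nearest Gaussian first
        if alphas[i] < ALPHA_MIN:
            continue
        color += transmittance * alphas[i] * np.asarray(colors[i], dtype=float)
        transmittance *= 1.0 - alphas[i]
        if transmittance < T_MIN:
            break  # remaining Gaussians are hidden behind opaque content
    return color, transmittance


if __name__ == "__main__":
    red, blue = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
    # A pixel at the center of an isotropic Gaussian with opacity 0.5 gets alpha 0.5.
    a = gaussian_alpha([10.0, 10.0], [10.0, 10.0], [[4.0, 0.0], [0.0, 4.0]], opacity=0.5)
    # Blue is nearer (depth 1.0): it contributes 0.5, red contributes 0.5 * 0.5 = 0.25.
    c, t = composite_pixel([red, blue], [a, 0.5], depths=[2.0, 1.0])
    print(c, t)
```

Because the blending is differentiable with respect to positions, covariances, opacities, and colors, the same formula is what lets 3DGS fit the Gaussians to the satellite images by gradient descent.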

The main activities of the thesis include:

  • Reviewing the literature on 3D Gaussian Splatting, satellite-based 3D reconstruction, and diffusion-driven 3D refinement.
  • Exploring the available imagery for the Turin case study, identifying specific Areas of Interest (AOIs) with diverse architectural features.
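  • Implementing the initial (coarse) 3DGS reconstruction, incorporating appearance modeling to account for illumination and atmospheric differences across satellite acquisitions.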
  • Developing a curriculum-based refinement strategy that progressively enhances geometric completeness and texture realism, moving from high-altitude (sky) viewpoints down to ground level.
  • Evaluating the performance against baseline 3D reconstruction methods using perceptual and pixel-level metrics.
  • Analyzing and visualizing the final 3D representation to demonstrate real-time, free-flight navigation of the synthesized Turin model.
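
The sky-to-ground curriculum in the activities above can be illustrated by the camera schedule it implies. The sketch below assumes (this is not specified in the description) that each curriculum stage renders the current 3DGS model from an orbit of virtual cameras around the AOI center, at an elevation angle that decreases from near-nadir toward low, street-facing views; the diffusion refinement and re-fitting steps are only indicated as comments. The function names, the z-up world frame, the OpenCV-style camera convention, the 85 to 15 degree range, and the orbit radius are all hypothetical choices.

```python
import numpy as np


def look_at(eye, target, world_up=(0.0, 0.0, 1.0)):
    """Camera-to-world 4x4 pose, OpenCV-style axes (x right, y down, z forward)."""
    eye = np.asarray(eye, dtype=float)
    forward = np.asarray(target, dtype=float) - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, world_up)  # degenerate if looking exactly straight down
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = down
    c2w[:3, 2] = forward
    c2w[:3, 3] = eye
    return c2w


def orbit_poses(center, radius, elevation_deg, num_views):
    """Poses evenly spaced in azimuth at a fixed elevation, all looking at `center`."""
    center = np.asarray(center, dtype=float)
    elev = np.deg2rad(elevation_deg)
    poses = []
    for k in range(num_views):
        azim = 2.0 * np.pi * k / num_views
        offset = radius * np.array([np.cos(elev) * np.cos(azim),
                                    np.cos(elev) * np.sin(azim),
                                    np.sin(elev)])
        poses.append(look_at(center + offset, center))
    return np.stack(poses)


def sky_to_ground_curriculum(center, radius, start_deg=85.0, end_deg=15.0,
                             num_stages=5, views_per_stage=8):
    """Yield (stage, elevation, poses), from near-nadir views down toward low views."""
    for stage, elev in enumerate(np.linspace(start_deg, end_deg, num_stages)):
        yield stage, float(elev), orbit_poses(center, radius, elev, views_per_stage)


if __name__ == "__main__":
    for stage, elev, poses in sky_to_ground_curriculum(center=[0.0, 0.0, 0.0], radius=300.0):
        # Hypothetical per-stage loop: render the current 3DGS model from `poses`,
        # refine the renders with a text-to-image diffusion model, then re-fit the Gaussians.
        print(f"stage {stage}: elevation {elev:.1f} deg, {len(poses)} cameras")
```

Lowering the elevation in discrete stages reflects the idea of refining views that stay close to the observed satellite geometry before moving to views where more content must be synthesized; the number of stages, the angles, and the radius would be design choices to tune during the thesis.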

At the end of the thesis, a paper submission to a conference is planned.

Contacts