Beyond the Canvas: A Systematic Review of Generative AI for Image Synthesis and Editing
A comprehensive review of state-of-the-art generative models for images, tracing their architectural evolution from GANs to diffusion models and hybrid systems and analyzing evaluation paradigms and ethical challenges.
Requirements
- M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or similar
- Good knowledge of Python
- Software development skills
- Basic concepts of image processing
- Basic concepts of data science, including data analysis, data processing, and deep learning
Description
Image synthesis has emerged as a pivotal technology in computer vision, spanning industries from digital art and entertainment to medical imaging and scientific research. The field has witnessed a rapid transition from early Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) to the current dominance of Diffusion Models (DMs) and emerging hybrid architectures. While these models enable the creation of high-fidelity, photorealistic images from textual prompts, they also introduce significant challenges in terms of training stability, computational complexity, and ethical safety.
This thesis provides a systematic and critical review of the generative AI landscape for images. The study will categorize existing work by architectural foundation, such as the forward and reverse Markov chains of Denoising Diffusion Probabilistic Models (DDPMs) and the latent-space efficiency of models like Stable Diffusion. The thesis will also explore human-AI co-creation and the evolving paradigms of image editing, which differs from generation from scratch in that it requires precise control over existing visual content.
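As a point of reference for the DDPM category mentioned above, the forward (noising) half of the Markov chain has a closed form, so a noisy sample at any step can be drawn directly from the clean image. The following is a minimal sketch, not a definitive implementation. It assumes a linear variance schedule with 1000 steps and variances from 1e-4 to 0.02, which are values commonly used in the DDPM literature and are not specified by this proposal. The function names are hypothetical, and images are represented as flat lists of pixel values to keep the example dependency-free.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    x0 is a flat list of pixel values; t is a 0-based step index.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]
```

Under this schedule the cumulative product shrinks monotonically toward zero, so the last step is almost pure Gaussian noise. The reverse chain, which a trained network approximates, is not shown here.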
The objective of this thesis is to map the evolution of these technologies, compare their strengths and weaknesses through a unified evaluation framework, and identify future research directions in safety and interpretability.
Thesis activities will include:
- Systematic Literature Review: Mapping the timeline of image generative models from early MNIST generators to modern photorealistic systems like FLUX and Imagen.
- Taxonomy Development: Categorizing models based on conditioning methods (e.g., text, sketches, depth maps) and architectural blocks like Transformers vs. U-Nets.
- Comparative Analysis: Evaluating representative models using both quantitative metrics (FID, PSNR, SSIM) and human perception criteria such as contextual coherence and aesthetic appeal.
- Ethical and Practical Assessment: Analyzing the “comfort gap” in public trust regarding AI-generated media and the risks related to misinformation, data privacy, and bias inherited from training data.
- Interactive Visualization: Creating a taxonomy-driven dashboard or t-SNE visualization to illustrate how the latent-space representations of different models cluster.
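Of the quantitative metrics named in the comparative analysis, PSNR is the simplest to state: it is the log-scaled ratio between the squared peak pixel value and the mean squared error against a reference image. The sketch below is illustrative only. It assumes 8-bit images flattened to equal-length lists, and the function name is hypothetical. SSIM and FID are not shown because they rely on windowed statistics and a pretrained feature network respectively.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    if not reference or len(reference) != len(generated):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR needs a pixel-aligned reference image, it fits editing and reconstruction tasks better than unconditional generation, where distribution-level metrics such as FID are the usual choice.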
This research aims to provide a clear roadmap for future researchers and practitioners, addressing the urgent need for standardized evaluation frameworks in the rapidly evolving landscape of generative modeling.
A journal article based on the results is planned at the end of the thesis.