Adaptive Granularity Retrieval for Retrieval-Augmented Generation

This thesis explores adaptive retrieval for Retrieval-Augmented Generation, developing a system that dynamically adjusts the granularity of retrieved context (document, section, or passage) based on query intent. The goal is to improve both precision for fine-grained questions and coherence for broad, open-ended queries.

Requirements

  • M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or similar
  • Knowledge of Python
  • Software development skills
  • Knowledge of signals
  • Basic knowledge of natural language modelling and semantic embedding
  • Basic knowledge of retrieval

Description

Retrieval-Augmented Generation (RAG) is arguably one of the technologies with the most traction at the moment. Most RAG systems, however, struggle to achieve their full potential because they rely on a fixed retrieval granularity, typically retrieving passages or chunks of uniform size. This approach often leads to mismatches between the information need and the retrieved evidence: broad questions like “What are these documents about?” demand high-level summaries, while specific factoid queries like “Where was John born?” require fine-grained snippets. The challenge is to design a retrieval system that can dynamically adapt to the level of detail a query requires. This thesis asks: how can we model and predict query intent to select the appropriate retrieval granularity, and how does such an adaptive system affect answer accuracy, coherence, and efficiency compared to fixed-granularity RAG methods?
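
To make the idea concrete, the routing step could be sketched as follows. This is only an illustrative baseline, not the method the thesis will develop: the cue words, the Granularity enum, and the choose_granularity function are all hypothetical names chosen for this example, and a real system would presumably replace the keyword rules with a learned intent classifier.

```python
import re
from enum import Enum


class Granularity(Enum):
    """The three retrieval levels named in the thesis summary."""
    DOCUMENT = "document"
    SECTION = "section"
    PASSAGE = "passage"


# Hypothetical cue words, chosen only for illustration.
BROAD_CUES = {"about", "overview", "summarize", "summary", "topics"}
FACTOID_OPENERS = {"who", "when", "where", "which"}


def choose_granularity(query: str) -> Granularity:
    """Map a query to a retrieval granularity with simple keyword rules."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        # Nothing to go on: fall back to the middle level.
        return Granularity.SECTION
    if BROAD_CUES.intersection(words):
        # Broad, open-ended question: retrieve whole documents or summaries.
        return Granularity.DOCUMENT
    if words[0] in FACTOID_OPENERS:
        # Factoid-style question: retrieve fine-grained passages.
        return Granularity.PASSAGE
    # Everything else: retrieve at section level.
    return Granularity.SECTION


if __name__ == "__main__":
    for q in ("What are these documents about?", "Where was John born?"):
        print(q, "->", choose_granularity(q).value)
```

With the two example queries from the description, this sketch routes the first to document-level retrieval and the second to passage-level retrieval. A fixed-granularity RAG baseline corresponds to always returning the same level regardless of the query.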

Contacts