Improving the Retrieval Module in RAG Systems through Reranking Distillation
This thesis investigates knowledge distillation in the retrieval task, enhancing retrieval accuracy while preserving efficiency by transferring knowledge from rerankers to retrievers.
Requirements
- M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or similar
- Knowledge of Python, with a focus on deep learning frameworks, particularly PyTorch
- Software development skills
- Basic concepts of image processing and natural language processing
- Basic concepts of data science, including data analysis, data processing, and machine learning
- Basic concepts of linear algebra and statistics
Description
Information retrieval (IR) plays a crucial role in powering search engines, recommendation systems, and question-answering platforms by enabling efficient access to relevant information. Traditional IR models primarily focus on retrieving content based on relevance to a given query, but they often require further refinement to improve the ranking of search results. This is where rerankers come into play, as they enhance the ordering of retrieved documents or items to better match user intent.
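The retrieve-then-rerank pipeline described above can be sketched as follows. This is a toy illustration of the control flow only: both stages are stubbed with random scores, where a real system would use a bi-encoder against a precomputed index for first-stage retrieval and a cross-encoder forward pass per (query, document) pair for reranking. All names here are illustrative, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = [f"doc_{i}" for i in range(1000)]

def retrieve(query, top_k=100):
    # Stage 1: cheap scoring of the whole corpus
    # (stand-in for bi-encoder dot products against an index).
    scores = rng.random(len(corpus))
    idx = np.argsort(-scores)[:top_k]
    return [corpus[i] for i in idx]

def rerank(query, candidates, top_k=10):
    # Stage 2: expensive joint scoring of each (query, candidate) pair
    # (stand-in for a cross-encoder forward pass per pair).
    scores = rng.random(len(candidates))
    order = np.argsort(-scores)[:top_k]
    return [candidates[i] for i in order]

candidates = retrieve("what is knowledge distillation?")
results = rerank("what is knowledge distillation?", candidates)
```

The key efficiency point is that the expensive stage only sees the small candidate set, never the full corpus.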
Research in NLP has shown that knowledge from powerful rerankers can be transferred to retrievers, enhancing their performance while preserving efficiency. This distillation approach is promising, paving the way for new techniques that further optimize retrieval models. Future work may refine distillation strategies and improve reranker-retriever integration for greater accuracy and efficiency.
Objective: investigate knowledge distillation in the retrieval task, transferring knowledge from cross-encoders (strong rerankers) to bi-encoders (efficient retrievers) to improve both accuracy and efficiency.
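One common way to set up such a distillation objective, sketched below under the assumption of a listwise KL-divergence loss (other losses, e.g. margin-based ones, are also used in the literature), is to treat the frozen cross-encoder's scores over a query's candidate documents as a soft target distribution for the bi-encoder's dot-product scores. The tensors stand in for model outputs; no actual encoders are instantiated here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
k, dim = 8, 64  # candidates per query, embedding size (arbitrary choices)

# Frozen teacher: cross-encoder relevance scores for k candidate documents.
teacher_scores = torch.randn(k)

# Student: bi-encoder embeddings for the query and the same k documents.
query_emb = torch.randn(dim, requires_grad=True)
doc_embs = torch.randn(k, dim, requires_grad=True)

# Student scores are simple dot products, so retrieval stays efficient.
student_scores = doc_embs @ query_emb

# Soften both score lists into distributions and minimise their KL divergence.
tau = 1.0  # temperature
loss = F.kl_div(
    F.log_softmax(student_scores / tau, dim=-1),
    F.softmax(teacher_scores / tau, dim=-1),
    reduction="batchmean",
)
loss.backward()  # gradients flow only into the student embeddings
```

In a full training loop these embeddings would come from the bi-encoder's forward pass, and the loss would be averaged over a batch of queries.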
The main activities of the thesis include:
- Explore distillation techniques from rerankers to retrievers in NLP.
- Develop an initial experimental setup based on NLP methods.
- Compare existing methodologies and, if necessary, develop a new approach to transfer knowledge from the reranker to the retriever.
- (Optional) Test the approaches in a multimodal setting on the cross-modal retrieval task.