Classifying Multimodal Post Content through Multimodal Large Language Models
This thesis involves specializing Multimodal Large Language Models (MLLMs) for multimodal post content classification.
Requirements
- M.Sc. in Machine Learning, Data Science, Computer Science, Mathematics, Telecommunications, or similar
- Knowledge of Python, with a focus on deep learning frameworks, particularly PyTorch
- Software development skills
- Basic concepts of image processing and natural language processing
- Basic concepts of data science, including data analysis, data processing, and machine learning
- Basic concepts of linear algebra and statistics
Description
Multimodal sentiment analysis, sarcasm detection, and fake news detection using combined image and text inputs have gained increasing attention in recent years. These tasks share semantic and affective characteristics and face similar challenges in multimodal fusion when interpreting complex human expressions. Integrating these related classification tasks into a unified framework can simplify post content analysis and improve models that jointly capture semantic and sentiment cues.
Recent Multimodal Large Language Models (MLLMs) extend traditional Large Language Models (LLMs) by enabling generation and reasoning over multimodal inputs. Consequently, MLLMs offer a promising solution for modeling image–text content in social media posts for sentiment analysis, sarcasm detection, and fake news detection.
The objective of this thesis is to develop and train a specialized MLLM for multimodal post content classification, employing task-specific techniques to improve performance across sentiment analysis, sarcasm detection, and fake news detection.
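One way to picture the unified framework is to cast each task as an instruction-style prompt over the same image-text post, so that a single MLLM can serve all three tasks. The sketch below is illustrative only: the label sets, the prompt wording, and the names `Post`, `build_prompt`, and `parse_label` are assumptions made for this example, not part of the thesis specification, and no particular model or dataset is implied.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label sets; real benchmarks may define different classes.
TASK_LABELS = {
    "sentiment": ["positive", "neutral", "negative"],
    "sarcasm": ["sarcastic", "not sarcastic"],
    "fake_news": ["fake", "real"],
}


@dataclass
class Post:
    text: str
    image_path: str  # path to the image attached to the post


def build_prompt(post: Post, task: str) -> dict:
    """Turn a post into a task-specific instruction for an MLLM (assumed format)."""
    labels = TASK_LABELS[task]
    instruction = (
        f"Task: {task.replace('_', ' ')} classification. "
        f"Given the image and the post text, answer with one of: "
        f"{', '.join(labels)}.\n"
        f"Post text: {post.text}"
    )
    return {"image": post.image_path, "prompt": instruction}


def parse_label(generated: str, task: str) -> Optional[str]:
    """Map free-form generated text back to a class label, or None if no match."""
    answer = generated.strip().lower()
    # Check longer labels first so "not sarcastic" is not read as "sarcastic".
    for label in sorted(TASK_LABELS[task], key=len, reverse=True):
        if label in answer:
            return label
    return None
```

Sharing one prompt template across tasks is a design choice that keeps the model interface identical for all three problems; task-specific techniques would then operate on top of this common format.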
The main activities of the thesis include:
- Reviewing the literature on Multimodal Large Language Models and multimodal post content analysis, including sentiment analysis, sarcasm detection, and fake news detection.
- Collecting key benchmark datasets and identifying state-of-the-art methods and results.
- Designing and conducting experiments to evaluate baseline approaches and improved MLLM-based solutions.
- Analyzing, visualizing, and summarizing experimental findings.
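As a rough illustration of the evaluation activity, the sketch below computes accuracy and macro-averaged F1 separately for each task from a list of predictions. The function names `macro_f1` and `evaluate_per_task` and the `(task, gold, predicted)` record layout are assumptions made for this example; the metrics actually reported would follow the conventions of the chosen benchmarks.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, over the classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate_per_task(records):
    """records: iterable of (task, gold_label, predicted_label) tuples.

    A prediction of None (e.g. an unparseable model output) counts as wrong.
    """
    gold = defaultdict(list)
    pred = defaultdict(list)
    for task, g, p in records:
        gold[task].append(g)
        pred[task].append(p)
    results = {}
    for task in gold:
        correct = sum(g == p for g, p in zip(gold[task], pred[task]))
        results[task] = {
            "accuracy": correct / len(gold[task]),
            "macro_f1": macro_f1(gold[task], pred[task]),
        }
    return results
```

Reporting macro-F1 alongside accuracy is useful when classes are imbalanced, since accuracy alone can look high for a model that favors the majority class.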