Unsupervised topic modelling of social media content

Development of an unsupervised topic modelling technique to cluster and classify social media posts.


  • M.Sc. in Machine Learning, Computer Science, Mathematics, Physics, Engineering, or similar
  • Basic knowledge of Natural Language Processing (NLP) and word embeddings
  • Familiarity with Python


Social media data is highly unstructured, so imposing some structure on it is essential. One way to do this is to group documents by topic using topic modelling. Applied to social media, topic modelling has various applications, such as extracting public opinion and trends, conducting market research, monitoring crises in real time, and filtering content. However, because it is unsupervised and has no labelled data to learn from, topic modelling is a challenging task. Moreover, generating meaningful and human-friendly topic representations is crucial for applying these models in practice. This thesis can be divided into four main parts: (1) a literature review of state-of-the-art topic modelling methods, (2) task definition and dataset preparation, (3) development of the topic modelling algorithm, and (4) assignment of human-friendly labels to the identified topics.
