Health News Classification

Classification of health news into the ICD9 taxonomy

Introduction

The "TrustAlert: Empowering Public Health with Real-Time Insights and Future Preparedness" project explores how to detect and monitor potential disease outbreaks solely by analyzing news articles from the GDELT database. The project's pipeline begins with the IPTC Annotation System, which uses an MPNET encoder to classify news articles into societal domains such as health, business, and politics. Articles classified under the health domain are then processed by the ICD9 Annotation System, which matches their content to the most relevant ICD9 medical codes, bridging news coverage and clinical terminology.

Key Features

  • GDELT news gathering: Collects news articles from the GDELT database, a global dataset of events and news coverage.
  • Societal domain classification: Categorizes news articles into societal domains such as health, business, and politics using an MPNET encoder.
  • ICD9 news classification: Maps health-related news articles to the most relevant ICD9 medical codes, enabling a connection between news content and clinical terminology.
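The source does not say how articles are pulled from GDELT. One way to gather them is GDELT's public DOC 2.0 API, which can return a JSON list of articles that match a query. The sketch below only builds such a request URL. Whether the project uses this endpoint is an assumption, and `build_gdelt_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public GDELT DOC 2.0 API endpoint (assumption: the project may access GDELT differently).
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_query_url(query: str, max_records: int = 75) -> str:
    """Hypothetical helper: build a DOC 2.0 request that returns matching articles as JSON."""
    params = {
        "query": query,            # free-text search terms
        "mode": "ArtList",         # a list of articles rather than a timeline
        "maxrecords": max_records,
        "format": "json",
    }
    return f"{GDELT_DOC_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gdelt_query_url("disease outbreak"))
```

You can fetch the resulting URL with `urllib.request.urlopen` and parse the response with `json.load`. The article texts would then go to the societal-domain classifier.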

Figure: TrustAlert global

Technologies Used

  • Natural Language Processing (NLP): Uses an MPNET encoder to embed news articles and classify them into societal domains (IPTC) and into the ICD9 taxonomy.

Use Cases

The rapid emergence and spread of infectious diseases present significant challenges to global health services. This highlights the need for early detection systems that enable a timely and effective response and containment.

Live Demo

Input

Users can type or paste the text of a news article directly. Alternatively, they can enter the URL of an article: the text is extracted from the web page and can then be edited as needed.
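The source does not describe how the demo extracts text from a web page. As a minimal sketch, the standard-library `html.parser` module can keep only the text inside `<p>` elements and skip scripts and styles. `ParagraphExtractor` and `extract_article_text` are hypothetical names, and real news pages often need a more robust extractor.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical extractor: collect text inside <p> elements, ignoring <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip_depth = 0  # >0 while inside <script> or <style>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace and keep non-empty paragraphs only.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buffer.append(data)


def extract_article_text(html: str) -> str:
    """Return the page's paragraph text, one blank line between paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

In practice, you would obtain the HTML first, for example with `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`. The user could then edit the extracted text before classification.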

How it works

The input text is embedded with the MPNET model, and the resulting embedding is compared with the embeddings of the IPTC taxonomy. If the most similar IPTC embedding corresponds to the health tag, the input text is then compared with the embeddings of the ICD9 taxonomy to identify relevant medical codes.
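The two-stage routing described above can be sketched as follows. Several assumptions apply:

- Cosine similarity is assumed as the comparison measure, since the source does not name one.
- The `embed` function is a hypothetical bag-of-words stand-in, so the example runs without downloading a model.
- The taxonomy descriptions are illustrative, not the project's label texts.

In a real setup, `embed` could be replaced by an MPNET sentence encoder, for example `SentenceTransformer("all-mpnet-base-v2").encode(...)` from the sentence-transformers library. Which checkpoint the project uses is not stated.

```python
import math
import re
from collections import Counter

# Hypothetical taxonomy descriptions (not the project's actual IPTC / ICD9 label texts).
IPTC_TAGS = {
    "health": "health disease illness outbreak hospital virus infection cases",
    "economy, business and finance": "business market economy company shares finance",
    "politics": "politics election government parliament minister vote",
}
# Illustrative subset of ICD-9 category codes.
ICD9_CODES = {
    "001": "cholera",
    "055": "measles",
    "070": "viral hepatitis",
    "487": "influenza flu",
}


def embed(text: str) -> Counter:
    """Hypothetical stand-in for MPNET: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text: str, top_k: int = 10) -> tuple[str, list[str]]:
    """Route text to its closest IPTC tag; rank ICD9 codes only for health news."""
    query = embed(text)
    best_tag = max(IPTC_TAGS, key=lambda tag: cosine(query, embed(IPTC_TAGS[tag])))
    if best_tag != "health":
        return best_tag, []  # non-health articles skip the ICD9 stage
    ranked = sorted(ICD9_CODES, key=lambda code: cosine(query, embed(ICD9_CODES[code])), reverse=True)
    return best_tag, ranked[:top_k]
```

For example, `classify("Measles cases rise as hospital reports outbreak")` returns `("health", ["055", "001", "070", "487"])`. By contrast, a business headline stops after the IPTC stage with an empty code list. The codes after "055" score zero here and keep their original order only because Python's sort is stable.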

Figure: TrustAlert pipeline

Output

The 10 ICD9 codes most similar to the input text are returned as a ranked list of relevant medical classifications.
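The ICD9 taxonomy embeddings do not change between requests. One reasonable design, assumed here rather than confirmed by the source, is therefore to:

1. L2-normalize the taxonomy embeddings once, so that scoring a query is a plain dot product per code.
2. Keep the 10 best codes with a heap.

The vectors below are tiny hypothetical examples; MPNET embeddings such as those from `all-mpnet-base-v2` have 768 dimensions.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(label_vectors):
    """Normalize each taxonomy embedding once so later scoring is a dot product."""
    return {label: normalize(vec) for label, vec in label_vectors.items()}


def top_k(query_vec, index, k=10):
    """Return up to k (label, cosine score) pairs, best first."""
    q = normalize(query_vec)
    scores = ((sum(a * b for a, b in zip(q, vec)), label) for label, vec in index.items())
    return [(label, score) for score, label in heapq.nlargest(k, scores)]


# Hypothetical 2-D embeddings keyed by ICD9 code.
index = build_index({"055": [1.0, 0.0], "487": [0.0, 1.0], "070": [1.0, 1.0]})
print([code for code, _ in top_k([1.0, 0.1], index, k=2)])
```

With NumPy, the same idea becomes a single matrix-vector product over a stacked, pre-normalized embedding matrix.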

Try it out

Try out the solution in real time on Hugging Face Spaces:

👉 Launch Demo

Benefits

  • Increased efficiency: Automates the classification of health news, reducing manual effort.
  • Time savings: Quickly processes large volumes of news articles in real time.
  • Cost optimization: Minimizes the resources required for manual classification and analysis.
  • Improved accuracy: Uses MPNET sentence embeddings, which compare articles by meaning rather than by exact keywords, to improve classification accuracy.
  • Enhanced decision-making: Provides actionable insights for public health preparedness and response.

Integration

This process will be integrated into the TrustAlert project, a larger initiative that leverages not only news articles but also hospital information and research outputs to create a comprehensive global monitoring system. By combining these data sources, TrustAlert aims to provide real-time insights and enhance public health preparedness. More information about the TrustAlert project can be found at www.trustalert.it.