
Classifier that can read medical reports and assign a functional level classification following the WHO ICF classification scheme.

Primary language: Python. License: MIT.

a-proof-icf-classifier

Contents

  1. Description
  2. Input File
  3. Output File
  4. Machine Learning Pipeline
  5. How to use?

Description

This repository contains a machine learning pipeline that reads a clinical note in Dutch and assigns the functioning level of the patient based on the textual description.

We focus on 9 WHO-ICF domains, which were chosen due to their relevance to recovery from COVID-19:

ICF code     Domain                          Name in repo
b1300        Energy level                    ENR
b140         Attention functions             ATT
b152         Emotional functions             STM
b440         Respiration functions           ADM
b455         Exercise tolerance functions    INS
b530         Weight maintenance functions    MBW
d450         Walking                         FAC
d550         Eating                          ETN
d840-d859    Work and employment             BER

Functioning Levels

  • FAC and INS have a scale of 0-5, where 5 means there is no functioning problem.
  • The rest of the domains have a scale of 0-4, where 4 means there is no functioning problem.
  • For more information about the levels, refer to the annotation guidelines.
  • NOTE: the values generated by the machine learning pipeline may sometimes fall outside the scale (e.g. 4.2 for ENR); this is expected, because the levels are predicted by regression models whose output is not bounded.
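If out-of-scale values are inconvenient downstream, they can be clipped after the fact. The following is a minimal sketch of such a post-processing step, not part of the pipeline itself; the function name `clamp_level` and the `MAX_LEVEL` mapping are made up for illustration, with the upper bounds taken from the scales listed above.

```python
# Upper bound of the scale per domain (names as used in the repo).
# FAC and INS run from 0 to 5; all other domains run from 0 to 4.
MAX_LEVEL = {
    "ENR": 4, "ATT": 4, "STM": 4, "ADM": 4, "MBW": 4, "ETN": 4, "BER": 4,
    "FAC": 5, "INS": 5,
}


def clamp_level(domain: str, value: float) -> float:
    """Clip a raw regression output to the range [0, max] of its domain.

    Hypothetical helper: the pipeline itself reports the raw value.
    """
    return min(max(value, 0.0), float(MAX_LEVEL[domain]))


print(clamp_level("ENR", 4.2))   # 4.0 (above the 0-4 scale)
print(clamp_level("FAC", 4.2))   # 4.2 (inside the 0-5 scale)
print(clamp_level("ATT", -0.3))  # 0.0 (below the scale)
```

Whether to clip or keep the raw value depends on the use case: the raw value preserves how strongly the model leans beyond the end of the scale.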

Input file

The input is a csv file with at least one column containing the text (one clinical note per row).

The CSV file must meet the following specifications:

  • sep = ;
  • quotechar = "
  • encoding = utf-8
  • the first row is the header (column names)

See example in example/input.csv.
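A minimal sketch of how a file meeting these specifications could be written with Python's standard `csv` module. The helper name `write_notes_csv`, the column name `text`, and the two sample notes are made up for illustration; any column name works as long as it is passed to the pipeline.

```python
import csv
import io


def write_notes_csv(handle, notes, text_col="text"):
    """Write one clinical note per row, using ';' as separator and '"' as quote char."""
    writer = csv.writer(
        handle,
        delimiter=";",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerow([text_col])  # first row is the header
    for note in notes:
        writer.writerow([note])


# Demo on an in-memory buffer. For a real file, open it with
# open(path, "w", encoding="utf-8", newline="") and pass the handle.
buffer = io.StringIO()
write_notes_csv(
    buffer,
    ["Patient loopt 10 meter; daarna kortademig.", "Eet zelfstandig."],
)
print(buffer.getvalue())
```

The first note contains a semicolon, so the writer wraps it in double quotes; this is why the quote character matters for free-text clinical notes.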

Output file

The output file is saved in the same location as the input file; its name is the original file name with 'output' added (e.g. input.csv becomes input_output.csv).

The output file contains the same columns as the input, plus 9 new columns with the functioning levels, one per domain.

The functioning levels are generated per row. If a cell is empty, it means that this domain is not discussed in this note (according to the algorithm).

See example in example/input_output.csv.
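A minimal sketch of reading such an output file, treating empty cells as "domain not discussed". Several things here are assumptions: the `_output` suffix is inferred from the example file name, the helper names are hypothetical, and the level column names in the sample (`ENR`, `FAC`) are placeholders, so check the example output for the actual headers.

```python
import csv
import io
from pathlib import Path


def output_path(in_csv):
    """Guess the output path: same folder, '_output' appended to the file stem.

    Assumption based on example/input.csv -> example/input_output.csv.
    """
    p = Path(in_csv)
    return p.with_name(f"{p.stem}_output{p.suffix}")


def parse_levels(row, level_cols):
    """Convert level cells to float; an empty cell becomes None."""
    return {col: (float(row[col]) if row[col] else None) for col in level_cols}


# Made-up sample in the documented format (';' separator, '"' quote char).
sample = 'text;ENR;FAC\n"note one";4.2;\n"note two";;3.0\n'
reader = csv.DictReader(io.StringIO(sample), delimiter=";", quotechar='"')
rows = [parse_levels(r, ["ENR", "FAC"]) for r in reader]

print(output_path("example/input.csv").name)  # input_output.csv
print(rows)
```

Keeping "not discussed" as `None` rather than 0 avoids confusing a missing domain with the lowest functioning level.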

Machine Learning Pipeline

The pipeline includes a multi-label classification model that detects the domains mentioned in a sentence, and 9 regression models that assign a level to sentences in which a specific domain was detected. All models were created by fine-tuning a pre-trained Dutch medical language model.

The pipeline includes the following steps:

[Figure: diagram of the machine learning pipeline (ml_pipe)]
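The two-stage design described above (detect domains per sentence, then score each detected domain with its own regression model) can be sketched as follows. This is an illustration with stub models, not the repository's code: the function `classify_note`, the stubs, and the English sample sentences are hypothetical, sentence splitting is assumed to have happened already, and averaging sentence-level scores into one value per note is an assumption about the aggregation step.

```python
from statistics import mean

DOMAINS = ["ADM", "ATT", "BER", "ENR", "ETN", "FAC", "INS", "MBW", "STM"]


def classify_note(sentences, detect_domains, level_models):
    """Return one level per domain for a note, or None if the domain is absent.

    detect_domains: callable mapping a sentence to a list of domain names
                    (stands in for the multi-label classifier).
    level_models:   dict mapping a domain name to a callable that scores a
                    sentence (stands in for the 9 regression models).
    """
    per_domain = {d: [] for d in DOMAINS}
    for sentence in sentences:
        for domain in detect_domains(sentence):
            per_domain[domain].append(level_models[domain](sentence))
    # Assumption: sentence-level scores are averaged per note.
    return {d: (mean(v) if v else None) for d, v in per_domain.items()}


# Stub models for demonstration only.
def fake_detect(sentence):
    found = []
    if "walk" in sentence:
        found.append("FAC")
    if "tired" in sentence:
        found.append("ENR")
    return found


fake_models = {d: (lambda s: 3.0) for d in DOMAINS}
fake_models["ENR"] = lambda s: 1.5

result = classify_note(
    ["Patient can walk unaided.", "Feels tired after walking."],
    fake_detect,
    fake_models,
)
print(result["FAC"], result["ENR"], result["ATT"])  # 3.0 1.5 None
```

Domains that are never detected keep the value `None`, which corresponds to an empty cell in the output file.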

How to use?

  1. Install Docker: see here for Windows and here for macOS.
  2. Pull the docker image from DockerHub by typing in your command line:
$ docker pull piekvossen/a-proof-icf-classifier
  3. Run the pipeline with the docker run command. You need to pass the following arguments:
  • --in_csv: path to the input csv file
  • --text_col: name of the text column in the csv

For example:

$ docker run piekvossen/a-proof-icf-classifier --in_csv example/input.csv --text_col text
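When the pipeline is called from a script, the same command can be assembled in Python. This is a minimal sketch that only uses the image name and the two arguments documented above; the helper `build_command` is hypothetical, and the call that actually starts Docker is left commented out.

```python
import shlex

IMAGE = "piekvossen/a-proof-icf-classifier"


def build_command(in_csv, text_col):
    """Build the argument list for `docker run` with the two required arguments."""
    return ["docker", "run", IMAGE, "--in_csv", in_csv, "--text_col", text_col]


cmd = build_command("example/input.csv", "text")
print(shlex.join(cmd))

# To actually run it (requires Docker and the pulled image):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the path or column name contains spaces.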

Running the Docker image for the first time downloads the models from Hugging Face:

https://huggingface.co/CLTL

In total, 10 transformer models are downloaded (the multi-label domain classifier and the 9 regression models), each between 500 MB and 1 GB, so the first run takes a while. After downloading, the cached models are used.