/compare-annotations

Qualitative analysis and Quantifying Variability of Manual Annotations

Introduction

This project monitors and assesses the manual annotation task performed on pre-annotated documents.

The project includes a script that computes inter-annotation agreement (IAA) and compares the pre-annotations with the eventual manual annotations to monitor the human annotation task.
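
As an illustration of what a per-document IAA computation can look like, here is a minimal sketch. It is not taken from compare_annotations.py: the README does not state which agreement metric the script uses, so the sketch assumes exact-match pairwise F1, and it assumes each annotation is a hypothetical (start, end, label) tuple. The annotator names and labels are made up.

```python
from itertools import combinations


def pairwise_f1(ann_a, ann_b):
    """Exact-match F1 between two annotators' annotation sets.

    Assumption: an annotation is a (start, end, label) tuple, and two
    annotations agree only if all three fields are identical.
    """
    a, b = set(ann_a), set(ann_b)
    if not a and not b:
        return 1.0  # neither annotator marked anything: treat as full agreement
    matches = len(a & b)
    return 2 * matches / (len(a) + len(b))


def document_iaa(annotations_by_annotator):
    """Mean pairwise F1 over all annotator pairs for one document."""
    pairs = list(combinations(sorted(annotations_by_annotator), 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    scores = [
        pairwise_f1(annotations_by_annotator[x], annotations_by_annotator[y])
        for x, y in pairs
    ]
    return sum(scores) / len(scores)


# Hypothetical document annotated by two annotators.
doc = {
    "annotator_1": [(0, 5, "Age"), (10, 18, "Diagnosis")],
    "annotator_2": [(0, 5, "Age"), (10, 18, "Treatment")],
}
iaa_score = document_iaa(doc)
print(iaa_score)
```

Here the two annotators agree on one of two annotations each, so the score is 0.5. A real comparison might also give partial credit for overlapping spans, which this sketch does not do.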

Directory structure

  • src/: This folder contains the code for comparing the annotation records that different annotators produced for the same file, and for comparing the manually annotated files with the files pre-annotated by the SpaCTeS tool.

    • compare_annotations.py: Monitors and assesses the manual annotation task performed on the pre-annotated documents.

      1. Computes inter-annotation agreement (IAA) over the files shared between annotators. The output is the IAA/CSV directory, which holds the IAA results for each document in CSV format. Each file lists all manual annotations made by the annotators for the given document.

      2. Compares the manual annotations with the pre-annotations produced by the SpaCTeS tool to monitor the human annotation task. It detects which pre-annotated variables were changed, accepted, or removed, and which variables were added by annotators. It computes statistics on how many variables were added, accepted, changed, or removed, and reports the acceptance frequency, mismatched records, and details of the IAA score. Finally, it saves the new sections and variables added by annotators and reports suspicious spans added by annotators.

        The outputs are written to the analysis directory.

    • compare_ctakes_annotators.py: Monitors and assesses the pre-annotation task by computing inter-annotation agreement (IAA) between the pre-annotations produced by the SpaCTeS tool and the manual annotations made by annotators. The outputs are the IAA_ANN and IAA_CSV directories, which hold the IAA results for each document in ANN and CSV format, respectively. Each file lists all manual annotations made by the annotators for the given document.

      The output is written to the statistical directory.

  • annotations/: This folder contains the documents pre-annotated by the SpaCTeS tool and the manual annotations made by annotators.

  • analysis/: This folder contains the results of monitoring and assessing the manual annotation task performed on the documents pre-annotated by the SpaCTeS tool.

    • IAA/CSV/: Results of comparing the annotators' work, as CSV files, for the files (reports) shared between annotators. Each file shows every annotator's decision for each annotation in the given file.
    • analysis_per_file/: Results of comparing the manual annotations with the pre-annotations to monitor the human annotation task. For each file (report), it shows which pre-annotated variables were changed, accepted, or removed, and which variables were added by annotators.
    • statistical/: Results of comparing the manual annotations with the pre-annotations produced by the SpaCTeS tool to monitor the human annotation task. For each bunch, it shows which pre-annotated variables were changed, accepted, or removed, and which variables were added by annotators, along with statistics on how many variables were added, accepted, changed, or removed. It also shows the acceptance frequency, mismatched records, and details of the IAA score, stores the new sections and variables added by annotators, and reports the suspicious spans added by annotators.
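
The accepted/changed/removed/added classification described above can be sketched as follows. This is only an illustration under stated assumptions, not the logic of compare_annotations.py: annotations are modelled as a hypothetical mapping from a (start, end) span to a variable label, and "changed" is assumed to mean the same span with a different label. The function names and example labels are invented for the sketch.

```python
def compare_to_preannotation(pre, manual):
    """Classify annotations by comparing pre-annotations with manual ones.

    Assumption: both arguments map a (start, end) span to a label.
    """
    result = {"accepted": [], "changed": [], "removed": [], "added": []}
    for span, label in pre.items():
        if span not in manual:
            result["removed"].append((span, label))  # annotator deleted it
        elif manual[span] == label:
            result["accepted"].append((span, label))  # kept as pre-annotated
        else:
            # same span, different label: (span, old label, new label)
            result["changed"].append((span, label, manual[span]))
    for span, label in manual.items():
        if span not in pre:
            result["added"].append((span, label))  # new annotation by a human
    return result


def summarize(result):
    """Count each category and derive the acceptance rate of pre-annotations."""
    counts = {name: len(items) for name, items in result.items()}
    n_pre = counts["accepted"] + counts["changed"] + counts["removed"]
    counts["acceptance_rate"] = counts["accepted"] / n_pre if n_pre else 0.0
    return counts


# Hypothetical pre-annotations and manual annotations for one file.
pre = {(0, 5): "Age", (10, 18): "Diagnosis", (20, 25): "Date"}
manual = {(0, 5): "Age", (10, 18): "Treatment", (30, 40): "Dose"}

summary = summarize(compare_to_preannotation(pre, manual))
print(summary)
```

In this example one pre-annotation is accepted, one is changed, one is removed, and one annotation is added, so the acceptance rate is 1/3. Aggregating such per-file summaries over all files of a bunch would give statistics of the kind stored in statistical/.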

Usage

To monitor and assess the manual annotation task performed on the pre-annotated documents, use the following commands:

python3 compare_annotations.py --bunch NUMBER [option]
python3 compare_ctakes_annotators.py --bunch NUMBER 

Options:

--bunch_2      Number of the second bunch, which shares documents with the first bunch (--bunch)
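
For readers who want to see how these two options fit together, the following is a minimal argparse sketch. It is an assumption about how such a command line could be parsed, not the actual parser in the scripts: the integer type, the required flag, the help strings, and the build_parser name are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented --bunch and --bunch_2 options."""
    parser = argparse.ArgumentParser(
        description="Compare manual annotations with pre-annotations."
    )
    # Assumption: bunches are identified by integers and --bunch is mandatory.
    parser.add_argument("--bunch", type=int, required=True,
                        help="number of the bunch to analyse")
    parser.add_argument("--bunch_2", type=int, default=None,
                        help="number of a second bunch that shares documents "
                             "with the first bunch")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is runnable.
args = build_parser().parse_args(["--bunch", "1", "--bunch_2", "2"])
print(args.bunch, args.bunch_2)
```

When --bunch_2 is omitted, args.bunch_2 is None in this sketch, which a script could use to skip the cross-bunch comparison.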

Contact

Siamak Barzegar (siamak.barzegar@bsc.es)