/ClinicalNTS

Primary LanguageJavaMIT LicenseMIT

In these files you will find all of the code needed to replicate the experiments undertaken as part of the paper: "Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table". If you find these resources helpful, please consider citing us as follows:

@inproceedings{Shardlow19ACL, title = "Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table", author = "Shardlow, Matthew and Nawaz, Raheel", booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = July, year = "2019", address = "Florence, Italy", publisher = "Association for Computational Linguistics", }

In addition to the resources supplied here, you will need to access the following resources:

Once you have these resources licenced, downloaded and installed, you can proceed to replicate our experiments by following the steps below:

  • run the code in the data collection folder to generate the documents that we selected
  • run the code in the PhraseTable/CreatePhraseTable folder to create the phrase table from SNOMED-CT
  • run the NTS software using the following command to simplify documents using our phrase table:

th translate.lua -replace_unk -phrase_table <PATH_TO_PHRASE_TABLE> -beam_size 5 -gpuid 0 -n_best 4 -model <PATH_TO_MODEL> -src <ORIGINAL_DOCUMENT> -output <SIMPLIFIED_DOCUMENT> -log_file <LOG_FILE>

You can also replicate the phrase table baseline by running the code in the folder: PhraseTable/ApplyPhraseTable

A full listing of files, with descriptions is below:

  • Supplementary Material/
    • ReadMe.txt - this readme
    • CrowdSourcing/
      • FigureEightTask.txt - The task description as it appeared to annotators
      • results.csv.zip - The full results of our crowdsourcing task
    • DataCollection/
      • MIMIC/
        • GetDataFromMIMIC.java - processes the original MIMIC files to get the sub-records
        • FilterFiles.java - selects the 150 sub-records with the most tokens
      • i2b2/
        • GetDataFromi2b2.java - extracts and filters records from i2b2
        • Record.java - internal data class
    • PhraseTable/
      • ApplyPhraseTable/
        • phraseTable/
          • ApplyPhraseTable.java - Applies the phrase table baseline to files
          • PhraseTable.java - internal data class
          • TestPhraseTable.java - Test class
      • CreatePhraseTable/
        • phraseTable/
          • GetTerms.java - creates the phrase table
          • Term.java - internal data class
          • TermGroup.java - internal data class
          • TermPair.java - internal data class