/medical-specialty

Medical specialty prediction based on medical transcriptions using Spark NLP

Medical Specialty Prediction with Spark NLP

The goal of this project is to predict medical specialties (surgery, internal medicine, medical records, other) from a corpus of 4999 medical transcriptions using Spark NLP. The corpus was scraped by Tara Boyle from the Transcribed Medical Transcription Sample Reports and Examples website and published on Kaggle. The version used in this project was compiled by Carlos Salgado for natural language processing; it combines the scraped corpus with custom-generated clinical stop words and vocabulary. This compiled version is published on GitHub and is free to use.

Note: the notebook can be opened and run in Google Colab.

The following models were tested using the open-source and licensed healthcare versions of Spark NLP:

  • DL Classification with Universal Sentence Encoder
  • DL Classification with BERT Sentence Embeddings
  • DL Classification with BioBERT (Clinical) Sentence Embeddings
  • DL Classification with BioBERT (MedNLI) Sentence Embeddings
  • Logistic Regression with Universal Sentence Encoder
  • Logistic Regression with CountVectorizer
  • Logistic Regression with TF-IDF
  • Random Forest with Universal Sentence Encoder
  • Random Forest with CountVectorizer
  • Random Forest with TF-IDF
  • Random Forest with feature engineering using 5 clinical NER models, a clinical risk assertion model and 2 clinical entity resolvers
  • Logistic Regression with feature engineering using 5 clinical NER models, a clinical risk assertion model and 2 clinical entity resolvers
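
Several of the models above use TF-IDF features. As a rough illustration of what those features contain, here is a minimal standard-library sketch; it is not the notebook's code, which builds its features inside Spark pipelines. The function names and the three sample texts are made up for this example, and the smoothed IDF formula log((N + 1) / (df + 1)) is assumed because it is the one documented for Spark ML's IDF estimator.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Term frequency is the raw count within a document. Inverse document
    frequency is log((N + 1) / (df + 1)), where N is the number of
    documents and df is the number of documents containing the term.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Hypothetical sample texts, not taken from the corpus.
docs = [
    "patient underwent knee surgery",
    "patient reports chest pain",
    "patient has knee pain",
]
weights = tfidf(docs)

# Print the terms of the first document, highest weight first.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

A term that occurs in every document, such as "patient" here, gets a weight of zero, while terms confined to a single document get the highest weights. That is why TF-IDF can separate specialties better than raw counts when the transcriptions share a lot of generic clinical vocabulary.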

Contributing

Any contributions you make are really helpful!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingContribution)
  3. Commit your Changes (git commit -m 'Add some AmazingContribution')
  4. Push to the Branch (git push origin feature/AmazingContribution)
  5. Open a Pull Request

Reporting Issues

Does something seem off? Make sure to report it.