🪐 spaCy Project: NBME - Score Clinical Patient Notes

Exploring spaCy's project mechanism using data from Kaggle's NBME - Score Clinical Patient Notes competition, which is treated here as a named entity recognition (NER) problem. The purpose of this repository is not to achieve strong NER performance but rather to experiment with defining different spaCy workflows.
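To make the NER framing concrete, here is a minimal sketch of turning one annotation into character-offset spans. It assumes the Kaggle location strings have the form "start end", with ";" separating the pieces of a discontinuous annotation. The function name parse_location and the label used in the example are hypothetical and not taken from this repository.

```python
from typing import List, Tuple


def parse_location(location: str, label: str) -> List[Tuple[int, int, str]]:
    """Turn a location string such as "696 724" or "0 5;10 15" into
    (start, end, label) tuples, one per contiguous piece.

    The string format is an assumption about the Kaggle data.
    """
    spans = []
    for piece in location.split(";"):
        start_text, end_text = piece.split()
        start, end = int(start_text), int(end_text)
        if start >= end:
            raise ValueError(f"empty or inverted span: {piece!r}")
        spans.append((start, end, label))
    return spans


if __name__ == "__main__":
    # "FEATURE" is a placeholder label chosen for illustration only.
    print(parse_location("0 5;10 15", "FEATURE"))
```

Tuples like these could then be handed to spaCy's Doc.char_span when building training documents, though how this repository actually does the conversion is not shown here.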

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

Command      Description
preprocess   Convert the data to spaCy's binary format
debug        Debug the data
train        Train a custom NER model
evaluate     Evaluate the custom model and export metrics
package      Package the trained model so it can be installed
install      Install model
serve        Serve the models via a FastAPI REST API
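The "only re-run if inputs have changed" behaviour can be illustrated with a small sketch. This is not spaCy's actual implementation; it only shows the general idea of comparing file checksums against those recorded on a previous run. The helper names file_checksum and needs_rerun are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rerun(inputs: Iterable[Path], recorded: Dict[str, str]) -> bool:
    """Return True if any input has no recorded checksum or its
    current checksum differs from the recorded one."""
    for path in inputs:
        if recorded.get(str(path)) != file_checksum(path):
            return True
    return False
```

A command runner built on this idea would store the checksums after each successful run and skip the command when needs_rerun returns False.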

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

Workflow   Steps
all        preprocess → debug → train → evaluate
deploy     package → install → serve
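As a rough illustration of how a workflow maps to ordered commands, the sketch below expands a workflow name into one spacy project run invocation per step. The WORKFLOWS dictionary mirrors the table above; the function workflow_commands is hypothetical, and in practice running spacy project run all handles the ordering for you.

```python
import sys
from typing import Dict, List

# Step order copied from the workflow table in this README.
WORKFLOWS: Dict[str, List[str]] = {
    "all": ["preprocess", "debug", "train", "evaluate"],
    "deploy": ["package", "install", "serve"],
}


def workflow_commands(name: str) -> List[List[str]]:
    """Return one argument list per step, in execution order."""
    if name not in WORKFLOWS:
        raise KeyError(f"unknown workflow: {name!r}")
    return [
        [sys.executable, "-m", "spacy", "project", "run", step]
        for step in WORKFLOWS[name]
    ]


if __name__ == "__main__":
    for command in workflow_commands("all"):
        print(" ".join(command))
```

Each argument list could be passed to subprocess.run(command, check=True) from the project directory, so that a failing step stops the sequence.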

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

File Source Description
input/nbme-score-clinical-patient-notes/train_split.json.gz Local
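Once the asset is in place, a gzipped JSON file can be read with the standard library alone. The sketch below assumes the file holds a single JSON document; if it is actually newline-delimited JSON, each line would need to be parsed separately. The function name load_gzipped_json is hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Any


def load_gzipped_json(path: Path) -> Any:
    """Decompress a .json.gz file and parse it as one JSON document."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, load_gzipped_json(Path("input/nbme-score-clinical-patient-notes/train_split.json.gz")) would return the parsed contents, whose structure is not documented here.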

Legacy

Kaggle challenge: NBME - Score Clinical Patient Notes

https://www.kaggle.com/c/nbme-score-clinical-patient-notes

Initialization

pyenv virtualenv 3.7.12 20220211-nbme-score-clinical-patient-notes
pyenv activate 20220211-nbme-score-clinical-patient-notes
poetry install
python -m ipykernel install --user --name 20220211-nbme-score-clinical-patient-notes

Running

!pip install colabcode
from colabcode import ColabCode
# PASSWORD (code-server password) and AUTHTOKEN (ngrok auth token) must be defined beforehand
ColabCode(port=10000, password=PASSWORD, authtoken=AUTHTOKEN)