Exploring spaCy's project mechanism using data from Kaggle's NBME - Score Clinical Patient Notes competition, which is treated here as a named entity recognition (NER) problem. The purpose of this repository is not to achieve strong NER performance but rather to experiment with defining different spaCy workflows.
The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.
The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.
Command | Description |
---|---|
preprocess | Convert the data to spaCy's binary format |
debug | Debug the data |
train | Train a custom NER model |
evaluate | Evaluate the custom model and export metrics |
package | Package the trained model so it can be installed |
install | Install the packaged model |
serve | Serve the models via a FastAPI REST API |
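
The rule that commands are only re-run if their inputs have changed can be pictured as a checksum comparison. The sketch below is a simplified illustration of that idea, not spaCy's actual implementation: it hashes a command's input files and compares them with hashes recorded on an earlier run. The names `file_md5` and `needs_rerun` and the in-memory `recorded` dictionary are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_rerun(inputs: Iterable[Path], recorded: Dict[str, str]) -> bool:
    """Return True if any input is missing, unrecorded, or has changed.

    `recorded` maps str(path) to the digest seen on the previous run
    (a hypothetical stand-in for whatever state the real tool keeps).
    """
    for path in inputs:
        if not path.exists():
            return True
        if recorded.get(str(path)) != file_md5(path):
            return True
    return False
```

Under this scheme, editing an input file changes its digest, so the next call to `needs_rerun` returns `True` and the command would be executed again.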
The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.
Workflow | Steps |
---|---|
all | preprocess → debug → train → evaluate |
deploy | package → install → serve |
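
A workflow is an ordered list of command names. As a rough sketch, the two workflows in the table can be written as a dictionary and expanded into the equivalent sequence of single-command invocations. The `WORKFLOWS` dictionary and the `workflow_invocations` helper are hypothetical names used only for illustration; the command names come from the tables above.

```python
from typing import Dict, List

# Workflow name -> ordered command names, copied from the workflow table.
WORKFLOWS: Dict[str, List[str]] = {
    "all": ["preprocess", "debug", "train", "evaluate"],
    "deploy": ["package", "install", "serve"],
}


def workflow_invocations(name: str) -> List[List[str]]:
    """Expand a workflow into one `spacy project run` argument list per step."""
    return [
        ["python", "-m", "spacy", "project", "run", step]
        for step in WORKFLOWS[name]
    ]
```

Each argument list could be passed to `subprocess.run(args, check=True)` from the project directory, although running the workflow by name does the same thing in one call.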
The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.
File | Source | Description |
---|---|---|
input/nbme-score-clinical-patient-notes/train_split.json.gz | Local | |
https://www.kaggle.com/c/nbme-score-clinical-patient-notes
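
The asset listed above is a gzip-compressed JSON file, so it can be inspected without unpacking it by hand. The helper below is a minimal sketch: `load_json_gz` is a hypothetical name, and it assumes the file is UTF-8 encoded and holds a single JSON document rather than JSON Lines. The file's internal schema is not described here, so the function simply returns whatever value the file contains.

```python
import gzip
import json
from pathlib import Path
from typing import Any


def load_json_gz(path: Path) -> Any:
    """Read a gzip-compressed JSON file and return the parsed value."""
    # "rt" opens the compressed stream in text mode so json.load can read it.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_json_gz(Path("input/nbme-score-clinical-patient-notes/train_split.json.gz"))` would load the training split once the asset is in place.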
pyenv virtualenv 3.7.12 20220211-nbme-score-clinical-patient-notes
pyenv activate 20220211-nbme-score-clinical-patient-notes
poetry install --dev
python -m ipykernel install --user --name 20220211-nbme-score-clinical-patient-notes
!pip install colabcode
from colabcode import ColabCode
ColabCode(port=10000, password=PASSWORD, authtoken=AUTHTOKEN)