🪐 spaCy Project: Example project for creating a novel NLP component that does relation extraction from scratch
This example project shows how to implement a spaCy component with a custom machine learning model, how to train it with and without a transformer, and how to apply it to an evaluation dataset.
The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.
The following commands are defined by the project. They can be executed using `spacy project run [name]`. Commands are only re-run if their inputs have changed.
| Command | Description |
| --- | --- |
| `data` | Parse the gold-standard annotations from the Prodigy annotations. |
| `train_cpu` | Train the REL model on the CPU and evaluate on the dev corpus. |
| `train_gpu` | Train the REL model with a Transformer on a GPU and evaluate on the dev corpus. |
| `evaluate` | Apply the best model to new, unseen text, and measure accuracy at different thresholds. |
| `clean` | Remove intermediate files to start data preparation and training from a clean slate. |
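To illustrate what "measure accuracy at different thresholds" can mean for the `evaluate` command, here is a minimal sketch, not the project's actual evaluation script. It assumes each candidate relation is keyed by a hypothetical `(head, child, label)` tuple and carries a score between 0 and 1. The function name `prf_at_threshold`, the labels, and the toy scores are all made up for illustration.

```python
def prf_at_threshold(scored, gold, threshold):
    """Return (precision, recall, F-score) for one decision threshold.

    scored: dict mapping a hypothetical (head, child, label) key to a score
    gold:   set of (head, child, label) keys that are annotated as correct
    """
    # A relation counts as predicted when its score reaches the threshold.
    predicted = {key for key, score in scored.items() if score >= threshold}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data (invented): token offsets and labels are placeholders.
scores = {
    (0, 2, "LABEL_A"): 0.9,
    (2, 0, "LABEL_A"): 0.1,
    (0, 5, "LABEL_B"): 0.6,
    (5, 0, "LABEL_B"): 0.4,
}
gold = {(0, 2, "LABEL_A"), (5, 0, "LABEL_B")}

for threshold in (0.0, 0.5, 0.99):
    p, r, f = prf_at_threshold(scores, gold, threshold)
    print(f"threshold={threshold:.2f}  P={p:.2f}  R={r:.2f}  F={f:.2f}")
```

Sweeping the threshold this way shows the usual trade-off: a low threshold accepts more candidate relations and raises recall, while a high threshold keeps only confident predictions.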
The following workflows are defined by the project. They can be executed using `spacy project run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.
| Workflow | Steps |
| --- | --- |
| `all` | `data` → `train_cpu` → `evaluate` |
| `all_gpu` | `data` → `train_gpu` → `evaluate` |
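The idea that commands are only re-run when their inputs have changed can be sketched with content checksums. This is an illustration of the general technique under stated assumptions, not spaCy's implementation: the helper names `checksum` and `run_if_changed`, the in-memory `lock_store` dict, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hash the content of one input (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def run_if_changed(name, inputs, lock_store, action):
    """Run `action` only if the inputs differ from the last recorded run.

    name:       command name, e.g. "data"
    inputs:     dict mapping an input name to its content as bytes
    lock_store: dict remembering the checksums of the previous run
    Returns True if the command ran, False if it was skipped.
    """
    current = {dep: checksum(content) for dep, content in inputs.items()}
    if lock_store.get(name) == current:
        return False  # nothing changed since the last run, so skip
    action()
    lock_store[name] = current  # record state only after a successful run
    return True


# Usage with in-memory stand-ins for files.
lock_store = {}
calls = []
inputs = {"assets/annotations.jsonl": b'{"text": "example"}\n'}

first = run_if_changed("data", inputs, lock_store, lambda: calls.append("data"))
second = run_if_changed("data", inputs, lock_store, lambda: calls.append("data"))
inputs["assets/annotations.jsonl"] = b'{"text": "edited"}\n'
third = run_if_changed("data", inputs, lock_store, lambda: calls.append("data"))
print(first, second, third)
```

In a workflow, applying this check to each step in order means that editing the annotations would trigger the whole chain again, while an unchanged project would skip every step.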
The following assets are defined by the project. They can be fetched by running `spacy project assets` in the project directory.
| File | Source | Description |
| --- | --- | --- |
| `assets/annotations.jsonl` | Local | Gold-standard REL annotations created with Prodigy |
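A `.jsonl` asset stores one JSON object per line, so it can be inspected with the standard library alone. The sketch below assumes nothing about the real annotation schema: the `text` and `relations` fields in the sample are placeholders, and `read_jsonl` is a hypothetical helper rather than part of the project.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# In-memory sample standing in for the real file; field names are invented.
sample = io.StringIO(
    '{"text": "A binds B", "relations": []}\n'
    "\n"
    '{"text": "C", "relations": []}\n'
)
records = list(read_jsonl(sample))
print(len(records), "records; first text:", records[0]["text"])
```

To read the actual asset, the same function could be given an open file handle, for example `open("assets/annotations.jsonl", encoding="utf8")`.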