This project uses sense2vec and Prodigy to bootstrap an NER model that detects fashion brands in APTCyberCollection comments. For more details, see our blog post.
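One common way to bootstrap with sense2vec is to expand a few seed brand names into a longer term list, then turn that list into match patterns that pre-highlight candidates during Prodigy annotation. The sketch below shows only that last step and assumes the terms are already collected. The seed list and the `FASHION_BRAND` label are hypothetical. The lowercase `"lower"` token attribute follows the pattern-file format used by Prodigy and spaCy's `EntityRuler`.

```python
import json


def terms_to_patterns(terms, label):
    """Turn seed phrases into token-based match patterns (one per unique phrase)."""
    patterns = []
    seen = set()
    for term in terms:
        tokens = term.lower().split()  # naive whitespace tokenization
        if not tokens or tuple(tokens) in seen:
            continue  # skip empty strings and case-insensitive duplicates
        seen.add(tuple(tokens))
        patterns.append({"label": label, "pattern": [{"lower": tok} for tok in tokens]})
    return patterns


def to_jsonl(patterns):
    """Serialize patterns as JSONL text, one JSON object per line."""
    return "".join(json.dumps(p) + "\n" for p in patterns)


# Hypothetical seed list, e.g. phrases accepted while exploring sense2vec neighbours.
seed_terms = ["Nike", "Ralph Lauren", "nike", "Uniqlo", ""]
patterns = terms_to_patterns(seed_terms, "FASHION_BRAND")  # label name is an assumption
print(to_jsonl(patterns))
```

Matching on `"lower"` makes "Nike" and "nike" hit the same pattern, so case variants only need one entry.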
The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.
The following commands are defined by the project. They can be executed using `spacy project run [name]`. Commands are only re-run if their inputs have changed.
Command | Description |
---|---|
`preprocess` | Convert the data to spaCy's binary format |
`train` | Train a named entity recognition model |
`evaluate` | Evaluate the model and export metrics |
`package` | Package the trained model so it can be installed |
`visualize-model` | Visualize the model's output interactively using Streamlit |
`visualize-data` | Explore the annotated data in an interactive Streamlit app |
`create-jsnol-anno-by-model` | Use the trained model to re-label the JSONL data based on its predictions |
`index-jsnol-into-df` | Index the JSONL files into simple DataFrames and serialize them to zipped CSV |
`evaluate-in-depth` | Evaluate the model in depth, broken down by entity name |
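The `create-jsnol-anno-by-model` step replaces the annotations in a JSONL file with the trained model's predictions. Here is a minimal sketch, assuming Prodigy-style records with a `"text"` field and character-offset `"spans"`. The predictor is passed in as a function so the example runs without a trained model. `toy_predict` and the `FASHION_BRAND` label are hypothetical stand-ins.

```python
import json


def relabel_records(lines, predict):
    """Replace each record's "spans" with predicted entities.

    `lines` is an iterable of JSONL strings; `predict(text)` returns
    (start_char, end_char, label) tuples.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        record["spans"] = [
            {"start": s, "end": e, "label": lab} for s, e, lab in predict(record["text"])
        ]
        out.append(json.dumps(record))
    return out


def toy_predict(text):
    """Hypothetical stand-in for a trained model: tags every occurrence of "Nike"."""
    hits = []
    start = text.find("Nike")
    while start != -1:
        hits.append((start, start + len("Nike"), "FASHION_BRAND"))
        start = text.find("Nike", start + 1)
    return hits


source = ['{"text": "I love Nike shoes", "spans": []}', "", '{"text": "no brand here"}']
relabeled = relabel_records(source, toy_predict)
for row in relabeled:
    print(row)
```

With spaCy, the predictor could be `lambda text: [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]`, where `nlp` is the trained pipeline loaded with `spacy.load`.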
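For `index-jsnol-into-df`, one simple layout is one row per annotated span, which is easy to load into a DataFrame later. The sketch below uses the standard library's `csv` and `zipfile` modules in place of pandas so it has no dependencies. It is an illustration under assumptions, not the project's script: the column names and the `annotations.csv` member name are hypothetical.

```python
import csv
import io
import json
import zipfile

FIELDS = ["record_id", "text", "label", "span_text"]


def jsonl_to_rows(lines):
    """Flatten JSONL records into one row per span (one empty-label row if no spans)."""
    rows = []
    for idx, line in enumerate(l for l in lines if l.strip()):
        rec = json.loads(line)
        for span in rec.get("spans") or [{}]:
            start, end = span.get("start"), span.get("end")
            rows.append({
                "record_id": idx,
                "text": rec["text"],
                "label": span.get("label", ""),
                "span_text": rec["text"][start:end] if start is not None else "",
            })
    return rows


def rows_to_zipped_csv(rows, member="annotations.csv"):
    """Write rows as CSV into an in-memory ZIP archive and return its bytes."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, buf.getvalue())
    return archive.getvalue()


sample = [
    '{"text": "I love Nike", "spans": [{"start": 7, "end": 11, "label": "FASHION_BRAND"}]}',
    '{"text": "no brand here", "spans": []}',
]
rows = jsonl_to_rows(sample)
zipped = rows_to_zipped_csv(rows)  # write these bytes to a .zip file to persist them
```

With pandas, the equivalent is `pd.DataFrame(rows).to_csv(path, index=False, compression="zip")`.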
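`evaluate-in-depth` reports results per name. One reading of this, which is an assumption here, is precision, recall and F1 for each entity surface string (lowercased). Only exact span-and-label matches count. A minimal sketch with hypothetical example data and a hypothetical `BRAND` label:

```python
from collections import defaultdict


def per_name_scores(examples):
    """examples: iterable of (text, gold_spans, pred_spans); spans are (start, end, label)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for text, gold, pred in examples:
        gold, pred = set(gold), set(pred)
        for s, e, _ in gold & pred:  # exact matches are true positives
            counts[text[s:e].lower()]["tp"] += 1
        for s, e, _ in pred - gold:  # predicted but not annotated
            counts[text[s:e].lower()]["fp"] += 1
        for s, e, _ in gold - pred:  # annotated but missed
            counts[text[s:e].lower()]["fn"] += 1
    scores = {}
    for name, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[name] = {"p": p, "r": r, "f": f, **c}
    return scores


examples = [
    ("Nike and Adidas", [(0, 4, "BRAND"), (9, 15, "BRAND")], [(0, 4, "BRAND")]),
    ("nike again", [(0, 4, "BRAND")], [(0, 4, "BRAND"), (5, 10, "BRAND")]),
]
for name, s in sorted(per_name_scores(examples).items()):
    print(f"{name}: P={s['p']:.2f} R={s['r']:.2f} F={s['f']:.2f}")
```

Breaking scores down by name shows which brands the model reliably finds ("nike") and which it misses ("adidas"). Aggregate metrics hide this kind of difference.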
The following workflows are defined by the project. They can be executed using `spacy project run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.
Workflow | Steps |
---|---|
`all` | `preprocess` → `train` → `evaluate` → `create-jsnol-anno-by-model` → `index-jsnol-into-df` → `evaluate-in-depth` |
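spaCy handles the ordering and the "only re-run if inputs changed" rule itself. The sketch below only illustrates the idea and does not reproduce spaCy's implementation. It walks the `all` steps in order, compares a checksum of each step's inputs with the one recorded at the previous run, and returns the `spacy project run` commands that would execute. Representing inputs as in-memory bytes is an assumption. The sketch also does not model an upstream step's new outputs changing the inputs of later steps.

```python
import hashlib

WORKFLOW_ALL = [
    "preprocess", "train", "evaluate",
    "create-jsnol-anno-by-model", "index-jsnol-into-df", "evaluate-in-depth",
]


def checksum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def plan_workflow(steps, deps, previous):
    """Return the commands to run, in order, skipping steps with unchanged inputs.

    deps: step -> bytes standing in for the step's input files (assumption).
    previous: step -> checksum from the last run; updated in place.
    """
    to_run = []
    for step in steps:
        digest = checksum(deps.get(step, b""))
        if previous.get(step) == digest:
            continue  # inputs unchanged since the last run: skip this step
        previous[step] = digest
        to_run.append(" ".join(["spacy", "project", "run", step]))
    return to_run


state = {}
deps = {step: f"inputs of {step}".encode() for step in WORKFLOW_ALL}
print(plan_workflow(WORKFLOW_ALL, deps, state))  # all six commands on the first run
print(plan_workflow(WORKFLOW_ALL, deps, state))  # nothing changed, so nothing to run
```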
The following assets are defined by the project. They can be fetched by running `spacy project assets` in the project directory.
File | Source | Description |
---|---|---|
`assets/cyber_attrs_training.jsonl` | Local | |
`assets/cyber_attrs_eval.jsonl` | Local | |
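Both assets are listed as local files, so they need to be in place before the workflow runs. Below is a small pre-flight check that each file exists and parses as JSONL with a `"text"` field. The check itself is a hypothetical helper and not part of the project. Requiring `"text"` is an assumption based on the usual Prodigy record format.

```python
import json
from pathlib import Path

ASSETS = ["assets/cyber_attrs_training.jsonl", "assets/cyber_attrs_eval.jsonl"]


def check_jsonl_asset(path):
    """Return (number of records, list of problems) for one local JSONL asset."""
    p = Path(path)
    if not p.is_file():
        return 0, [f"{path}: missing"]
    count, problems = 0, []
    with p.open(encoding="utf8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"{path}:{lineno}: invalid JSON ({err.msg})")
                continue
            if "text" not in record:
                problems.append(f"{path}:{lineno}: no 'text' field")
            count += 1
    return count, problems


if __name__ == "__main__":
    for asset in ASSETS:
        n, issues = check_jsonl_asset(asset)
        print(f"{asset}: {n} records, {len(issues)} problems")
```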