BioSpaCy is a spaCy pipeline for processing biological texts. Currently, the pipeline uses rulers and heuristics to identify the following labels:
- DOMAIN
- KINGDOM
- PHYLUM
- PHYLUM-ANIMALIA
- PHYLUM-BACTERIA
- PHYLUM-FUNGI
- PHYLUM-PLANTS
- PHYLUM-PROTISTA
- CLASS (NOT INCLUDED YET)
- FAMILY (Plants only)
- SUBFAMILY (Plants only)
- ORDER (Plants only)
- GENUS (Plants only)
- SPECIES (Plants only)
- BINOMINA (Plants only)
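As an illustration of the kind of heuristic that could sit on top of rule-based matches, the sketch below derives a BINOMINA span wherever a GENUS span is immediately followed by a SPECIES span. This is an assumption about how such a label might be produced, not a description of BioSpaCy's internals. The function name `add_binomina` and the tuple layout are hypothetical, and the code is plain Python so it runs without the model installed.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_token, end_token, label), end exclusive.
Span = Tuple[int, int, str]


def add_binomina(spans: List[Span]) -> List[Span]:
    """Return the input spans plus one BINOMINA span for each GENUS
    that is directly followed by a SPECIES span."""
    # Map each SPECIES start offset to its end offset for quick lookup.
    species_by_start = {
        start: end for start, end, label in spans if label == "SPECIES"
    }
    binomina = [
        (start, species_by_start[end], "BINOMINA")
        for start, end, label in spans
        if label == "GENUS" and end in species_by_start
    ]
    return spans + binomina


# Assumed token offsets for the phrase "Nephrolepis exaltata".
spans = [(0, 1, "GENUS"), (1, 2, "SPECIES")]
result = add_binomina(spans)
print(result)
```

A GENUS that is not directly followed by a SPECIES is left unchanged, so isolated genus mentions do not produce a spurious BINOMINA span.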
Installation
pip install en_biospacy
Usage
import spacy
from spacy import displacy
text = """
Nephrolepis exaltata, known as the sword fern[1] or Boston fern,
is a species of fern in the family Lomariopsidaceae
(sometimes treated in the families Davalliaceae or Oleandraceae,
or in its own family, Nephrolepidaceae).
"""
nlp = spacy.load("en_biospacy")
doc = nlp(text)
for span in doc.spans["ruler"]:
    print(span.text, span.label_)
Expected Output
Nephrolepis GENUS
exaltata SPECIES
Lomariopsidaceae FAMILY
Davalliaceae FAMILY
Oleandraceae FAMILY
Nephrolepis exaltata BINOMINA
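If the flat list of spans is awkward to work with, the results can be grouped by label. The helper below is a minimal sketch: `group_by_label` is a hypothetical name, and it only assumes that each span exposes `.text` and `.label_`, as spaCy spans do. The `FakeSpan` stand-ins mirror the output listed above so the snippet runs without the model; in practice you would pass `doc.spans["ruler"]` instead.

```python
from collections import defaultdict, namedtuple


def group_by_label(spans):
    """Collect span texts into a dict keyed by label, preserving order."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span.label_].append(span.text)
    return dict(grouped)


# Stand-in for spaCy spans (illustration only).
FakeSpan = namedtuple("FakeSpan", ["text", "label_"])

sample = [
    FakeSpan("Nephrolepis", "GENUS"),
    FakeSpan("exaltata", "SPECIES"),
    FakeSpan("Lomariopsidaceae", "FAMILY"),
    FakeSpan("Davalliaceae", "FAMILY"),
    FakeSpan("Oleandraceae", "FAMILY"),
    FakeSpan("Nephrolepis exaltata", "BINOMINA"),
]

grouped = group_by_label(sample)
for label, texts in grouped.items():
    print(label, texts)
```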
Data for domains, kingdoms, and phyla came from Wikipedia. Data for plant families, subfamilies, orders, genera, and species came from World Flora Online (WFO).
Data Citation
"WFO (2022): World Flora Online. Version 2022.07. Published on the Internet; http://www.worldfloraonline.org. Accessed on: 1 January 2023".