Classifiers for identifying phenology traits on images of herbarium sheets.
There is a lot of effort going into digitizing and annotating photographs of plants and herbarium specimens. However, until now this effort has been mostly manual, error-prone, and labor-intensive, so only a fraction of these images are fully annotated. This project uses neural networks to automate the annotation of some biologically significant traits related to phenology: flowering, fruiting, leaf-out, etc.
The basic steps are:
- Obtain a database of plant images with corresponding annotations.
- I'm using data from the iDigBio project to get the URLs of the images to download.
- Clean the database so that it only contains records with a single Angiosperm herbarium sheet that also have phenology annotations.
- We can either use the records from above that are pre-identified or have experts annotate the images. The latter is preferable.
- Train one or more neural networks to recognize the traits. We are using the PyTorch library to build the neural networks. I am also using models and scripts from Hugging Face.
- Because it can be difficult to get a significant amount of quality annotations, I'm using masked autoencoders for a pretraining step.
- Use the encoder part of the masked autoencoder as a backbone for the actual phenology trait classifier.
- Use the trained neural networks to annotate images en masse.
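Two of the steps above can be sketched in plain Python. This is only a rough illustration, not the project's actual code: the record fields (`clade`, `sheet_count`, `phenology`), the helper names, the patch count of 196, and the 0.75 mask ratio are all assumptions made up for the example. The first part filters records down to single-sheet Angiosperm specimens with phenology annotations. The second shows the random patch split that a masked autoencoder performs before encoding only the visible patches.

```python
import random

# Hypothetical record layout; the real iDigBio fields will differ.
records = [
    {"id": "a", "clade": "Angiosperm", "sheet_count": 1, "phenology": {"flowering": True}},
    {"id": "b", "clade": "Gymnosperm", "sheet_count": 1, "phenology": {"flowering": False}},
    {"id": "c", "clade": "Angiosperm", "sheet_count": 2, "phenology": {"fruiting": True}},
    {"id": "d", "clade": "Angiosperm", "sheet_count": 1, "phenology": {}},
]


def keep_record(rec):
    """Keep single-sheet Angiosperm records with at least one phenology annotation."""
    return (
        rec.get("clade") == "Angiosperm"
        and rec.get("sheet_count") == 1
        and bool(rec.get("phenology"))
    )


cleaned = [rec for rec in records if keep_record(rec)]


def split_patches(num_patches, mask_ratio=0.75, seed=0):
    """Randomly split patch indices into (visible, masked) lists.

    The mask ratio is an assumed value chosen for illustration.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# 196 patches would correspond to a 14 x 14 grid; also an assumption.
visible, masked = split_patches(num_patches=196)

print([rec["id"] for rec in cleaned])
print(len(visible), len(masked))
```

During pretraining, only the visible patches would go through the encoder, and a decoder would try to reconstruct the masked ones. The classifier then reuses that encoder as its backbone.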
Coming soon!
- More thrills
- More spills
- More explanations of what I'm actually doing here.
git clone https://github.com/rafelafrance/phenobase.git
cd phenobase
make install