Attention! I have no knowledge of mycology; this project was created for learning purposes only and may contain errors and inaccuracies!
Download the dataset here
Download the models and weights here
Of course, I could have taken an already complete dataset from Kaggle.
But the whole point of this project is to collect the data and make everything work myself, with different models and approaches, on a small amount of data. In other words, it's all about gaining knowledge and experience.
- Get an image with mushrooms
- Detect every mushroom in the image
- Classify each detected mushroom separately
- Show the results
- Dataset: 1018 JPG images, already resized to 200x200, plus one CSV file with annotations
- Approach: bounding boxes
- Models: DETR and Faster R-CNN, with pretrained weights
- Deep learning framework: PyTorch
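As an illustration (not the project's actual training code), here is a minimal sketch of how a pretrained torchvision Faster R-CNN can be re-headed for a single "mushroom" class; the two-class setup (background + mushroom), the optimizer settings, and the omitted dataset loading are my assumptions:

```python
# Sketch only: COCO-pretrained Faster R-CNN, box predictor replaced
# for 2 classes (background + mushroom). Dataset/DataLoader code is omitted.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_mushroom_detector(num_classes: int = 2):
    # "DEFAULT" loads the pretrained COCO weights (older torchvision: pretrained=True)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

model = build_mushroom_detector()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

# Standard torchvision detection training step:
# for images, targets in data_loader:
#     loss_dict = model(images, targets)   # returns a dict of losses in train mode
#     loss = sum(loss_dict.values())
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```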
Training process curves
Feeding the models test images
- Dataset: 553 JPG images and one CSV file with image_id and class labels; 15 classes with 30-50 images each
- Models: custom CNN, ResNet50 (pretrained) and MobileNetV2 (pretrained)
- Deep learning framework: tensorflow.keras
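To show what the transfer-learning setup looks like, here is a minimal tf.keras sketch (not the project's exact code: the input size, head layers, and hyperparameters are assumptions; only the 15 classes come from the dataset description above):

```python
# Sketch: frozen MobileNetV2 feature extractor with a small head for 15 mushroom classes.
import tensorflow as tf

NUM_CLASSES = 15          # from the dataset description above
IMG_SIZE = (224, 224)     # assumed; MobileNetV2's default ImageNet resolution

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False    # freeze pretrained weights for the first training stage

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The same kind of head can be put on top of ResNet50 (with its own preprocessing) to get the second pretrained classifier.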
The results are not great because of the small number of images. In my opinion, adding another 100-200 images would not change the accuracy much, and it is simply not worth the time (FOR THIS PARTICULAR PROJECT).
Training process curves
MobileNetV2
ResNet50
Possible upgrades to get better results:
- Enlarge the dataset
- Use more models for blending
- Hyperparameter tuning
For object detection I decided to use the DETR model, and for classification a blend of ResNet50 and MobileNetV2 (the custom CNN performed too poorly).
In the main application I load my pretrained models and feed them images.
As output you get each cropped mushroom with its top-5 predictions.
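A rough sketch of the classification half of that inference step is shown below (not the project's exact code: model paths, class names, and the input size are placeholders, and it assumes any preprocessing is saved inside the Keras models):

```python
# Sketch: average ResNet50/MobileNetV2 probabilities for one detected crop
# and report the 5 most likely mushroom classes.
import numpy as np
import tensorflow as tf

def blend_top5(crop, resnet, mobilenet, class_names, img_size=(224, 224)):
    """Top-5 (class, probability) pairs for one cropped mushroom; blend = mean of the two softmax outputs."""
    x = tf.image.resize(tf.convert_to_tensor(np.asarray(crop), dtype=tf.float32), img_size)
    x = tf.expand_dims(x, 0)                       # add batch dimension
    probs = (resnet.predict(x, verbose=0)[0] + mobilenet.predict(x, verbose=0)[0]) / 2.0
    top = np.argsort(probs)[::-1][:5]
    return [(class_names[i], float(probs[i])) for i in top]

# Usage (paths and variables below are placeholders):
# resnet    = tf.keras.models.load_model("models/resnet50_mushrooms.h5")
# mobilenet = tf.keras.models.load_model("models/mobilenetv2_mushrooms.h5")
# for box in detected_boxes:            # boxes produced by the DETR detector
#     crop = image.crop(box)            # PIL crop of a single mushroom
#     print(blend_top5(crop, resnet, mobilenet, CLASS_NAMES))
```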
Good results
Not so good results
In my opinion this is not the hardest data to gather, and it fits my project objectives perfectly.
I decided to split the task into separate detection and classification stages for several reasons:
- Get more flexibility
- Get more precise models
- Get more knowledge
Totally worth it: I wanted computer vision knowledge and experience, and I gained a lot of both.
This was an important project for me because it gave me an understanding of the whole computer vision process.
If you find any mistakes, please let me know on Twitter.
All references are in the code section. Thank you!