The BaMApp (Base Model for Agricultural Applications) repository aims to provide the resources and guidance needed to fine-tune the DINOv2 model on a dataset specific to the AgTech sector. The model will be trained on RGB images of plants in various settings (field, greenhouse, and indoor) as well as data from high-throughput phenotyping facilities.
All data will be hosted on DeepLake, an open-source data lake for machine learning datasets.
The AgTech sector has unique data requirements and challenges that can benefit greatly from a foundation model designed and fine-tuned for its use cases. By providing this foundation model, we aim to advance research and development in the AgTech sector, support phenotyping use cases, and pave the way for more advanced, domain-specific models in the future.
The DINOv2 model, developed by Meta AI (formerly Facebook AI), is a computer vision model trained using self-supervised learning. It represents a significant advance in computer vision and provides a strong foundation for further fine-tuning on domain-specific data. For more information, see the DINOv2 blog post.
DeepLake is an open-source project by Activeloop that provides a data lake for machine learning datasets. It lets users store, share, and collaborate on large-scale datasets efficiently.
- Clone the repo.
- Create a virtual environment and install the requirements:
pip install -r requirements.txt
- Log in to DeepLake (see here for instructions).
- Upload the images to DeepLake using the upload.py script. The script takes in the following arguments:
usage: upload.py [-h] [--folder FOLDER] [--commit_message COMMIT_MESSAGE] [--json JSON]
Upload images to deeplake
options:
-h, --help show this help message and exit
--folder FOLDER Folder with images
--commit_message COMMIT_MESSAGE
Commit message
--json JSON Json file formated Metadata eg. {"Origin": "Test"}
An example call would be:
python upload.py --folder ./images --commit_message "Test" --json '{"source_dataset_name": "Test", "Description": "Test", "ref_url": "test"}'
The field 'source_dataset_name' is required and should be the name of the dataset you are uploading. The field 'Description' is optional and can be used to provide additional information about the dataset.
- Report the commit ID to the BaMApp team so that the data can be tracked.
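The metadata passed to `--json` in the upload step can be assembled and checked in Python before the script is called. The following is a minimal sketch, not part of the repository: the helper names `build_metadata` and `build_upload_command` are hypothetical, and treating `ref_url` as optional is an assumption, since only `source_dataset_name` is documented as required.

```python
import json
import shlex


def build_metadata(source_dataset_name, description=None, ref_url=None):
    """Return the metadata as a JSON string for the --json argument.

    Hypothetical helper: only 'source_dataset_name' is documented as required;
    treating 'ref_url' as optional is an assumption.
    """
    if not source_dataset_name:
        raise ValueError("'source_dataset_name' is required")
    metadata = {"source_dataset_name": source_dataset_name}
    if description is not None:
        metadata["Description"] = description
    if ref_url is not None:
        metadata["ref_url"] = ref_url
    return json.dumps(metadata)


def build_upload_command(folder, commit_message, metadata_json):
    """Return the argument list for upload.py, e.g. for subprocess.run."""
    return [
        "python", "upload.py",
        "--folder", folder,
        "--commit_message", commit_message,
        "--json", metadata_json,
    ]


if __name__ == "__main__":
    meta = build_metadata("Test", description="Test", ref_url="test")
    cmd = build_upload_command("./images", "Test", meta)
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell-quoting problems with the JSON string, which is why the sketch builds a list and only quotes it for display.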
Dataset Description | Commit ID | Origin of Data |
---|---|---|
Placeholder | Placeholder | Placeholder |
We aim to train the BaMApp Base Model for Agricultural Applications and release it in Q3/4 2023. Please stay tuned for updates.
At present, the BaMApp Base Model for Agricultural Applications only supports RGB images without any labels. In the future, we plan to include support for other modalities.