Digital Africa Plantation Counting Challenge

The objective of this challenge is to create a semi-supervised machine learning algorithm to count the number of palm oil trees in an image as part of the Zindi Digital Africa Plantation Counting Challenge.

Palm oil is an edible vegetable oil derived from the mesocarp (reddish pulp) of the fruit of oil palms. It is used in food manufacturing, beauty products, and as a biofuel.

An automated count helps farmers determine the number of trees on their plot and estimate crop yield. Because the solution is semi-supervised, it can also be applied to other plantations, such as banana palms.

Data:

This data was collected using drones over 4 farms in Côte d'Ivoire in July and September 2022.

The training set contains 2002 images and the test set contains 858 images.

Some images may contain 0 palm oil trees because they show the edge of a field.

Dependencies:

Python 3.7
PyTorch 1.13
torchvision
pandas
opencv-python
scikit-learn
efficientnet_pytorch

Installation and Running Project

To run this project, you need Python 3.7 and PyTorch installed on your system. One way to get both is to use the PyTorch Docker container.

To run on CPU, install the dependencies from the CPU requirements file:

pip install -r requirements-cpu.txt

To run on a remote instance with a GPU, use the scripts in the scripts directory, which install and transfer the project code.

To initialise the project on a remote machine (transfer the code, download the project data, and install the requirements), use remote-init.sh:

bash remote-init.sh <hostname> <hostip> <auth_token>

See the Zindi data download instructions for information on obtaining an auth_token.

To run the project, use the remote-run.sh script, which allows multiple experiments to be run sequentially by specifying multiple parameter files:

bash remote-run.sh

To push local changes and fetch project runs, use the remote-sync.sh script:

bash remote-sync.sh <hostname> <hostip>

Evaluation:

The error metric for this competition is the Root Mean Squared Error (RMSE).
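
As a rough illustration of the metric, the sketch below computes the Root Mean Squared Error in plain Python. The function name rmse and the example counts are made up for illustration; they are not taken from the project code or the competition data.

```python
import math


def rmse(y_true, y_pred):
    """Return the root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute RMSE of empty sequences")
    # Square each per-image error, average the squares, then take the root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true and predicted tree counts for three images.
actual_counts = [5, 12, 0]
predicted_counts = [4.0, 14.0, 0.0]
score = rmse(actual_counts, predicted_counts)
print(f"RMSE: {score:.4f}")
```

Because the errors are squared before averaging, a single badly miscounted image is penalised more heavily than several small miscounts.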

Submission files should contain one row per image in the test set and 2 columns: Image_ID and Count.

Image_ID          Count
GL5_15360_8192    5
GL5_15360_9216    12
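
A minimal sketch of building a submission with pandas is shown below. It assumes the submission is a CSV file, which the text above does not state explicitly. The helper name build_submission and the predicted values are hypothetical, and clipping negative predictions to 0 is an assumption made here rather than something the project is known to do.

```python
import io

import pandas as pd


def build_submission(image_ids, predicted_counts):
    """Build a two-column submission table: Image_ID and Count."""
    submission = pd.DataFrame(
        {"Image_ID": list(image_ids), "Count": list(predicted_counts)}
    )
    # Assumption: a tree count cannot be negative, so clip raw model outputs at 0.
    submission["Count"] = submission["Count"].clip(lower=0)
    return submission


# Hypothetical raw predictions for the two example image IDs shown above.
submission = build_submission(
    ["GL5_15360_8192", "GL5_15360_9216"],
    [5.2, -0.3],
)

# Write to an in-memory buffer here; pass a file path to to_csv to save to disk.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the pandas row index out of the file so that only the two required columns are written.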

Results:

The runs.csv file collects information about training runs:

run - Name of the run, constructed from the run date, time, and model.
loss - The validation loss achieved on the run.
model_name - The pretrained model used.
learning_rate - The learning rate as a float.
batch_size - The number of images in a batch.
image_size - The size of the images used as input to the neural network.
image_rescaler - Whether images were rescaled as a boolean.
blur_kernel - The data augmentation blur kernel used.
blur_sigma - The data augmentation blur sigma value.
mem_usage - The maximum memory usage from training the network.
elapsed_time - The total training time in seconds.
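
To compare experiments, the run log can be loaded with pandas and sorted by validation loss. The sketch below assumes runs.csv is a comma-separated file with a header row that uses the column names listed above. The helper best_runs and the two sample rows are placeholders invented for illustration, not actual results from this project.

```python
import io

import pandas as pd

# Placeholder rows standing in for runs.csv; every value here is made up.
SAMPLE_RUNS = """run,loss,model_name,learning_rate,batch_size,image_size,image_rescaler,blur_kernel,blur_sigma,mem_usage,elapsed_time
run_a,2.5,model_a,0.001,16,512,True,3,1.0,4000,1200
run_b,1.8,model_b,0.0005,32,256,False,5,2.0,6000,900
"""


def best_runs(runs, top_n=5):
    """Return the top_n runs with the lowest validation loss."""
    return runs.sort_values("loss").head(top_n)


# In practice, replace the in-memory buffer with the path to runs.csv.
runs = pd.read_csv(io.StringIO(SAMPLE_RUNS))
best = best_runs(runs, top_n=1)
print(best[["run", "loss", "model_name", "learning_rate", "batch_size"]])
```

Sorting on loss alone ignores cost, so it may be worth also checking mem_usage and elapsed_time before choosing a configuration.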