# PlantVillage-Project

Using Deep Learning for Tomato Leaf Disease Detection


### Inspiration

Humans have always depended on edible plants to survive. Our ancestors travelled long distances in search of food, and it is no surprise that the first human civilizations emerged only after the invention of agriculture: without crops, it would be impossible for humanity to survive.

Modern technology has given human society the ability to produce enough food to meet the demand of more than 7 billion people. However, food security is still threatened by many factors, including plant diseases (Strange and Scott, 2005).

Plant diseases are a major threat to smallholder farmers, who depend on healthy crops to survive and who generate about 80% of the agricultural production in the developing world (UNEP, 2013). Correctly identifying a disease when it first appears is a crucial step in efficient disease management; traditionally, this is done by visiting local plant clinics.

Recent developments in smartphones and computer vision, however, have made their high-resolution cameras a very interesting tool for identifying diseases.

It is widely estimated that there will be between 5 and 6 billion smartphones in use worldwide by 2020. At the end of 2015, 69% of the world's population already had access to mobile broadband coverage.

Significant progress in image recognition was made from 2011 to 2012. Although CNNs trained by backpropagation had existed for decades, and GPU implementations of neural networks (including CNNs) for years, fast GPU implementations of max-pooling CNNs in the style of Ciresan and colleagues were needed to advance computer vision. In 2011, this approach achieved superhuman performance in a visual pattern recognition contest for the first time. Also in 2011, it won the ICDAR Chinese handwriting contest, and in May 2012 it won the ISBI image segmentation contest.

Until 2011, CNNs did not play a major role at computer vision conferences, but in June 2012 a paper by Ciresan et al. at the leading conference CVPR showed how max-pooling CNNs on GPUs can dramatically improve many vision benchmark records. In October 2012, a similar system by Krizhevsky et al. won the large-scale ImageNet competition by a significant margin over shallow machine learning methods. In November 2012, Ciresan et al.'s system also won the ICPR contest on the analysis of large medical images for cancer detection, and in the following year the MICCAI Grand Challenge on the same topic.

In 2013 and 2014, the error rate on the ImageNet task was reduced further using deep learning, following a similar trend in large-scale speech recognition. The Wolfram Image Identification project publicized these improvements, and some researchers consider the October 2012 ImageNet victory the start of a "deep learning revolution" that has transformed the AI industry.

Here, using state-of-the-art deep learning techniques, we demonstrate the feasibility of this approach: from a public dataset of 9,000 images of healthy and infected tomato leaves, we train a model suitable for smartphone applications that identifies 5 types of tomato leaf disease, with an accuracy of 99.84% on a held-out test set.

### Dataset

Figure 1: Dataset samples

We extracted our dataset from the well-known PlantVillage dataset, which contains over 50,000 images of 14 crop species and 26 diseases. We chose to work with 9,000 images of tomato leaves; our dataset contains samples of 5 tomato diseases in addition to healthy leaves, 6 classes in total, as follows:

  • class (0): Bacterial Spot.
  • class (1): Early Blight.
  • class (2): Healthy.
  • class (3): Septoria Leaf Spot.
  • class (4): Leaf Mold.
  • class (5): Yellow Leaf Curl Virus.

The images were resized to 150×150 pixels for faster computation, without compromising the quality of the data.
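
As a concrete illustration, here is a minimal sketch of how such resizing can be done with opencv-python (listed under Tools); the file paths and interpolation choice are assumptions, not taken from the original notebook:

```python
import cv2

# Hypothetical paths; the dataset layout here is only illustrative.
img = cv2.imread("dataset/tomato/bacterial_spot/leaf_0001.jpg")     # BGR image, original size
img = cv2.resize(img, (150, 150), interpolation=cv2.INTER_AREA)     # downsample to 150x150
cv2.imwrite("dataset_resized/tomato/bacterial_spot/leaf_0001.jpg", img)
```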

### Model

Our model takes raw images as input, so we used CNNs (Convolutional Neural Networks) to extract features. As a result, the model consists of two parts:

  • The first part (feature extraction), identical for the full-color and gray-scale approaches, consists of 4 convolutional layers with ReLU activations, each followed by a max-pooling layer.
  • The second part, after the flatten layer, contains two dense layers in both approaches. In the full-color model the first dense layer has 256 hidden units, for a total of 3,601,478 trainable parameters; in the gray-scale model it has 128 hidden units, for a total of 1,994,374 trainable parameters. We shrank the gray-scale network to avoid overfitting. In both models the last layer uses a Softmax activation with 6 outputs, one per class. (A code sketch of this architecture follows the summaries below.)
Full-Color Model Summary:

| Layer (type) | Output Shape | Param # |
| --- | --- | --- |
| conv2d_1 (Conv2D) | (None, 148, 148, 32) | 896 |
| max_pooling2d_1 (MaxPooling2D) | (None, 74, 74, 32) | 0 |
| conv2d_2 (Conv2D) | (None, 72, 72, 64) | 18496 |
| max_pooling2d_2 (MaxPooling2D) | (None, 36, 36, 64) | 0 |
| conv2d_3 (Conv2D) | (None, 34, 34, 128) | 73856 |
| max_pooling2d_3 (MaxPooling2D) | (None, 17, 17, 128) | 0 |
| conv2d_4 (Conv2D) | (None, 15, 15, 256) | 295168 |
| max_pooling2d_4 (MaxPooling2D) | (None, 7, 7, 256) | 0 |
| flatten_1 (Flatten) | (None, 12544) | 0 |
| dropout_1 (Dropout) | (None, 12544) | 0 |
| dense_1 (Dense) | (None, 256) | 3211520 |
| dense_2 (Dense) | (None, 6) | 1542 |
Gray-Scale Model Summary:

| Layer (type) | Output Shape | Param # |
| --- | --- | --- |
| conv2d_1 (Conv2D) | (None, 148, 148, 32) | 320 |
| max_pooling2d_1 (MaxPooling2D) | (None, 74, 74, 32) | 0 |
| conv2d_2 (Conv2D) | (None, 72, 72, 64) | 18496 |
| max_pooling2d_2 (MaxPooling2D) | (None, 36, 36, 64) | 0 |
| conv2d_3 (Conv2D) | (None, 34, 34, 128) | 73856 |
| max_pooling2d_3 (MaxPooling2D) | (None, 17, 17, 128) | 0 |
| conv2d_4 (Conv2D) | (None, 15, 15, 256) | 295168 |
| max_pooling2d_4 (MaxPooling2D) | (None, 7, 7, 256) | 0 |
| flatten_1 (Flatten) | (None, 12544) | 0 |
| dropout_1 (Dropout) | (None, 12544) | 0 |
| dense_1 (Dense) | (None, 128) | 1605760 |
| dense_2 (Dense) | (None, 6) | 774 |
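
The following is a minimal Keras sketch that reproduces these summaries. The 3×3 kernel size and 2×2 pooling follow from the output shapes and parameter counts; the dropout rate, dense-layer activation, and compile settings are assumptions based on common practice, not taken from the original notebook.

```python
from keras import layers, models

def build_model(channels=3, hidden_units=256):
    """Build the 4-block CNN: channels=3, hidden_units=256 for full-color;
    channels=1, hidden_units=128 for gray-scale."""
    model = models.Sequential()
    # Feature extraction: 4 conv layers (3x3, ReLU), each followed by 2x2 max pooling
    model.add(layers.Conv2D(32, (3, 3), activation='relu',
                            input_shape=(150, 150, channels)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(256, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    # Classifier: flatten, dropout, dense hidden layer, softmax over the 6 classes
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))                 # dropout rate is an assumption
    model.add(layers.Dense(hidden_units, activation='relu'))
    model.add(layers.Dense(6, activation='softmax'))
    model.compile(optimizer='rmsprop',             # optimizer/loss are assumptions
                  loss='categorical_crossentropy',
                  metrics=['acc'])
    return model

full_color_model = build_model(channels=3, hidden_units=256)  # 3,601,478 trainable params
gray_scale_model = build_model(channels=1, hidden_units=128)  # 1,994,374 trainable params
```

Calling `full_color_model.summary()` and `gray_scale_model.summary()` yields the layer tables shown above.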

### Methods

We experimented with two types of input to see how the model works and what exactly it learns: first we took the images as they are, with 3 color channels, and then we experimented with 1-channel (gray-scale) images. As expected, the model learned different patterns in each approach.
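
A minimal sketch of how the two variants can be fed to the models with Keras' ImageDataGenerator; the directory layout, batch size, and rescaling factor are assumptions:

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)  # scale pixel values to [0, 1]

# Full-color variant: 3-channel RGB input
train_rgb = datagen.flow_from_directory(
    'dataset/train',             # hypothetical directory with one subfolder per class
    target_size=(150, 150),
    color_mode='rgb',
    batch_size=32,
    class_mode='categorical')    # 6 classes -> one-hot labels

# Gray-scale variant: the same images collapsed to 1 channel
train_gray = datagen.flow_from_directory(
    'dataset/train',
    target_size=(150, 150),
    color_mode='grayscale',
    batch_size=32,
    class_mode='categorical')
```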

### Data Visualisation

To see how the model works and what exactly it learns, we chose to visualize intermediate activations. This consists of displaying the feature maps output by the various convolution and pooling layers in the network, given a certain input (the output of a layer is often called its activation, i.e. the output of the activation function). It gives a view into how an input is decomposed into the different filters learned by the network.
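
A minimal sketch of this visualization technique in Keras, reusing `full_color_model` from the sketch above; the sample image path and the choice to show the first 8 layers (the 4 conv + 4 pooling layers) are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from keras import models
from keras.preprocessing import image

# Load one sample and turn it into a (1, 150, 150, 3) batch scaled to [0, 1]
img = image.load_img('dataset/test/early_blight/leaf_0001.jpg',   # hypothetical path
                     target_size=(150, 150))
x = image.img_to_array(img)[np.newaxis] / 255.

# Build a model that returns the outputs (activations) of the first 8 layers
layer_outputs = [layer.output for layer in full_color_model.layers[:8]]
activation_model = models.Model(inputs=full_color_model.input,
                                outputs=layer_outputs)
activations = activation_model.predict(x)

# Display the first feature map of each layer
for layer, act in zip(full_color_model.layers[:8], activations):
    plt.matshow(act[0, :, :, 0], cmap='viridis')
    plt.title(layer.name)
plt.show()
```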

As shown in Figures 3 and 4, the full-color model learned how to identify the disease spots; the gray-scale model, on the other hand, did not learn how to locate the disease, but instead learned only the shape of the leaf and some patterns in the background.

Figure 2: Input image

Figure 3: Full-Color intermediate activations

Figure 4: Gray-Scale intermediate activations

### Results

Our best model (full-color) achieved an accuracy of 99.84% on a held-out test set, and the second-best model (gray-scale) achieved an accuracy of 95.54%. Figures 5 and 6 show how each model's accuracy progresses over the training epochs.

Figure 5: Full-Color Training and Validation Accuracy

Figure 6: Gray-Scale Training and Validation Accuracy
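
For reference, curves like those in Figures 5 and 6 can be produced from the History object that Keras returns from training. The fit call below and its step/epoch counts are illustrative assumptions, not the original training configuration:

```python
import matplotlib.pyplot as plt

# val_rgb: a validation generator built like train_rgb above (assumption)
history = full_color_model.fit_generator(train_rgb,
                                         steps_per_epoch=100,
                                         epochs=30,
                                         validation_data=val_rgb,
                                         validation_steps=50)

epochs = range(1, len(history.history['acc']) + 1)
plt.plot(epochs, history.history['acc'], 'b', label='Training accuracy')
plt.plot(epochs, history.history['val_acc'], 'r', label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```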

### Tools

  • ipython 7.0.1
  • jupyter 1.0.0
  • Keras 2.2.4
  • matplotlib 3.0.1
  • numpy 1.15.1
  • opencv-python 3.4.3.18
  • python 3.6.7
  • tensorflow-gpu 1.11.0