Farmers in many parts of India struggle to grow crops because of climate and soil conditions, and often have no reliable assistant to help them choose the right crops using modern technology. Because of illiteracy, many farmers cannot benefit from advances in agricultural science and continue to rely on traditional methods, which makes it difficult to achieve the desired yield. For instance, improper fertilization or unanticipated rainfall patterns may cause crop failure. In such circumstances, the best course of action is to pick crops that suit the soil's current condition and the rainfall expected during planting. We therefore present a data-mining-based "Soil-Based Crop Profiling System". Given the rainfall in the farmer's region and the soil input parameters (N, P, K, and pH), it provides a list of suitable crops. It also suggests fertilizers that can be used to increase crop yield and improve soil quality. This application focuses on the growing problem of crop failure.
This is a POC (proof of concept) project. The data used here comes with no guarantee from the creator, so do not use it to make real farming decisions; if you do, the creator is not responsible for the consequences. However, this project demonstrates how ML/DL could be applied to precision farming if developed at a large scale with authentic, verified data.
Farmers often face numerous challenges, such as unpredictable weather conditions, pests and diseases, and market fluctuations, which can make it difficult for them to make a decent living from their crops. Despite these challenges, farmers continue to work hard to provide for their families and communities. The Harvestify project is a testament to the power of technology to improve people's lives. By creating a tool that is accessible and easy to use, the project team is empowering farmers to take control of their farming activities and make informed decisions that can lead to better outcomes. The dedication and hard work put into the Harvestify project are a reminder that even small actions can make a big difference in the world.
- To enable farmers to input and monitor data related to their crops, such as planting and harvesting dates, crop types, and yields.
- To provide a user-friendly web application for farmers to track their crops and farming activities.
- To help farmers make informed decisions about their farming activities by providing data analysis and visualization tools.
- To improve the efficiency and productivity of farming activities and identify areas for improvement.
- To encourage the adoption of digital technologies in agriculture by demonstrating the benefits of using such tools.
- Crop recommendation dataset (custom built dataset)
- Fertilizer suggestion dataset (custom built dataset)
- Disease detection dataset
The sections that follow describe in detail our application's implementation, the datasets and training data, and the machine learning used in our experiments. We begin with the user-interface design of our application, illustrated with flowcharts and block diagrams. We then move on to our machine learning experiments, where we present the various models we use and other experimental details. Both sections are divided into nested subsections for crop recommendation, fertilizer recommendation, and plant disease detection. The machine learning section explains how we use LIME for interpretation, while the application section describes the news feed implementation.

A. The Application
- Fertilizer Recommendation: Along with the crop name, the user enters the nitrogen, phosphorus, and potassium values. A POST request is made to the Flask API, which hosts the fertilizer recommendation classifier. The front-end receives the HTTP response and presents the fertilizer recommendation to the user.
- Disease Detection: The user either uploads an image directly or captures one. The image is sent to the back-end, where the model processes it. After processing, an HTTP response is sent to the front-end, and the user is shown the remedies for the plant's disease. Fig. 2 depicts the flow diagram for this feature.
- Crop Recommendation: After the nitrogen, phosphorus, and potassium values are entered, a POST request is made to the Flask API. Once the model runs, an HTTP response is sent to the front-end indicating the best crop the farmer can grow in that soil to get the most out of the land. Fig. 3 depicts the flow diagram for this feature.
- Disease Portal: The disease portal provides a comprehensive overview of various plant diseases and the products that can be purchased to treat them.
- Interpretability Evaluation: The user's plant leaf image is sent to a deployed API, where the LIME computation is performed on a droplet server hosted on DigitalOcean. The resulting image is returned as a URI and displayed on the front-end.

B. Machine Learning

Crop Recommendation: Dataset Description: This dataset, taken from Kaggle 1, is fairly simple and contains few but important features, unlike the complicated set of factors that influence crop yield in practice. It has seven features: N, the ratio of nitrogen content in the soil; P, the ratio of phosphorus content in the soil; K, the ratio of potassium content in the soil; temperature in degrees Celsius; humidity as a percentage of relative humidity; ph, the pH value of the soil; and rainfall in mm. The task is to predict the type of crop from these 7 features. There are 2200 examples and 22 class labels in total, including rice, coffee, muskmelon, and others. Each class has 100 samples, so the dataset is perfectly balanced and requires no special imbalance-handling techniques.

Approach: The dataset is divided into five folds, and cross-validation is carried out on them. Six models are used to evaluate performance, including:
- Decision Tree with a maximum depth of 5 and entropy as the criterion
- Naive Bayes
- An SVM with inputs scaled to [0, 1], a degree-3 polynomial kernel, and L2 regularization parameter C=3
- XGBoost
Except for XGBoost, which comes from the xgboost library, all of the models are implemented with the sklearn library. Parameters not mentioned are left at their defaults during training. For the application, we choose the model with the highest performance and use it for inference.

Disease Detection: Dataset Description: We use the PlantVillage dataset for leaf disease detection.
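Returning to the crop recommendation approach above, the five-fold cross-validation comparison can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names are ours, XGBoost is omitted here (it is evaluated the same way via the separate xgboost library), and any hyperparameters not stated in the text are left at sklearn defaults.

```python
"""Minimal sketch: compare the crop-recommendation models with
5-fold cross-validation, as described in the text."""
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def build_models():
    """The sklearn models named in the text (XGBoost omitted here)."""
    return {
        # Decision tree: max depth 5, entropy criterion
        "decision_tree": DecisionTreeClassifier(max_depth=5, criterion="entropy"),
        "naive_bayes": GaussianNB(),
        # SVM: inputs scaled to [0, 1], degree-3 polynomial kernel, C=3
        "svm": make_pipeline(MinMaxScaler(), SVC(kernel="poly", degree=3, C=3)),
    }


def evaluate_models(X, y, folds=5):
    """Run k-fold cross-validation and return each model's mean accuracy."""
    return {
        name: cross_val_score(model, X, y, cv=folds).mean()
        for name, model in build_models().items()
    }
```

On the real dataset, X would hold the seven features (N, P, K, temperature, humidity, ph, rainfall) and y the crop label; the best-scoring model is then kept for inference.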
In particular, we use an augmented version of the PlantVillage dataset that is available on Kaggle 2. The dataset contains 87,000 RGB images of healthy and diseased crop leaves, spread over 38 class labels. It covers 14 crops and a total of 26 distinct diseases. On average, each class contains 1850 image samples, with a standard deviation of 104. The dataset is split into training and validation sets in an 80:20 ratio. Given only a picture of the plant leaf, we try to predict the crop-disease pair. We resize the images to 224 x 224 pixels and divide the pixel values by 255; a sample batch from the PlantVillage dataset is shown in the figure. We perform both model optimization and prediction on these downscaled images.

Approach: We employ three ImageNet-pretrained models for our experiments: VGG-16, ResNet-50, and EfficientNetB0. These models differ in size, parameter count, and performance on the ImageNet dataset. Such pre-trained models have been shown to perform better than a model trained from scratch on the PlantVillage dataset. During training we use the Adam optimizer with categorical cross-entropy loss, an initial learning rate of 2e-5, beta values of (0.9, 0.999), and an epsilon of 1e-08. A low learning rate is used to prevent divergence of the model and to preserve the raw image filters learned during pre-training. We train with a batch size of 32 for 25 epochs. We also use early stopping and model checkpointing based on the validation loss, and we record the model's accuracy during training. When working with an image dataset, these models are best trained on a graphics processing unit (GPU) rather than a central processing unit (CPU).
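A rough sketch of this training setup in Keras follows. The classification head, checkpoint file name, early-stopping patience, and the `weights` parameter are our assumptions; the optimizer settings, loss, input size, batch size, epoch count, and callbacks come from the text.

```python
"""Minimal sketch of the transfer-learning setup (EfficientNetB0 shown;
VGG-16 and ResNet-50 are swapped in the same way)."""
import tensorflow as tf

NUM_CLASSES = 38          # 38 crop-disease class labels
IMG_SIZE = (224, 224)     # images are resized to 224 x 224


def build_model(weights="imagenet"):
    """ImageNet-pretrained backbone plus an assumed softmax head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights,
        input_shape=IMG_SIZE + (3,), pooling="avg")
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Low learning rate avoids divergence and preserves pretrained filters.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=2e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8),
        loss="categorical_crossentropy",
        metrics=["accuracy"])
    return model


callbacks = [
    # Early stopping and checkpointing on validation loss, as in the text.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras", monitor="val_loss", save_best_only=True),
]
# model.fit(train_ds, validation_data=val_ds, epochs=25, callbacks=callbacks)
```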
GPUs allow many parallel computations, which makes training and inference faster; we use the free GPUs provided by Kaggle and Google Colab for our experiments. Finally, we use the LIME technique from the lime package to understand the predictions made by our best model. We run LIME with 1000 samples and look at the segments weighted positively towards the predicted class. We then plot the top ten most important segments on the image and present the result in the application. Because of computation limits, the deployed application uses only 249 samples; LIME provides better explanations with larger numbers of samples.
- Fertilizer Recommendation: Dataset Description: For fertilizer recommendation we use a custom dataset 3 with the features crop, nitrogen, phosphorus, potassium, pH, and soil moisture. There are 22 crops, including coffee beans, rice, and maize, each with its ideal values for N, P, and K. The dataset shows how much N, P, and K should be present in the soil for the crop to grow most effectively; the farmer should then use a fertilizer based on whichever N, P, or K value is lacking. Approach: To select the most effective fertilizer for a plant, we employ rule-based classification, a classification scheme that uses IF-THEN rules for class prediction. A fertilizer may be required depending on how far a plant's soil is from its ideal N, P, or K value. We currently have 6 types of fertilizer suggestions, based on whether the N/P/K values are high or low.
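The rule-based scheme, served behind a Flask endpoint as in the application flow described earlier, can be sketched as follows. The ideal N/P/K values shown, the tie-breaking rule (act on the largest deviation), and the route name are our assumptions; the text specifies only that six suggestions are derived from whether N, P, or K is high or low relative to the crop's ideal value.

```python
"""Minimal sketch: IF-THEN fertilizer rules behind a Flask POST endpoint."""
from flask import Flask, request, jsonify

# Assumed excerpt of the custom dataset: ideal (N, P, K) per crop.
IDEAL_NPK = {"rice": (80, 40, 40), "maize": (80, 40, 20)}


def recommend_fertilizer(crop, n, p, k):
    """Compare soil N/P/K with the crop's ideal values and return one of
    six rule outcomes: high or low for each of N, P, and K."""
    ideal_n, ideal_p, ideal_k = IDEAL_NPK[crop]
    diffs = {"N": ideal_n - n, "P": ideal_p - p, "K": ideal_k - k}
    # Act on the nutrient that deviates most from its ideal value.
    nutrient = max(diffs, key=lambda key: abs(diffs[key]))
    level = "low" if diffs[nutrient] > 0 else "high"
    return f"{nutrient}_{level}"  # key into the six suggestion texts


app = Flask(__name__)


@app.route("/fertilizer", methods=["POST"])
def fertilizer():
    data = request.get_json()
    rule = recommend_fertilizer(data["crop"], data["N"], data["P"], data["K"])
    return jsonify({"recommendation": rule})
```

For example, rice soil with N far below its ideal value maps to the `N_low` rule, whose suggestion text would recommend a nitrogen-rich fertilizer.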
## V. CONCLUSION AND FUTURE WORK

In this paper we propose "Farmer's Assistant", a user-friendly web application system based on machine learning and web scraping. Our system provides several features, including crop disease detection using an EfficientNet model on leaf images, fertilizer recommendation using a rule-based classification system, and crop recommendation using the Random Forest algorithm. Through our user interface, the user can quickly get results by filling out simple forms. Additionally, we use the LIME interpretability technique to explain our predictions on the disease detection image; this may help us improve the datasets and models that use this information and understand why the model predicts what it does. Even though our application runs smoothly, there are many ways to improve it. First, we can augment the fertilizer and crop recommendations with products available on popular online shopping sites, and users could then purchase fertilizers and crops directly through our app. Fertilizer recommendation could also be improved if we can find data on the various brands and products available for given N, P, and K values. We currently provide only six types of suggestions; in the future, we could use more sophisticated machine learning systems to provide better ones. Next, we recognize that the dataset we used for disease classification is incomplete: the model performs well only on images belonging to classes it already knows, and it cannot correctly classify out-of-domain data. There are two options for dealing with this problem in the future. One option is to find additional datasets with different crops and/or diseases at comparable scales, or to produce and scale such datasets using generative modeling to add to our training set.
As a result, our model's ability to generalize will improve. The second option is to let users add their own images through our web application and annotate them. Also, it has been shown that LIME explanations by themselves are not always reliable, because they only provide local information about a model and do not address the global features the model focuses on. We can therefore use additional methods such as Grad-CAM and Integrated Gradients, or additional training techniques such as LIME's sparse linear layers, to better explain our model predictions. Last but not least, we plan to offer segmentation of the diseased dataset at a finer level. Due to a lack of such data, this is currently impossible; however, we can incorporate a segmentation annotation tool into our application so that users can help fill the void. Additionally, with the help of some unsupervised algorithms, we can identify the diseased regions of the image. In future work, we intend to include these features and fill in the gaps.