This project implements the deep learning architectures from You et al. 2017 and applies them to developing countries with significant agricultural productivity (Argentina, Brazil, India).
We also examine the efficacy of transfer learning of yield forecasting insights between adjoining countries; some results were published in the proceedings of COMPASS 2018. Our paper can be viewed here.
Contributors: Anna X Wang, Caelin Tran, Nikhil Desai, Professor David Lobell, Professor Stefano Ermon
- A Google Earth Engine account (for imagery retrieval)
- A Google Cloud storage account (for image data storage and access)
- A Google Cloud compute instance with python2 and GDAL
For any of these scripts, `python <script>.py -h` will provide a CLI usage string with explanations of each parameter.
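As a rough illustration of the kind of usage string `-h` produces, here is a minimal `argparse` sketch. The script name `example_script.py` and the parameters `country` and `--imagery-type` are hypothetical, chosen only for illustration; the repository's scripts define their own parameters, and whether they use `argparse` is an assumption.

```python
import argparse


def build_parser():
    # Hypothetical parameters for illustration only; the real scripts
    # in this repository define their own arguments.
    parser = argparse.ArgumentParser(
        prog="example_script.py",
        description="Illustrative CLI (not one of the repository's scripts).",
    )
    parser.add_argument(
        "country",
        help="name of the country whose imagery should be processed",
    )
    parser.add_argument(
        "--imagery-type",
        choices=["sat", "temp", "cover"],
        default="sat",
        help="which kind of imagery to handle (default: sat)",
    )
    return parser


# Print the same text that `python example_script.py -h` would show.
print(build_parser().format_help())
```

Each `help=` string ends up next to its parameter in the usage text, which is the per-parameter explanation that `-h` prints.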
Steps marked with (#) should be done if the country of interest is not the US, India, Brazil, Argentina, or Ethiopia. See commit 9f7f43 in this repository for an example of adding a new country (Ethiopia).
- (#) Create a Google Earth Engine table from a shapefile of your new country's level-2 boundaries.
- (#) Add your country to the `pull_modis.py` configuration. You will need the identifier of the shapefile table in GEE, and you will also need to add instructions on how to extract relevant metadata (e.g. a human-readable name) from a feature in the shapefile. More details are in the comments in `pull_modis.py`.
- Run `pull_modis.py` with the country and imagery type to download imagery to a Google Cloud bucket.
- Put satellite imagery into a "sat" folder, temperature images into a "temp" folder, and cover images into a "cover" folder.
- Run `histograms.py` with the "sat", "temp", and "cover" folders specified as arguments. It outputs to a "histograms" folder.
- (#) Save a CSV containing yields for the relevant set of regions, harvest years, and crop types. (TODO: more explanation of the yields CSV format)
- Run `make_datasets.py` with the "histograms" folder and yields CSV, along with relevant parameters for use in ML (train/test split, years to ignore or to use, etc.). It creates a "dataset" folder containing numpy arrays which will be used by the training/testing architecture.
- Run `train_NN.py` with a dataset name and neural net architecture type (CNN or LSTM). It generates a "log" folder containing the model weights, model predictions, and logs tracking model error.
- Look inside the log folder for model results.
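The histogram step in the pipeline above follows the idea from You et al. 2017 of summarizing each image by the distribution of its pixel values rather than by the raw pixels. The sketch below shows that idea for a single multi-band image. It is not the code in `histograms.py`: the function name `image_to_histogram`, the bin count, the value range, and the `(height, width, bands)` layout are all assumptions made for illustration.

```python
import numpy as np


def image_to_histogram(image, num_bins=32, value_range=(0.0, 1.0)):
    """Reduce a (height, width, bands) image to a (num_bins, bands) array.

    Each column is the normalized pixel-value histogram of one band.
    Pixels outside value_range are ignored by np.histogram.
    """
    bands = image.shape[2]
    hist = np.zeros((num_bins, bands), dtype=np.float64)
    for b in range(bands):
        counts, _ = np.histogram(image[:, :, b], bins=num_bins, range=value_range)
        total = counts.sum()
        if total > 0:
            # Normalize so regions of different sizes are comparable.
            hist[:, b] = counts / float(total)
    return hist


# Synthetic stand-in for one region's imagery (values in [0, 1)).
rng = np.random.RandomState(0)
image = rng.uniform(0.0, 1.0, size=(16, 16, 3))
hist = image_to_histogram(image)
print(hist.shape)
```

Normalizing each band's counts makes the representation independent of how many pixels a region contains, so regions of different sizes yield inputs of the same shape.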
- Run `train_NN.py` on a dataset folder "X" as above. It generates a "log" folder.
- Run `test_NN.py` and pass in as arguments the "log" folder from training, along with a new dataset folder "Y" on which to fine-tune the model.
- The result is a new folder "log2" containing the new model weights, predictions, and error logs for performance on dataset "Y".
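The fine-tuning steps above can be illustrated with a toy example: train a model on a large source dataset "X", then continue training from those weights on a small, related dataset "Y". This sketch uses a linear model and synthetic data purely as a stand-in for the CNN/LSTM and the real datasets. The names `train` and `mse`, the learning rate, and the step counts are assumptions for illustration, not the behavior of `train_NN.py` or `test_NN.py`.

```python
import numpy as np


def train(features, targets, weights=None, steps=500, lr=0.1):
    """Gradient descent on mean squared error for a linear model.

    If weights is given, training continues from it (fine-tuning);
    otherwise training starts from zeros.
    """
    n, d = features.shape
    w = np.zeros(d) if weights is None else weights.copy()
    for _ in range(steps):
        grad = 2.0 * features.T.dot(features.dot(w) - targets) / n
        w -= lr * grad
    return w


def mse(features, targets, w):
    return float(np.mean((features.dot(w) - targets) ** 2))


rng = np.random.RandomState(0)
true_x = np.array([1.0, -2.0, 0.5])             # source relationship
true_y = true_x + np.array([0.1, 0.0, -0.1])    # related, slightly shifted

x_feat = rng.normal(size=(200, 3))   # large source dataset "X"
x_tgt = x_feat.dot(true_x)
y_feat = rng.normal(size=(20, 3))    # small target dataset "Y"
y_tgt = y_feat.dot(true_y)

w_source = train(x_feat, x_tgt)                             # train on "X"
w_tuned = train(y_feat, y_tgt, weights=w_source, steps=50)  # fine-tune on "Y"

print(mse(y_feat, y_tgt, w_source), mse(y_feat, y_tgt, w_tuned))
```

Starting from the source weights means the short second run only has to correct for the difference between the two datasets, which is the motivation for transferring between adjoining countries.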