/c4g-rpt

Primary Language: Python

Specifications

The web application was written in Ruby on Rails as a simple demonstration of the expected functionality. The user inputs the details of their farm, the crop they intend to (or would like to) grow, and the state of their finances, and an estimate of their expected revenue is returned.

This revenue estimate is generated by calling a script (test2.rb), and the result it returns is displayed. The script makes use of the ‘Scoruby’ gem, which scores a Random Forest model loaded from the provided model file (rpt_rf.pmml) against the input parameters (i.e. the user’s farm, crop, and finances).
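As a rough illustration of that flow, the sketch below shows how a script along these lines might combine the user's inputs and score them against the PMML file. It is not the contents of test2.rb: the field names and helper methods are hypothetical, and the `Scoruby.load_model` and `predict` calls follow the gem's README, so they may differ between gem versions.

```ruby
# Hypothetical sketch, not the actual contents of test2.rb.

# Merge the three groups of user input into the single feature hash
# passed to the model. All field names here are illustrative; real keys
# would have to match the field names declared in the PMML file.
def build_features(farm:, crop:, finance:)
  farm.merge(crop).merge(finance)
end

# Load the PMML file and score one set of features.
# Assumes the 'scoruby' gem is installed. The shape of the returned
# value depends on the gem version and the model type.
def estimate_revenue(features, model_path: 'rpt_rf.pmml')
  require 'scoruby' # required lazily so build_features works without the gem
  model = Scoruby.load_model(model_path)
  model.predict(features)
end

features = build_features(
  farm:    { area_hectares: 2.5, district: 'ExampleDistrict' },
  crop:    { crop: 'rice' },
  finance: { loan_amount: 10_000 }
)
```

With the gem installed and the model file present, `estimate_revenue(features)` would then return the model's prediction for that input.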

(To clone from GitHub, first install Git LFS (Large File Storage).)

Getting started

In order to run the web application locally, one must:

  1. Install the latest versions of Ruby and Rails
  2. Run 'bundle install' on the command line from the project folder
  3. Run 'rails server'
  4. Open http://localhost:3000 to view the web app

Possible improvements

  • Add validations to the models - currently the input parameters for Farms/Crops/Finances are not validated for being the correct datatype, within an acceptable range of values, etc.

  • Currently, the parameters for the prediction (pages#show) are not filtered - a proper whitelist of acceptable parameters should be added.

  • The 'Generate prediction' link on the Finance#show page could be converted to an AJAX/JavaScript function, so that the result appears on the same page.

  • Certain commodity indices appear to be missing in the 'msp.csv' file (Minimum Support Price for a commodity for a given year). If that can be corrected, the 'test2.rb' file and pages_controller should be modified to use the actual MSP values when generating the profit prediction.

  • The prediction script (test2.rb) should be rewritten as a service object.

  • A comprehensive test suite should be written.
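Several of the improvements above (parameter whitelisting, input validation, and moving the script into a service) could be combined in one place. The following is a minimal plain-Ruby sketch of that idea, not the app's code: the class name, parameter names, and validation rules are all assumptions, and the model is any object that responds to `predict`. Inside Rails itself, strong parameters (`params.require(...).permit(...)`) and Active Record validations would be the usual tools for the first two steps.

```ruby
# Hypothetical service object; plain Ruby so it can be read without Rails.
class RevenuePrediction
  # Assumed parameter names; the real app's columns may differ.
  PERMITTED = %i[farm_area crop_name loan_amount].freeze

  Result = Struct.new(:ok, :value, :errors)

  # model: anything that responds to #predict(features)
  def initialize(model)
    @model = model
  end

  # Filters, validates, and scores one set of raw request parameters.
  def call(raw_params)
    features = permit(raw_params)
    errors = validate(features)
    return Result.new(false, nil, errors) unless errors.empty?

    Result.new(true, @model.predict(features), [])
  end

  private

  # Keep only whitelisted keys; anything else is silently dropped.
  def permit(raw_params)
    raw_params.each_with_object({}) do |(key, value), out|
      name = key.to_sym
      out[name] = value if PERMITTED.include?(name)
    end
  end

  # Illustrative rules only: type and range checks on each input.
  def validate(features)
    errors = []
    area = Float(features[:farm_area], exception: false)
    errors << 'farm_area must be a positive number' unless area && area.positive?
    errors << 'crop_name must not be blank' if features[:crop_name].to_s.strip.empty?
    loan = Float(features[:loan_amount], exception: false)
    errors << 'loan_amount must be zero or more' unless loan && loan >= 0
    errors
  end
end
```

A controller action could then call `RevenuePrediction.new(model).call(params_hash)` and render either the predicted value or the list of errors, which also makes the prediction logic straightforward to cover in a test suite.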

Alternative approaches

Working within the Ruby on Rails framework:
  • The web app has a SQLite database backend, as is standard in Ruby on Rails. If this application were to need to store millions of records, migrating to a PostgreSQL or MySQL backend should be considered.

  • Currently, the machine learning model is a random forest trained in R and exported to the 'rpt_rf.pmml' file.

To generate a new .pmml file, follow these tutorials: https://github.com/asafschers/scoruby/wiki/Random-Forest and https://medium.com/@aschers/deploy-machine-learning-models-from-r-research-to-ruby-go-production-with-pmml-b41e79445d3d

If you would instead like to implement a different model, ML frameworks for Ruby can be found here: https://github.com/josephmisiti/awesome-machine-learning

  • One can also look into RubyPython or PyCall as an alternative, through which Python libraries (such as Scikit-learn) could be integrated into the Ruby/Rails framework.

Working with other frameworks:
  • The 'classify_script.rb' script makes use of Java SE 6 Legacy, JRuby, Weka, and the ‘Weka-jruby’ gem. This script was originally intended to be the one called when generating the prediction. However, while the script runs fine on its own, it cannot be called in pages_controller (due to issues with the Java/JRuby/RubyGems bindings).

  • This script builds its model with Weka's RandomForest, which appears to be more accurate than the Scoruby implementation. The script outputs to 'myfile.txt'. To update the data used in this model, use Weka to convert an updated csv into an .arff file to be pulled in by the script.

You can run it separately using '$ java -jar jruby.jar classify_script.rb fin.arff'. This requires Java SE 6 Legacy to be installed (and possibly JRuby and Weka, although the .jar files for those are included, so a separate installation should not be necessary).

  • Alternatively, given the wealth of machine learning libraries for Python, a similar web app could be written in Django, so that the whole ‘ecosystem’ (web app and classifier) is written in Python.
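For the standalone Weka script described above, one possible way to drive it from ordinary Ruby is to run the documented command in a child process and read the file it writes. This is an untested sketch of that idea rather than something the app does: the helper names are hypothetical, the file names are the ones quoted in this README, and it assumes Java and the bundled .jar files are available in the working directory.

```ruby
require 'open3'

# Builds the argument list for the command quoted in this README.
# The default file names are taken from the README; adjust as needed.
def classify_command(arff_path, jruby_jar: 'jruby.jar', script: 'classify_script.rb')
  ['java', '-jar', jruby_jar, script, arff_path]
end

# Runs the classifier in a separate process and returns the raw text
# of its output file. The format of that file is not assumed here.
def run_classifier(arff_path, output_file: 'myfile.txt')
  _stdout, stderr, status = Open3.capture3(*classify_command(arff_path))
  raise "classifier failed: #{stderr}" unless status.success?

  File.read(output_file)
end
```

Passing the command as an argument list rather than a single string avoids shell interpolation of the file path. Calling `run_classifier('fin.arff')` would mirror the manual command shown earlier.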