Project 1 Bios 611


Mushroom Classification and Generation Dataset


Introduction

Mushrooms are fungi that are commonly used as food and sometimes for medicinal purposes, but the wide variety of characteristics across mushroom types can make it difficult to tell species or subspecies apart. This is made harder by the fact that many mushroom types have evolved similar colors, patterns, and shapes under different evolutionary pressures. When foraging for edible mushrooms, telling subtypes apart can be a matter of life and death, so a lightweight phone-based model that classifies a mushroom from the answers to a few questions about its characteristics could be very useful for hikers and mycologists alike.

This project aims both to explore what differentiates mushroom types and subtypes and to build the most lightweight model possible that accurately discriminates edible from poisonous mushrooms, using a fairly limited dataset that contains only categorical data.

The second part of this project is a convolutional neural network (CNN) that classifies mushroom images by genus. It serves as an example of a polyglot analysis: the network is designed and trained in Python, while the summary statistics on its results are computed in R.

Datasets

The tabular dataset is publicly available on Kaggle. It describes each mushroom with a set of categorical attributes.

The image dataset is also available on Kaggle. It is around 2 GB and consists of a few thousand JPEG images.

Installation and Running

All parts of the analysis (the initial R analysis and the newer polyglot analysis) are now included in the writeup.

To get reports (the easy way):

$ git clone https://github.com/Lswhiteh/bios611-project1
$ cd bios611-project1
$ source aliases.sh

# Build the Docker image
$ dbuild

# Open a bash terminal inside the Docker container
$ b

# or

# Run RStudio inside the Docker container
$ r
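For readers who want to see what the shortcuts roughly correspond to, here is a hypothetical sketch of how `dbuild` and `b` could be defined in aliases.sh, built only from the manual `docker build` and `docker run` commands in the hard-way instructions. The real aliases.sh may differ. `r` and `shiny` are deliberately left out because their port mappings are not documented here.

```shell
# Hypothetical sketch of aliases.sh; the file in the repository may differ.
# Shell functions are used rather than aliases so they also work in scripts.

# Build the image from the Dockerfile in the current directory, tagged "rcon".
dbuild() {
  docker build -f Dockerfile . --tag rcon
}

# Start an interactive bash shell as the rstudio user, with the current
# directory mounted at /home/rstudio inside the container.
b() {
  docker run -v "$(pwd)":/home/rstudio -e PASSWORD=not_important -it rcon \
    sudo -H -u rstudio /bin/bash -c "cd ~/; /bin/bash"
}
```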

If using RStudio:

To get reports (the hard way):

$ git clone https://github.com/Lswhiteh/bios611-project1
$ cd bios611-project1

# Build the Docker image, tagged rcon
$ docker build -f Dockerfile . --tag rcon

# Start a bash shell inside the container as the rstudio user
$ docker run -v "$(pwd)":/home/rstudio -e PASSWORD=not_important -it rcon sudo -H -u rstudio /bin/bash -c "cd ~/; /bin/bash"

Regardless of how you get there, you can generate the Project 1 report from a bash terminal inside the Docker container:

# WARNING: now that the neural network is included, this takes a long time to run
$ make Mushroom_analysis.pdf

That's it! A PDF named "Mushroom_analysis.pdf" containing the completed writeup and figures should now be in the base project directory.
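Because the report build is long, it can be convenient to run it as a single non-interactive command from the host instead of opening a shell first. The sketch below combines the hard-way `docker run` command with `make Mushroom_analysis.pdf`. The function name `build_report` is hypothetical, and dropping `-it` (so it can run from a script without a terminal) is an untested assumption about the image.

```shell
# Hypothetical helper: build the report in one step from the host.
# Reuses the "rcon" image and the /home/rstudio mount from the manual run command.
# --rm removes the container afterwards; the PDF persists in the mounted directory.
build_report() {
  docker run --rm -v "$(pwd)":/home/rstudio -e PASSWORD=not_important rcon \
    sudo -H -u rstudio /bin/bash -c "cd ~/ && make Mushroom_analysis.pdf"
}
```

Run it from the project root with `build_report`.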


Shiny app

# After sourcing aliases.sh
$ shiny

Go to http://0.0.0.0:8788 in your web browser.
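The app can take a few seconds to start, so the page may not load right away. A minimal sketch for waiting until it responds is shown below. It assumes `curl` is available and that the app listens at the address given above. The function name and its retry parameters are hypothetical.

```shell
# Hypothetical helper: poll the Shiny app until it responds or attempts run out.
# Usage: wait_for_shiny [url] [max_attempts]
wait_for_shiny() {
  local url="${1:-http://0.0.0.0:8788}"
  local tries="${2:-30}"
  local i
  for ((i = 0; i < tries; i++)); do
    # -s: silent, -f: treat HTTP errors as failures, -o: discard the page body
    if curl -sf -o /dev/null "$url"; then
      echo "Shiny app is up at $url"
      return 0
    fi
    sleep 1
  done
  echo "Shiny app did not respond at $url" >&2
  return 1
}
```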


Homework 4

Run:

$ make homework4

and homework4.pdf will appear in the base homeworks directory.

Homework 5

Run:

$ make homework5

and homework5.pdf will appear in the base homeworks directory.