Conv_Net_Simple

An application of a convolutional net in tensorflow, complete with tensorboard analytics and model checkpoints

CIFAR-10 CITATION: Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.

INTRO



Implements a general form of a feed-forward convolutional neural network with dropout on 
each of the fully connected (artificial) layers and normalization of the inputs.  It comes
with Tensorboard logging and periodic saving of the model parameters, so training can
easily restart from the last checkpoint.
  
Three functions are required to deploy the network:
	get_Input_Data
	create_CNN
	train_CNN

The network is built as a computation graph within tensorflow.




FUNCTIONALITY



get_Input_Data
	
	Purpose:
		
		Retrieves one of three datasets (MNIST, CIFAR-10 or CIFAR-100) for use in training.
		
	Definition:
	
		get_Input_Data(name, mode="coarse")
		
	Input:
	
		accepts a small range of inputs:
		
		get_Input_Data("mnist")
		get_Input_Data("Cifar-10")
		get_Input_Data("Cifar-100", "coarse")
		get_Input_Data("Cifar-100", "fine")
		
		Supplying keyword "mnist" or "Cifar-10" extracts the data as normal.
		Supplying "Cifar-100" adds the ability to choose if you want to predict
		the coarse labels (20 labels) or the fine labels (100 labels).
		

	Returns:
	
		first four return values are numpy arrays, last two are integers

		trainX, trainY, testX, testY, size_of_a_single_test_case, number_of_target_classes


	Method:
	
		First checks whether the extracted data is already in the working folder.  If not,
		it checks whether the tar.gz archive of the data exists and extracts it if it does.
		If the archive doesn't exist either, it downloads and extracts the dataset from 
		Toronto's Machine Learning data repo: 
		http://www.cs.toronto.edu/%7Ekriz/cifar.html (CIFAR-10 and CIFAR-100)
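
		The lookup order above can be sketched roughly as follows.  This is only an
		illustration, not the code in this repo: ensure_dataset and its three 
		arguments are hypothetical names, and the download URL is left as a parameter.

```python
import os
import tarfile
import urllib.request


def ensure_dataset(extracted_dir, archive_path, url):
    """Return extracted_dir, extracting or downloading the archive only if needed.

    Hypothetical helper: the names and arguments are assumptions for this sketch.
    """
    # 1. The extracted data is already in the working folder: nothing to do.
    if os.path.isdir(extracted_dir):
        return extracted_dir

    # 2. No archive on disk either: download it first.
    if not os.path.isfile(archive_path):
        urllib.request.urlretrieve(url, archive_path)

    # 3. Extract the tar.gz into the folder that holds the archive.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(os.path.dirname(archive_path) or ".")
    return extracted_dir
```

		The sketch assumes the archive unpacks into a folder whose path is 
		extracted_dir, so a second call returns immediately.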

		After reading the data into Python, it one-hot encodes the labels and 
		normalizes the training and testing data by subtracting the mean and dividing 
		by the standard deviation.

		Then it returns the data in the form specified above.
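
		A minimal NumPy sketch of the two preprocessing steps.  The function names are
		hypothetical, and using one scalar mean and standard deviation taken from the
		training set for both sets is an assumption; the text above does not say how 
		the statistics are computed.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class ids of shape (N,) into an (N, num_classes) 0/1 matrix."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def normalize(train_x, test_x):
    """Subtract the mean and divide by the standard deviation.

    Assumption: scalar statistics from the training set are reused for the test set.
    """
    mean = train_x.mean()
    std = train_x.std()
    return (train_x - mean) / std, (test_x - mean) / std
```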
		
	





create_CNN

	Purpose:
	
		Creates the computational graph that represents the network specified by the 
		input parameters.  Capable of creating its own structure for the graph 
		dynamically based on the size of the input data if no parameters are specified.
		
		Uses tensorboard to visualize the distribution of the weights, biases and 
		activations over time.  It also takes a sample of what's being input to the
		network to highlight any problems with the input images.
		
		Each layer is given its own name space to improve the readability of the 
		tensorboard graph.

	Definition:
	
        create_CNN(inputLayerSize, outputLayerSize, fcShape=[], convShape=[], 
                   silent=False, minActGrid=4, filterX=5, filterY=5, 
                   startingFeatures=32, featureScaling=2, poolingSize=2, 
                   color=False)
    
    Input:
    
    	inputLayerSize is the size of one input example, so if you are using 28 x 28
    		grayscale images (1 color channel), this number would be 28*28*1=784.  If using
    		32 x 32 color images (3 color channels), this number would be 32*32*3=3072.
    		Takes an integer.
    	
    	outputLayerSize is the number of categories that the data could fall into. Takes
    		an integer.
    	
    	fcShape dictates the shape of the fully connected layers that come after the
    		convolution.  Each entry contains the number of nodes in that layer.  The
    		first and last entries are placeholders that correct themselves to the right
    		values.
    		ex: [0, 240, 360, 0] specifies 4 layers: the first one is the same size as the 
    		output from the convolutional layers, the next has 240 nodes, the next has 360
    		nodes, and finally the output layer has the same number of nodes as there are 
    		categories.
    	
    	convShape dictates the shape of the convolutional layers, each entry
    		corresponds to a layer.  There are many ways to enter data for a layer:
    	
    		[filterX, filterY, FeaturesToProduce, ksize, strideSize (for pooling)]
    		or
    		[filterSize, FeaturesToProduce, ksize, strideSize (for pooling)]
    		or
    		[filterSize, FeaturesToProduce, poolingSize]
    		or
    		["bottleneck", FeaturesToProduce, poolingSizeOnLast]
    	
    		Where filterX and filterY dictate the shape of the kernel at the layer (if 
    		it's just filterSize then filterSize=filterX=filterY), FeaturesToProduce is 
    		an indicator of how many filters should be used at this layer, ksize is the 
    		maxpool size and strideSize dictates what size strides will be used in the 
    		pooling step (if only poolingSize is supplied, poolingSize=ksize=strideSize).
    	
    		Finally, ["bottleneck", FeaturesToProduce, poolingSizeOnLast] creates three 
    		layers, a 1x1 conv layer no pooling, 3x3 conv layer no pooling, 1x1 conv 
    		layer with "poolingSizeOnLast" size pooling, all three layers use 
    		"FeaturesToProduce" number of filters.
    		
    		ex: if supplied with:
    		[[5, 32, 2], 
    		 [1, 3, 64, 3, 2], 
    		 [5, 256, 4, 3]
    		]
    		first layer has a 5x5 filter, 32 features and 2x2 pooling with 2x2 stride
    		second layer has 1x3 filter, 64 features and 3x3 pooling with 2x2 stride
    		third layer has 5x5 filter, 256 features, 4x4 pooling and 3x3 stride
    	
    	silent is a boolean value; if set to True, less information is printed.
    	
    	minActGrid, filterX, filterY, startingFeatures, featureScaling and poolingSize
    		are all used by default_conv_net (defined below)
    	
    	color is a boolean value, if set to true, then it is assumed the input data has	
    		three color channels, and the samples are set up in the following format:
            [[image1],
             [image2],
              ...
            ]
    		
    		Where each image is stored as all red pixels, then all green pixels then all
    		blue pixels, starting with the first row of red pixels.
    		
    		ex:
    		
    		first image is 32x32 and stored as: 
    		[RRRRRRRRRRRRRRRRRRRR...GGGGGGGGGGGGG...BBBBBBBB]
    		Where the first 32 R's correspond to the R values for the first row of the
    		image.
    		
    		This layout is required because the reshaping and transposing step that
    		feeds the data into the convolutional layers assumes this format.
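
    		As an illustration of that layout, here is a NumPy sketch that turns rows of
    		[all R | all G | all B] into height x width x channel images.  The function
    		name is hypothetical and the repo's own reshape may differ in detail.

```python
import numpy as np


def planar_rgb_to_images(flat, side):
    """Reshape rows of [R plane | G plane | B plane] into (N, side, side, 3)."""
    # Each row holds 3 planes of side*side values, stored row by row.
    planes = flat.reshape(-1, 3, side, side)   # (N, channel, row, col)
    # Move the channel axis last so each pixel carries its own (R, G, B).
    return planes.transpose(0, 2, 3, 1)        # (N, row, col, channel)
```

    		For 32x32 images, flat has 3072 columns and side is 32.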
    		
    	
    Returns:
    	
    		IN is the input placeholder tensor, where trainX or testX should be fed
    		LABEL_IN is the label placeholder tensor, where trainY or testY should be fed
    		OUT is a tensor of the predictions made by the net based on data in IN
    		keepProb is the placeholder for the probability of keeping the value of a
    			node rather than dropping it (dropout, used to reduce overfitting).
    			By default 50% of the nodes are kept and 50% dropped (takes a float
    			value between 0 and 1)
    			
    		IN, LABEL_IN, OUT, keepProb
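
    		To make the meaning of a keep probability concrete, here is a NumPy sketch
    		of dropout.  It assumes the "inverted" form used by tf.nn.dropout, where the
    		surviving values are scaled by 1/keep_prob; the function itself is
    		hypothetical and is not part of this repo.

```python
import numpy as np


def dropout(activations, keep_prob, rng):
    """Keep each value with probability keep_prob and rescale the survivors."""
    # Boolean mask: True where the node's value is kept.
    mask = rng.random(activations.shape) < keep_prob
    # Scaling by 1/keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob
```

    		Feeding keep_prob = 1.0 disables dropout, which is the usual choice when
    		evaluating on test data.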
    		
    Method:
    	
    		First parses the shape provided for convShape and expands any keywords into
    		multiple layers based on their function.  If a convShape hasn't been defined, 
    		it creates one using default_conv_net.  It then adds the convolutional
    		layers onto the placeholder it created.  (Note: I wrote the conv layer and
    		art layer functions in such a way that they append themselves to whatever 
    		tensor I provide, provided it can be reshaped correctly.  In this case, I'm
    		providing the input placeholder tensor.)
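
    		The parsing step can be pictured with the sketch below, which normalizes
    		every accepted convShape entry to the five-value form.  expand_conv_shape is
    		a hypothetical name, and representing "no pooling" as a pool size and stride
    		of 1 is an assumption.

```python
def expand_conv_shape(conv_shape):
    """Normalize every entry to [filterX, filterY, features, ksize, stride]."""
    layers = []
    for entry in conv_shape:
        if entry[0] == "bottleneck":
            _, features, pool = entry
            # 1x1 and 3x3 without pooling (size 1), then 1x1 with pooling.
            layers.append([1, 1, features, 1, 1])
            layers.append([3, 3, features, 1, 1])
            layers.append([1, 1, features, pool, pool])
        elif len(entry) == 5:
            layers.append(list(entry))
        elif len(entry) == 4:
            size, features, ksize, stride = entry
            layers.append([size, size, features, ksize, stride])
        elif len(entry) == 3:
            size, features, pool = entry
            layers.append([size, size, features, pool, pool])
        else:
            raise ValueError("unrecognised convShape entry: %r" % (entry,))
    return layers
```

    		Applied to the three-layer example under convShape, this yields
    		[5, 5, 32, 2, 2], [1, 3, 64, 3, 2] and [5, 5, 256, 4, 3].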
    		
    		Then it parses the shape of the requested fully connected layers (setting the
    		size of the output of the conv layers as the first layer size) and
    		appends the fully connected layers to the network.  These layers have dropout
    		between each of them to reduce overfitting (which I found required longer
    		training times, but produced 4% better results on my best architecture for
    		Cifar-10).
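
    		A small sketch of how the first and last fcShape entries could be filled in.
    		resolve_fc_shape is a hypothetical name, and the fallback for an empty
    		fcShape is an assumption; the text does not say what the default [] becomes.

```python
def resolve_fc_shape(fc_shape, conv_output_size, num_classes):
    """Overwrite the first and last entries with the sizes the graph dictates."""
    if len(fc_shape) < 2:
        # Assumption: with no usable shape, connect the flattened conv
        # output directly to the output layer.
        return [conv_output_size, num_classes]
    resolved = list(fc_shape)
    resolved[0] = conv_output_size   # flattened output of the last conv layer
    resolved[-1] = num_classes       # one node per category
    return resolved
```

    		For example, with a 4x4x128 conv output and 10 categories,
    		[0, 240, 360, 0] becomes [2048, 240, 360, 10].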
    		
    		After which, all the placeholders are returned as well as the method of
    		accessing the network's predictions based on chosen input.



default_conv_net

	Purpose:
	
		Define an adequate convolutional net structure for the input data when one is not
		provided by the user.
		
	Definition:
	
		default_conv_net(inputLayerSize, minActGrid=4, filterX=5, 
                         filterY=5, startingFeatures=32, featureScaling=2,
                         poolingSize=2, color=False)
    
    Input:
    	
    	inputLayerSize is used to determine when to stop making layers.  First the square 
    		root is taken to determine the length of one side of the square images, then 
    		that length is divided by the poolingSize at each layer as such: 
    		inputLayerSize = ceil(inputLayerSize/poolingSize)
    		
    		This tracks the size of the activation planes that are output by each layer
    		of the convnet.  Once the size of the activations on a layer is less than
    		or equal to "minActGrid", it stops making layers.
    		
    	minActGrid controls the minimum size of the activation grids that are being
    		output by the final layer of the convnet.  So for example, the default is 4,
    		so once pooling has reduced the output of the convolutional layer to 4x4
    		grids, no more layers are created
    		
    	filterX describes the size in the X dimension of the filters used at each layer
    	
    	filterY describes the size in the Y dimension of the filters used at each layer
    	
    	startingFeatures is the number of filters that will be used on the first layer.
    		At each subsequent layer the number of filters is multiplied by 
    		"featureScaling", so in the default, each layer has double the filters
    		of the previous
    	
    	featureScaling is a multiplier for the number of filters used in each layer,
    		each layer has "featureScaling" times as many filters as the previous
    	
    	poolingSize is the size of the pooling used at each layer, the stride size is
    		equal to the pooling size
    		
    	color is a boolean, if it is true, we assume that there are 3 color channels and
    		first divide inputLayerSize by 3 before taking the square root
    	
    	
    Returns:
    
    	the shape of the convnet it created (as a list)
    
    
    Method:
    
    	Determine the size of the inputs: sqrt(inputLayerSize) if working with grayscale
    	or sqrt(inputLayerSize/3) if working with color.  This variable keeps track of
    	the size of the current outputs (pooling reduces the size of the outputs).  Once
    	the size of the outputs of the current layer is less than or equal to the desired
    	minimum output size (dictated by minActGrid), no more layers are produced.
    	
    	Each layer produced has filters of size filterX x filterY, pooling of size
    	"poolingSize" and "previousLayerFeatures" * "featureScaling" number of features.
    	
    	Then the shape it decided on is returned for use by create_CNN
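
    	The procedure can be sketched as below.  This is a reading of the description
    	above rather than the actual function: the name, the snake_case parameters, the
    	five-value entry format and the exact order of the size check are assumptions.

```python
import math


def sketch_default_conv_net(input_layer_size, min_act_grid=4, filter_x=5,
                            filter_y=5, starting_features=32,
                            feature_scaling=2, pooling_size=2, color=False):
    """Build a list of conv layer entries until the activation grid is small."""
    if pooling_size < 2:
        raise ValueError("pooling_size must shrink the grid at every layer")
    channels = 3 if color else 1
    # Side length of the square input image.
    side = math.isqrt(input_layer_size // channels)
    features = starting_features
    shape = []
    while side > min_act_grid:
        # [filterX, filterY, features, ksize, stride], with stride == pooling size.
        shape.append([filter_x, filter_y, features, pooling_size, pooling_size])
        side = math.ceil(side / pooling_size)   # pooling shrinks the grid
        features *= feature_scaling             # more filters on deeper layers
    return shape
```

    	Under these assumptions a 28x28 grayscale input (784) gives three layers with
    	32, 64 and 128 filters, as the grid shrinks 28, 14, 7, 4.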
    


train_CNN
    
    Purpose:
    
    	Appends the desired training algorithm.  The default is the Adam optimizer, but
    	RMSProp is also available; both are first-order methods that adapt the step size
    	per parameter, which helps with non-convex loss functions.  Saves the model 
    	periodically and uses tensorboard to track accuracy on the current batch of 
    	training data as well as the (hopeful) reduction of the cost function.  Also
    	saves the structure of the computational graph in a much more readable format
    	in tensorboard.
    	
    	All the modules appended in this step are done so in the name space of "EVAL" to
    	improve the visualization of the graph in tensorboard.
    	
    Definition:
    
    	train_CNN(IN, LABELS_IN, LOGITS, keepProb, trainX, trainY, testX, testY, 
              keepPercent=0.5, batchSize=50, trainingEpochs=20000, alpha=1e-4,
              silent=False, dest='', modelDest='', modelExists=False, opt="Adam")
            
    Inputs:
    	
    	IN is the placeholder tensor for input data X.
    	
    	LABELS_IN is the placeholder tensor for the label data Y.
    	
    	LOGITS is the output tensor of the network
    	
    	keepProb is the placeholder tensor for dropout percent
    	
    	trainX is the input training data
    	
    	trainY is the input training labels
    	
    	testX is the final testing data to evaluate the network
    	
    	testY is the final testing labels
    	
    	keepPercent is the complement of the dropoutRate. DropoutRate dictates the
    		percentage of nodes on the previous layer to randomly ignore in order to
    		improve the generality of the predictions made. So keepPercent is
    		1 - dropoutRate
    	
    	batchSize is the size of the randomized batch fed to the optimizer at each
    		training step
    	
    	trainingEpochs is the number of iterations the training loop is executed (one
    		batch per iteration)
    	
    	alpha is the "learning rate"; it is more intuitive to think of it as the size of
    		the step taken when back propagation adjusts the weights and biases.
    	
    	silent is boolean, if true, will not give nearly as much information
    	
    	dest is the path where the tensorboard data will be saved.  If not specified, it
    		will prompt the user for a path
    	
    	modelDest is the path where the model checkpoints will be saved.  If not 
    		specified, it will prompt the user for a path
    		
    	modelExists is boolean, if it's true, train_CNN will attempt to reload the last
    		checkpoint into the current model (will only fail if the models have
    		different architectures)
    	
    	opt is a string that is either "Adam" or "RMS", specifies the optimizer that will
    		be used.  "Adam" specifies the adam optimizer, "RMS" specifies the RMSProp
    		optimizer.
    
    Returns:
    
    	IN: The input placeholder tensor
    	LABELS_IN: the label placeholder tensor
    	LOGITS: the output of the network
    	keepProb: the placeholder for dropout percent
    	sess: the session that was just trained
    	
    	IN, LABELS_IN, LOGITS, keepProb, sess
    	
    	
    	
    Method:
    
    	Uses a cross entropy loss function that is minimized by either the Adam or the
    	RMSProp version of gradient descent.  Appends onto that a function that
    	calculates the accuracy of the network's output for the given inputs against
    	the correct labels.
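
    	For reference, the two quantities can be written out in NumPy as below.  This is
    	a sketch that assumes a softmax is applied to LOGITS before the cross entropy
    	and that labels are one-hot; both function names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross entropy between softmax(logits) and one-hot labels."""
    # Subtract the row maximum first for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())


def accuracy(logits, one_hot_labels):
    """Fraction of rows whose largest logit matches the labelled class."""
    predicted = logits.argmax(axis=1)
    actual = one_hot_labels.argmax(axis=1)
    return float((predicted == actual).mean())
```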
    	
    	Sets up the summaries to log the training, graph saver to store a representation
    	of the graph in tensorboard and the saver to periodically save checkpoints of 
    	the model's training.
    	
    	Executes the training steps, saving tensorboard data every 5 steps and saving the
    	model checkpoints every 1000 steps (and on the last step of training).
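
    	The cadence of the loop can be sketched without tensorflow as below.  The
    	generator name and its arguments are hypothetical, and both sampling each batch
    	without replacement and counting steps from 0 are assumptions.

```python
import numpy as np


def training_schedule(num_examples, training_steps, batch_size, rng,
                      summary_every=5, checkpoint_every=1000):
    """Yield (step, batch_indices, write_summary, save_checkpoint) per step."""
    for step in range(training_steps):
        # Random batch of example indices for this step.
        batch = rng.choice(num_examples, size=batch_size, replace=False)
        write_summary = step % summary_every == 0
        last_step = step == training_steps - 1
        # Checkpoint every checkpoint_every steps and on the final step.
        save_checkpoint = step % checkpoint_every == 0 or last_step
        yield step, batch, write_summary, save_checkpoint
```

    	A real loop would run the optimizer on trainX[batch] and trainY[batch], then
    	write the summary or checkpoint when the corresponding flag is set.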
    	
    	Once it executes all the training loops, it then evaluates the testing data to
    	check the quality of the internal representation of the data
    	
    	Returns all the parameters needed to work with the now trained model
    	
    	aside: I originally had it evaluate the testing data every 200 steps to make it 
    	easier to determine the best value for trainingEpochs, but it made the execution
    	run at 60% of the speed due to the large amount of data in the testing data (when
    	compared to the size of a batch of the training data) so I removed the feature.
    	
    
    	
FURTHER:

	Data augmentation would be a great addition: some rotations, reflections, 
	translations, etc.	
	
	Batch Normalization looks very promising.
	
	experiment with dropout on conv layers
	https://arxiv.org/pdf/1506.02158v6.pdf
	https://www.reddit.com/r/MachineLearning/comments/42nnpe/why_do_i_never_see_dropout_applied_in/#bottom-comments
	If not dropout, then L1 or L2 regularization could be promising as well.
	
	Read about a good initialization method being the identity matrix. Also read about an initialization technique which uses
	RBMs to set the initial weights, then train them from that initialization.
	
	Bayesian hyperparameter optimization is very impressive; the evolutionary technique
	is a lot less efficient.  Evolutionary techniques are, however, good at coming up
	with great optimizations of very complex systems (such as designing the optimal
	shape for the antenna for two of NASA's missions).