/GGDMS

Distributed order statistic selection with GPUs

Primary LanguageHTML

/* Copyright 2012 Emircan Uysaler, Jeffrey Blanchard, Erik Opavsky
 * Copyright 2011 Russel Steinbach, Jeffrey Blanchard, Bradley Gordon,
 *   and Toluwaloju Alabi
 *   Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at:
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *     
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License. 
 */

GGMS: Grinnell GPU Multi-Selection
All original work done for this project is licensed under the Apache 2.0
license. This project is built on the GGKS project of Alabi, Blanchard, 
Gordon, and Steinbach.


 ************************************************************************
 ************************************************************************
 ********************** LIBRARIES ***************************************
 ************************************************************************
 ************************************************************************

 THRUST: a variety of thrust primitives are used throughout GGMS, so it is
 necessary to have a copy downloaded. For convenience a copy has been
 included. 


 ************************************************************************
 ************************************************************************
 **********************  bucketMultiSelect ******************************
 ************************************************************************
 ************************************************************************


 bucketMultiSelect: The source code for bucketMultiSelect is provided in 
 bucketMultiSelect.cu. It is named as we conceptually place elements into
 buckets, determine which bucket contains the desired order statistics,
 reduces the problem to only those buckets containing desired order statistics,
 and extract the desired order statistics by sorting the reduced vector.

 The details of the algorithm are provided in the following paper:
 
 Selecting Multiple Order Statistics with a Graphics Processing Unit
 Jeffrey D. Blanchard, Erik Opavsky, and Emircan Uysaler
 Submitted, 2013.
 Online [Available]: www.math.grinnell.edu/~blanchaj/Research/SMOSGPU.pdf

 Any work which uses this algorithm, software, or derivatives of this 
 software should cite the above paper.

 Note:

 1. naivebucketMultiSelect: this algorithm is included in the package, but
 is not executed by default.  The algorithm does not utilize the kernel
 density estimator to define buckets concentrated at the density of the 
 values in the vector.  Instead, it generates equal sized buckets from
 the minimum to maximum value in the vector.  To include this function
 in tests, change the last 0 to 1 in every instance of

 algorithmsToRun[NUMBEROFALGORITHMS]= {1, 1, 0};

 that appears in the code.  You are likely to see an improved performance
 on vectors with uniformly distributed elements, and significantly degraded
 performance on vectors from other distributions.


 ************************************************************************
 ************************************************************************
 ********************** COMPILING THE CODE  *****************************
 ************************************************************************
 ************************************************************************

 There are a variety of functions which generate executables.  All executables
 can be compiled using the included MAKEFILE.  You must ensure the all path
 variables are set appropriately for your machine.  The MAKEFILE is designed
 for a Linux operating system.  For other operating systems, you must create
 your own MAKEFILE or compile the functions individually.

 To compile the functions for testing user specified problems compile 
 compareMultiSelect, analyzeMultiSelect, and realDataTests by running

 >> make all

 To compile the functions used to generate the data for the paper 
 "Selecting Multiple Order Statistics with a Graphics Processing Unit",
 run

 >> make allSMOS

 See the MAKEFILE for compiling specific files.


 ************************************************************************
 ************************************************************************
 ********************* RUNNING TESTS, GENERATING DATA *******************
 ************************************************************************
 ************************************************************************

 Idividual Testing:
 To run a user defined set of tests, utilize
 
 >> ./compareMultiSelect 

 to generate timing data compared to sorting the vector with thrust::sort. 
 The executable will prompt you for input.

 To indentify the maximum number of order statistics bucketMultiSelect
 can acquire in the time required for sort&choose, utilize

 >> ./analyzeMultiSelect

 The executable will prompt you for input.

 Tests for generating data in the paper:
 To run the tests for the paper, execute each of the files beginning with
 SMOS:
 >> ./SMOSanalyze
 >> ./SMOStimingsOSDistrAll
 >> ./SMOStimingsDistrUniform
 >> ./SMOStimingsTableData
 >> ./SMOStimingsVectorGrowth

 Running all of these programs in the current settings will take approximately
 12 hours on a Tesla C2070.  The majority of the time is spent on vectors
 larger than 2^(26).  To run a subset of the tests, simply change the ranges
 in the SMOS*.cu files prior to compilation.

 The realDataTests program takes a .csv file as input.  For example, 

 >> ./realDataTests "house_5.csv"

 Due to it's size and its source, the data is not included in the software package.
 A compressed zip file containing the data sets for the paper is available
 on Jeff Blanchard's research page.    
 www.math.grinnell.edu/~blanchaj/Research/Research.html

 Notes: 

 1. The setting for compareMultiSelect when choosing unsigned 
 integers, unsigned integers 0-4 actually generates unsigned
 integers from 0 to 100 to permit observations in line with the findings
 reported in the paper.  To generate other discrete integer ranges, simply
 alter line 54 of generateProblems.cu.

 2. When running analyzeMultiSelect, if order statistic distributions other
 than uniform or uniform random are selected, you will likely find the tests
 returning many errors when the number of order statistics exceeds n/100.
 This occurs because the memory allocated for storing the desired order 
 statistics is set at .01*n.  If you wish to test these other order statistic
 distributions, simply alter line 232 of analyzeMultiSelect.cu. 


 ************************************************************************
 ************************************************************************
 ********************* PROCESSING DATA **********************************
 ************************************************************************
 ************************************************************************


 To process the output data, you should compile readMultiSelectOutput.cpp
 by executing

 >> make ProcessData

 Then run 

 >> ./readMultiSelectOutput

 To generate the plots and tables in the paper, run

 >> readscript

 Then, open the appropriate *.m files, input the correct date(s), and 
 execute the *.m files in MATLAB.

 To see the values reported by SMOSanalyze, execute

 >> cat CR* | grep "k value"

 where we have assumed all data files in the folder are relevant.  Adding 
 more information to "CR*" will permit you to use only the relevant files.

JDB