###############################################################################
## WeizGrid
## By Stav Yagev, 2013 (Please contact me if you find bugs!)
##
##
## Simple framework for using the SGE cluster at Weizmann for parallel work.
##
## FEATURES:
## - Helps split your own code into pieces that can be run in parallel on the
##   cluster.
## - Allows you to aggregate results efficiently.
## - Write your code ONCE! Execute the same code on a PC for debugging and on
##   the cluster for production.
## - Recover from errors on the cluster: splitting means that if 1 iteration
##   out of 1000 fails, you still have the other 999 iterations in hand!
##
## Use case example:
## You have a job that processes 1000 images, running the same algorithm on
## each image, outputting something for each image, and finally aggregating
## the results from all images. Until now, you ran this as a single job that
## took 1000 minutes (let's assume it takes 1 minute to process each image).
## Because the job can be parallelized, WeizGrid helps you easily split it
## into 80 parallel jobs instead of 1, so that you get all 1000 images in
## about 1000/80 = 12.5 minutes (!!)
##

Auto Installation:
==================
1. Copy all the files to ~/WeizGrid
2. In a UNIX terminal, type:
       chmod +x ~/WeizGrid/wgsetup
3. Then type:
       ~/WeizGrid/wgsetup

Usage:
======
1. Create files in the spirit of 'sample.m' and 'calcPrimes.m' (make sure the
   WG m-files are in MATLAB's path)
2. Upload your files to UNIX
3. Under UNIX, from the directory of YOUR project, run:
       ~/WeizGrid/qsubWG [name] [your-m-file] [queue]
   Example:
       ~/WeizGrid/qsubWG ParallelCracker sample all.q
4. Enjoy!

Manual Installation (if auto doesn't work...):
==============================================
1. Copy all files to ~/WeizGrid.
2. Make sure all bash scripts have execute permission.
3. Create the following directory structure (1 folder for each queue you
   intend to use):
       ~/.matlab/cluster_jobs/all.q
       ~/.matlab/cluster_jobs/test.q
       ...
   NOTE: The ~/.matlab/ directory usually already exists but is hidden, so
   make sure you are viewing hidden files and folders.

Tips & Troubleshooting:
=======================
- If your algorithm uses random numbers, make sure you are not setting the
  RngShuffle option to false when invoking WGexec, because this will yield
  the same random series for every "parallel" piece of work.
- Sometimes there will be a bug in the aggregation part of your script, and
  you might think all your work is lost. This is not the case; see part #2
  of the documentation of WGgetResults for more information.
- There isn't verbose error checking, so if something doesn't work, check the
  output files in your UNIX home for hints... If still unsuccessful, feel
  free to contact me for help.
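
Manual installation sketch:
===========================
The Manual Installation steps above could be scripted roughly as follows.
This is only a sketch and not part of WeizGrid: WG_DIR, JOBS_DIR and QUEUES
are hypothetical variables introduced here for illustration, and only the two
scripts named in this README (wgsetup and qsubWG) are made executable. It
assumes step 1 (copying the files to ~/WeizGrid) was already done; adjust the
script list and the queue names to match your setup.

```shell
#!/usr/bin/env bash
# Sketch of manual installation steps 2 and 3 (assumptions noted inline).
set -e

# Hypothetical override variables; the defaults follow the paths in this README.
WG_DIR="${WG_DIR:-$HOME/WeizGrid}"
JOBS_DIR="${JOBS_DIR:-$HOME/.matlab/cluster_jobs}"
QUEUES="${QUEUES:-all.q test.q}"   # one entry per queue you intend to use

# Step 2: give the bash scripts execute permission.
# Only the scripts mentioned in this README are listed; add any others you copied.
for script in wgsetup qsubWG; do
    if [ -f "$WG_DIR/$script" ]; then
        chmod +x "$WG_DIR/$script"
    else
        echo "warning: $WG_DIR/$script not found, skipping" >&2
    fi
done

# Step 3: create one folder per queue (-p also creates missing parent folders).
for queue in $QUEUES; do
    mkdir -p "$JOBS_DIR/$queue"
done

# ~/.matlab is hidden, so list with -a to confirm the result.
ls -a "$JOBS_DIR"
```

To add another queue, extend QUEUES before running, for example
QUEUES="all.q test.q my.q" (my.q is a made-up queue name).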