CUDA_Fractal: A Cuda repository from etotheipi

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Author:     Alan Reiner  (USA)
Orig Date:  05 Jan, 2011

Last Update:  03 Feb, 2011

Description:  Interactive fractal viewer.  Uses CUDA for super-fast fractal 
              generation, and OpenGL for displaying and interacting

                                  Windows   Linux     Mac
                     CUDA Compute
                          1.0        ?        No       No
                          1.1        ?        No       No
                          1.2       Yes       No       No
                          1.3       Yes       No       No   (All GTX 2XX)
                          2.0       Yes      Yes       No   (All GTX 4XX)
                          2.1       Yes      Yes       No   (GTX 460)

             I don't have a MAC on which to try this.  


--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
*** One code base for Windows and Linux
--------------------------------------------------------------------------------
Update  04 Feb, 2011

Checked out the Win64 branch, updated the makefile, and compiled.  It works!
So I merged it into the master branch and all new development will continue
from here.


--------------------------------------------------------------------------------
*** Windows Support!
--------------------------------------------------------------------------------
Update  03 Feb, 2011

After a tremendous amount of pain and frustration, it appears that NVIDIA does 
not want any Windows users to get into CUDA.  They don't make their SDK compile 
in MSVS 2008 or 2010, and normal constructs I can use in Linux fail miserably
in Windows.  I had to do complete code re-organization to get this to work...

With the exception of the mouse scroll wheel, the exact same code CUDA/OpenGL
code that worked in Linux, also works in Windows.  That's nice... 

At the moment there are two branches... I need to check out the win64 branch
in Linux and see if I can get it working via preprocessor branches.  Then I'll
be able to merge the win64 branch into the master and everyone should be happy!

Then to start implementing new stuff!  Dynamic colormaps, new IFS functions,
randomization...



--------------------------------------------------------------------------------
*** MAJOR UPDATE -- Integrated OpenGL with CUDA for interactive fractal viewing!
--------------------------------------------------------------------------------
Update 30 Jan, 2011

The original fractal code in CUDA was combined with CUDA-OpenGL-interop code to
allow for real-time passing of CUDA output to OpenGL for display and interaction
(via video memory -- no need to pass image through host RAM).  I don't know how
NVIDIA could've made this task any more complicated... But the torture is over, 
and it works (LINUX ONLY -- no windows support yet).

Also, the code compiles and runs for CUDA Compute Capability less than 2.0, and
it actually loads without crashing, but it doesn't actually display a fractal.  
So, for the time being, consider this code Fermi-only.

With OpenGL installed, this produces a real-time Julia set.  Left-click-drag
will pan the view, scroll wheel will zoom.  Right-clicking and dragging will
adjust the C-parameter, which causes the fractal to morph in real time.  At
this time, the colormap can only be changed via added a cmap*.txt file and 
updating the code in main_gl.cu (and recompiling).


---------------------


This CUDA code is a very basic fractal generator using CUDA (NVIDIA gfx card)
The code is kind of sloppy, and simplistic (though Mandlebrots and Julia sets 
aren't difficult), but it works if you have CUDA installed and a newer NVIDIA
graphics card.  Eventually, as I learn more about fractals, I will expand this
library to generate more types of fractals.

To learn more about CUDA, or to download more of the files you need to compile
this code, you can check out my other CUDA project on github, 
      CUDA-Image-Processing


If you are familiar with CUDA, you will realize that fractals are an absolutely
*PERFECT" application for CUDA.  You don't have to pass in an image, only a few
parameters that define the fractal, and the calculations are relatively simple
and completely parallelizable.  You should see the full advantage of your GPU 
with this program.

For reference, I put this same code together, though much more simply, in MATLAB
using MATLAB complex numbers and it took 632s to generate a 2048x2048 Mandlebrot
with max escape time of 256.  By comparison, a 8192x8192 Mandlebrot with max 
escape time of 1024 took less than 1 second using this code.  Epic!

Typically you would expect 20x-200x speedup from CUDA for most parallelizable
applications (like image processing).  However, this program demonstrates that
sometimes you can achieve even more than that (10,000x?) if your problem is 
just right.  





---------------------
Update 13 Jan, 2011

Added a Colormap class for defining what colors you want in the output png.
Will eventually implement the capability to supply N colors, and a 256x3
colormap will be constructed by spreading out those three colors across the
spectrum of gray values and interpolating the missing ones.  An example cmap
is provided, via cmap_blue_green.txt

Also implemented more useful tiling to the rendering process.  Rather than
writing each CUDA tile to a separate file, it now allocates one large host
image and writes each CUDA tile to the host RAM.  This assumes that you are
rendering an image that is too big for your GFX card, but will fit in host
RAM.  I changed this because it looks like there's nothing useful to come
out of writing separate files (no print shop will be able to do anything 
with a sequence of files...)

---------------------
Update 12 Jan, 2011

** Added Julia Sets:
         Julia sets turned out to be very closely related to the Mandlebrot,
         and it was actually only a couple extra lines of code to modify the 
         kernel to do Julia instead.  This also gives me the possibility of
         doing 3D fractals by varying the c parameter.  However, this may not
         be desired, since it seems most Julia sets are rich enough in two
         dimensions.

** Added writePngFile():
         I finally succeeded in harnessing the libpng[12] library to write
         png files directly from the main() function, without every going
         through MATLAB.  Bypassing MATLAB saves a lot of time and RAM.  
         
** createcolormap.m:
         I created a MATLAB method for exploring colormaps.  It creates a
         256x3 matrix of colors, one for each gray level, to be applied to
         an image loaded in MATLAB.  Given the completion of writePngFile()
         this won't be so useful as a MATLAB script, but will soon be adapted
         to C++ so that it can be used in the writePngCode() to add color 
         via the C++ code.

** cudaQuaternion<T> class:
         Extends complex numbers to four dimensions. Originally planned to 
         use quaternions for 3D Mandlebrot, but found out the there is no
         such thing (or, rather, the Mandlebrot is pretty boring in higher
         dimensions).  However, these may be useful for other kinds of
         fractals...


---------------------


Update 09 Jan, 2011

** Added the cudaComplex<T> class: 
         Can be run by devices of CUDA compute 2.0 or higher.  Using this, 
         I can use complex numbers as native, algebraic objects, and will 
         be able to implement any kind of IFS fractal, now.  Using the new
         cudaComplex<T> class increases computation time by about 5-10%
         (prob due to extra memory alloc/dealloc due to operators returning
         copies of the answers, instead of writing directly to a var)
         
         Manual complex-multiplication in kernel (4096x4096 tile, 1024 max esc):
         ---Single-precision w/ mem copy:  0.195 s
         ---Double-precision w/ mem copy:  0.562 s

         Using cudaComplex<T> (4096x4096 tile, 1024 max esc):
         ---Single-precision w/ mem copy:  0.202 s
         ---Double-precision w/ mem copy:  0.624 s


** Complex<T> class:
         This was a regular C++ implementation of the cudaComplex<T> before
         I realized that it needed to be converted to __device__ code.  I left
         this class in the project even though it's completely redundant w.r.t.
         to std::complex<T>

** Quaternion Class:
         I implemented this in the most tedious way possible, to later realize
         it's probably completely unnecessary.  It hasn't been compiled or
         tested in any way, I just wanted to get the non-commutative algebra in
         there.  I should be able to use them for higher-dimensional fractals,
         later.
etotheipi/CUDA_Fractal