Flexible Rasterizer in OpenCL

Primary language: C++. License: GNU General Public License v2.0 (GPL-2.0).

This is the source code to my bachelor’s thesis “Flexible Rasterizer in OpenCL”:

This project implements an essentially OpenGL 2.0-level software graphics pipeline called OCLRaster (short for OpenCL Rasterizer). It adds some unique features and some functionality of more recent OpenGL versions, and leaves out some other features. The pipeline is written in OpenCL C on the device side and C++ on the host side, is accelerated through OpenCL, and is capable of running on all OpenCL 1.1 desktop hardware. This includes most modern GPUs and CPUs.

The main goals are a simple host API and an easy way to program the vertex and fragment stages. Both are deliberately modeled on a hardware graphics pipeline and API, and neither requires any modification of the pipeline itself. Together, they should allow for a rather uncomplicated migration of OpenGL programs.

This software pipeline supports fully programmable depth testing and blending (neither of which is possible on today’s graphics hardware), instanced rendering, scissor testing, the previously mentioned vertex and fragment stage programmability, miscellaneous buffer objects in a simplified and unified way, 2D images (hardware-accelerated formats, with software emulation for unsupported formats), framebuffers and multiple render targets with fewer restrictions than hardware pipelines, and rendering with perspective and orthographic projection modes. Other OpenGL 2.0-level features are not supported: stencil testing (which can be partially simulated in software by using an additional framebuffer attachment), anti-aliasing, 1D and 3D images, occlusion querying, and all of the now-obsolete legacy draw functions and modes. These omissions are not due to any technical obstacle that would prevent their implementation, but to the time constraints of this thesis/project.
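To illustrate what "programmable" depth testing and blending means, here is a minimal host-side C++ sketch. It is not OCLRaster's API or implementation: the `Color` and `Framebuffer` types and the `write_fragment` function are hypothetical names invented for this example, and the real pipeline runs this logic in OpenCL C on the device. The sketch only shows the idea that the depth test and the blend step are user-supplied functions rather than a fixed set of modes.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only (not part of OCLRaster).
struct Color { float r, g, b, a; };

struct Framebuffer {
    std::size_t width, height;
    std::vector<float> depth; // one depth value per pixel
    std::vector<Color> color; // one color value per pixel
    Framebuffer(std::size_t w, std::size_t h, float clear_depth, Color clear_color)
        : width(w), height(h), depth(w * h, clear_depth), color(w * h, clear_color) {}
};

// Writes a single fragment. DepthTest is any callable (incoming, current) -> bool,
// Blend is any callable (source, destination) -> Color.
// Returns true if the fragment passed the depth test and was written.
template <typename DepthTest, typename Blend>
bool write_fragment(Framebuffer& fb, std::size_t x, std::size_t y,
                    float frag_depth, Color frag_color,
                    DepthTest depth_test, Blend blend) {
    const std::size_t idx = y * fb.width + x;
    if (!depth_test(frag_depth, fb.depth[idx])) {
        return false; // fragment is discarded
    }
    fb.depth[idx] = frag_depth;
    fb.color[idx] = blend(frag_color, fb.color[idx]);
    return true;
}

// Conventional "less than" depth test, as offered by fixed-function hardware.
inline bool depth_less(float incoming, float current) {
    return incoming < current;
}

// Custom depth test from the screenshots section: passes fragments in
// alternating depth bands and ignores the stored depth entirely.
inline bool depth_banded(float incoming, float /*current*/) {
    return std::fmod(incoming, 0.1f) > 0.05f;
}

// Source-alpha blending written as an ordinary function.
inline Color blend_alpha(Color src, Color dst) {
    const float inv = 1.0f - src.a;
    return Color{src.r * src.a + dst.r * inv,
                 src.g * src.a + dst.g * inv,
                 src.b * src.a + dst.b * inv,
                 src.a + dst.a * inv};
}
```

Calling `write_fragment(fb, x, y, depth, color, depth_banded, blend_alpha)` swaps in the banded test without touching the write logic, which is the property the paragraph above describes.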

The thesis can be found in the “etc” folder (oclraster_thesis.pdf).

Software Requirements and Important Information:

  • on Windows:
    • if you have an AMD or Intel CPU: please install the Intel OpenCL SDK for Windows
    • alternatively, or if you have an AMD GPU: install the AMD APP SDK (not recommended for CPU devices due to performance issues)
    • if you have an Nvidia GPU: please install the latest graphics drivers
    • make sure to select the correct OpenCL platform in config.xml
  • on OS X:
    • please install the latest XQuartz
    • 10.7: OpenCL on 10.7.5 on non-CPU devices is broken, please use a CPU device for now or use an earlier version of OS X (or upgrade to 10.8+)
    • 10.8: update to at least 10.8.3 (prior versions are majorly broken)
    • 10.9: should work out of the box
    • if you have an Nvidia GPU, you can also use CUDA (set the OpenCL platform value to “cuda”) – note that this is still experimental
  • on FreeBSD:
    • only FreeBSD 10.0 is supported (and possibly any future versions)
    • since there is no direct OpenCL support from any hardware vendor, you’ll have to install and use pocl 0.8+ (currently only manually)
    • I’m currently having issues with FreeBSD + X11 forwarding + OpenGL FBO usage, so if the normal configuration is failing for you (only rendering a black screen), I’d recommend enabling the “gldrawpixels” option with premake.sh
  • on Linux:
    • install any of the AMD, Intel or Nvidia OpenCL SDK+libs+drivers and/or pocl 0.8+ and ocl-icd
    • when using pocl on Linux, the premake.sh “pocl” option is unnecessary and should not be used

Requirements:

  • Windows: NT 6.0+ x86 (Vista/7/8/2008/2008R2/2012/2012R2); NT 5.1/5.2 x86 (XP/2003) support is uncertain
  • OS X: 10.7+
  • Linux: any x64 distribution that supports the AMD, Intel or Nvidia OpenCL SDK/drivers
  • FreeBSD: 10.0+ (amd64; i386 support is uncertain)
  • OpenGL 1.1+
    • support for OpenGL 3.0+ or ARB_framebuffer_object or EXT_framebuffer_object + EXT_framebuffer_blit is recommended
    • glDrawPixels is supported as a fallback option, but has to be enabled at build time via “./premake.sh gldrawpixels” (note: this can run on the Microsoft GDI driver or weird X11 forwarding setups)
    • note that OpenGL is only needed to actually display the framebuffer
  • OpenCL 1.1+ capable GPU/CPU and platform (GeForce 400+, Radeon HD5+, Core2+, Athlon64+)
  • OpenCL extension support for:
    • cl_khr_global_int32_base_atomics
    • cl_khr_global_int32_extended_atomics
    • cl_khr_local_int32_base_atomics
    • cl_khr_local_int32_extended_atomics
    • cl_khr_gl_sharing/cl_APPLE_gl_sharing
  • 1 GB RAM minimum, 2 GB recommended
  • 256 MB VRAM minimum, 1 GB recommended
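The extension requirements above can be checked before building anything. The following is a small sketch, not code from OCLRaster: the function name `missing_extensions` is made up for this example. It assumes you have already obtained the device's space-separated extension string, which is the format `clGetDeviceInfo` returns for `CL_DEVICE_EXTENSIONS`, and it treats either of the two GL sharing extensions as sufficient.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the required extensions that are missing from a space-separated
// extension string (the format of CL_DEVICE_EXTENSIONS).
// An empty result means all requirements listed above are met.
std::vector<std::string> missing_extensions(const std::string& device_extensions) {
    // Split the string on whitespace into a set for exact-name lookups.
    std::istringstream stream(device_extensions);
    std::unordered_set<std::string> available;
    for (std::string name; stream >> name;) {
        available.insert(name);
    }

    static const char* const required[] = {
        "cl_khr_global_int32_base_atomics",
        "cl_khr_global_int32_extended_atomics",
        "cl_khr_local_int32_base_atomics",
        "cl_khr_local_int32_extended_atomics",
    };

    std::vector<std::string> missing;
    for (const char* ext : required) {
        if (available.count(ext) == 0) {
            missing.push_back(ext);
        }
    }

    // GL sharing: either the Khronos or the Apple variant is acceptable.
    if (available.count("cl_khr_gl_sharing") == 0 &&
        available.count("cl_APPLE_gl_sharing") == 0) {
        missing.push_back("cl_khr_gl_sharing/cl_APPLE_gl_sharing");
    }
    return missing;
}
```

Splitting on whitespace and comparing whole names avoids false positives that a plain substring search could produce when one extension name is a prefix of another.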

Build Instructions:

  • install/build floor
  • after cloning: git submodule init && git submodule update
  • after a submodule update / pull: git submodule update

on Linux and FreeBSD:

  • install all necessary libraries and programs, including clang 3.2+ and libc++ (currently 1101) (→ Credits)
    • on Arch Linux (sub-dependencies are installed automatically): pacman -S clang libc++ sdl2 sdl2_image libxml2 premake freetype2 opencl-headers cuda
    • any of: opencl-nvidia (in addition to Nvidia drivers), intel-opencl-sdk (from AUR, CPU only or in addition to Intel GPU drivers) or catalyst + misc
  • run “./premake.sh” (on FreeBSD: “./premake.sh pocl gldrawpixels”)
  • make
  • there is an install script located in lib/ which should be executed from that folder (will install to /usr/local/)
  • alternatively:
    • sudo ln -sf /path/to/oclraster/bin/liboclraster{,d}.{a,so} /usr/local/lib/
    • sudo ln -sf /path/to/oclraster/lib /usr/local/include/oclraster
  • when using X11 forwarding, set these env variables:
    • export LIBGL_ALWAYS_INDIRECT=yes
    • export SDL_VIDEO_X11_NODIRECTCOLOR=yes

on Windows/MinGW:

  • install custom MinGW builds gcc-dw2-4.6 and clang-3.2 (thanks to rubenvb for providing these!)
  • install MSYS to your MinGW folder
  • build/install all necessary libraries, including libc++ (→ Credits)
  • add a Windows system environment variable called “MINGW_ROOT” and set it to your MinGW root folder (e.g. “/c/mingw/mingw32/”)
  • add the “mingw/bin” path to the Windows system environment variable called “PATH”
  • run “./premake.sh”
  • make
  • libraries and headers will be automatically installed to your previously set MinGW folder
  • note: to make things easier, I might provide a full MinGW package containing all required libraries at a later point

on OS X:

  • install Xcode 4.6+, XQuartz, SDL2 and SDL2_image (and optionally CUDA)
  • if you’re using /Library/Frameworks:
    • 10.8 SDK: ln -sf /Library /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/Library
    • 10.9 SDK: ln -sf /Library /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/Library
  • compile oclraster.xcodeproj
  • install/symlink oclraster .dylib and lib folder:
    • if /usr/local or any sub-folders do not exist, create them (sudo mkdir -p /usr/local/{include,lib})
    • sudo ln -sf /path/to/oclraster/bin/liboclraster{,d}.dylib /usr/local/lib/
    • sudo ln -sf /path/to/oclraster/lib /usr/local/include/oclraster
  • compile any samples you like

Credits:

Relevant Links and Papers:

Screenshots:

  • progress as of 2013/04/27, running on a GPU in OS X:
    programmable depth testing using a custom depth test function:
    #define depth_test(incoming, current) (fmod(incoming, 0.1f) > 0.05f)
  • progress as of 2013/04/24, running on a GPU in OS X:
    combined 3D and 2D GUI rendering (provided by oclraster_support)
  • progress as of 2013/04/18, running on a GPU in OS X:
    2D/orthographic rendering
  • progress as of 2013/04/15, running on a GPU in OS X:
    render-to-texture + rendering a triangle fan
  • progress as of 2013/04/04, running on a GPU in OS X:
    sliced volume rendering
  • progress as of 2013/04/03, running on a GPU in OS X:
    render-to-texture / multi-framebuffer support
  • progress as of 2013/03/28, running on a GPU in OS X:
    programmable blending and more image functionality: native and buffer images, sampling a 32-bit float noise texture, …
  • progress as of 2013/02/21, running on a CPU in OS X and on a dual-core ARM CPU in iOS:
    subdivided blender monkey with parallax mapping
    note: the framebuffer is upscaled by 2x, and there are lots of issues with OpenCL on iOS → defective depth testing
  • progress as of 2013/01/19, running on a GPU in OS X and inside a Windows VM on a CPU:
    moar bunnies
    bunny on windows
  • progress as of 2012/12/06, running on the CPU:
    a bunny of course