/ToyMeshPathTracer

Toy Mesh Path Tracer that I used as a base for job interview tasks

Primary LanguageC++

Toy Mesh Path Tracer for a job interview task

I used the task below for some job interviews I did in 2019 Q2. The initial version was a super simple brute-force triangle mesh path tracer (no acceleration structures, no threading, etc.), and the task was to make it faster. Of course I wanted to know how much faster it can get, so I did some simple speedups myself in parallel.

The original non-optimized-at-all version is at 01-initial-job-task tag. I made other tags for later snapshots; here they are with performance numbers (measured on 2018 MacBookPro i9), on the teapot.obj (15706 triangles) and sponza.obj (66452 triangles) scenes respectively:

  • 01-initial-job-task, initial: 3.5 and LOLNOPE Krays/s
  • 02-multi-threaded, multi-threading: 18.9 and LOLNOPE Krays/s
  • 03-bvh, simple bounding volume hierarchy: 6978.2 and 1350.3 Krays/s
  • 04-simd, simplistic naïve SIMD: 8919.0 and 1853.1 Krays/s
  • 05-change-ray-tri-algo, better ray-triangle intersection test: 10251.1 and 1952.5 Krays/s
  • 06-shadow-rays, faster code path for shadow rays: 10888.0 and 2568.3 Krays/s
  • 07-compare-with-embree, compare with Intel Embree 3.5.2: 75965.0 and 40800.8 Krays/s (yup, Embree is fast!)
  • 08-compare-with-nanort, compare with NanoRT: 20638.2 and 4197.9 Krays/s

And yeah, I know -- most obvious next steps would be to use better BVH building heuristics (like SAH), and doing SIMD properly instead of "let's make our vector/ray/color class use SIMD". I did not get to that (yet?), but in any case -- the code is already over 3000 times faster than the initial version. With the BVH being the major win, as one would expect.

Original desceription of the interview task is below:

Interview task/assignment: speed up a simple path tracer

This project contains a simple triangle mesh path tracer implemented in C++. It's a command line application that takes screen size & input .OBJ data file parameters, renders it using "path tracing" algorithm and produces output.png result file.

Here are some images that the program can produce with the data files present under data/ directory:

result1 result2

The Task

Current program is slow. Really slow. Rendering that monkey head model ("Suzanne") at a lowly 640x360 resolution, 4 samples per pixel, takes one minute on my PC! Rendering the other image ("Sponza") at the same resolution takes five hours.

Your task then is, of course, to make the program faster :)

I know for sure that it is possible to make it faster by more than a hundred times -- e.g. I got Suzanne from a minute down to 0.2 seconds, and Sponza from five hours down to 8 seconds. It might be possible to make it even faster, but I did not quite go there.

Now, your task is to make it run as fast as you can. I'm not asking for a "hundred times", but something like "at least ten times faster" is what you should aim for.

It's entirely your choice how you will do it. Better algorithms? More efficient math? Better data layout? Multi-threading? SIMD? Rewrite in assembly? Rewrite as a compute shader / CUDA / OpenCL? Rewrite for NVIDIA RTX? All of these? You pick :) Some of what I listed here is "certainly overkill" and not needed; achieving a 10x faster performance is entirely possible using relatively simple means.

Go!

What I will be looking at

  • Overall I would recommend making a clone of this project on github and using "actual version control" workflow to make your changes. If you don't like git or version control, that's fine; I can accept submissions in zip or dropbox or google drive (or whatever) form. As long as I can see the code and try it out.
  • Your optimized program should produce same looking images as the original one, just do it faster.
  • I'll be looking at "everything" that is important in programming job: whether the code works correctly, is understandable, how is it structured, how it behaves performance wise (computing usage, memory usage etc.).
  • It is very likely that your first submission will not be quite good, so do not delay it until the last day! Usually it takes 2-4 iterations to arrive at a good solution.

About the code

I made it work on Windows (Visual Studio 2017) and Mac (Xcode 10); the project files for them are respectively in projects/VisualStudio/TrimeshTracer.sln and projects/Xcode/TrimeshTracer.xcodeproj. In Visual Studio project, you might need to update settings to whatever Windows SDK version you have, I picked the oldest I had on my machine. If you have any trouble building or running it, ask me!

The application accepts four command line arguments as input: <width> <height> <spp> <datafile>:

  • width is image width in pixels,
  • height is image height in pixels,
  • spp is "samples per pixel"; kind of like "anti-aliasing level",
  • datafile is path to a mesh to render; in Wavefront .OBJ format.

I added some example meshes under data/ folder; initially I suggest starting with e.g. data/cube.obj which is just a simple cube. I do not recommend trying to run the non-optimized version of the program on for example the Sponza model - it contains 66k triangles and will run very very slow.

The code itself is fairly simple C++ code, and I tried to comment it extensively. No previous experience with ray-tracers or path-tracers should be required.

If you do want to read up on what this "path tracing" thing is (and maybe even get some ideas how to make it faster? who knows), I can recommend "Ray Tracing in a Weekend", "Ray Tracing: The Next Week" and "Ray Tracing: the Rest of Your Life" minibook series, which have been recently made free here. The internet is full of other information about ray/path tracing as well.

For reference, here are the performance numbers I get on my laptop (2019 MacBookPro i9 2.9GHz), rendering at 640x360, 4 samples per pixel, on this un-optimized implementation:

  • cube.obj (14 triangles): 3818.3 KRay/s (0.6 sec)
  • suzanne.obj (970 triangles): 48.3 KRays/s (53.0 sec)
  • teapot.obj (15706 triangles): 3.5 KRays/s (683.0 sec)
  • sponza.obj (66452 triangles): LOL nope, ain't nobody got time for that