(L) [2013/11/21] [tby voxelium] [Stackless MBVH Traversal for CPU, MIC and GPU Ray Tracing] Stackless Multi-BVH Traversal for CPU, MIC and GPU Ray Tracing
Attila T. Áfra and László Szirmay-Kalos
Computer Graphics Forum (2013)
[LINK http://voxelium.wordpress.com/2013/11/21/stackless-multi-bvh-traversal-for-cpu-mic-and-gpu-ray-tracing/]
Personal copy: [LINK http://cg.iit.bme.hu/~afra/publications/afra2013cgf_mbvhsl.pdf]
Definitive version: [LINK http://dx.doi.org/10.1111/cgf.12259] (currently free)
Abstract:
Stackless traversal algorithms for ray tracing acceleration structures require significantly less storage per ray than ordinary stack-based ones. This advantage is important for massively parallel rendering methods, where there are many rays in flight. On SIMD architectures, a commonly used acceleration structure is the multi bounding volume hierarchy (MBVH), which has multiple bounding boxes per node for improved parallelism. It scales to branching factors higher than two, for which, however, only stack-based traversal methods have been proposed so far.
In this paper, we introduce a novel stackless traversal algorithm for MBVHs with up to 4-way branching. Our approach replaces the stack with a small bitmask, supports dynamic ordered traversal, and has a low computation overhead. We also present efficient implementation techniques for recent CPU, MIC (Intel Xeon Phi), and GPU (NVIDIA Kepler) architectures.
Edit: added links to personal copy and definitive version.
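To make the "bitmask instead of a stack" idea concrete, here is a minimal sketch of the simpler binary (2-way) case. It is not the paper's 4-way MBVH algorithm. To keep it short, it assumes a toy BVH over 1D intervals with parent links, and an interval query stands in for a ray. All names (`Node`, `build`, `overlaps`, `traverse_stackless`) are made up for illustration. Each descent pushes one bit onto `bits`; a 1 means the sibling at that level still has to be visited.

```python
from dataclasses import dataclass

@dataclass
class Node:
    lo: float
    hi: float
    parent: int = -1
    left: int = -1
    right: int = -1
    prim: int = -1  # primitive index for leaves, -1 for inner nodes

def build(intervals):
    """Build a toy binary BVH over 1D intervals (median split). Root is node 0."""
    nodes = []

    def rec(items, parent):
        idx = len(nodes)
        lo = min(a for _, (a, _b) in items)
        hi = max(b for _, (_a, b) in items)
        nodes.append(Node(lo, hi, parent))
        if len(items) == 1:
            nodes[idx].prim = items[0][0]
        else:
            mid = len(items) // 2
            nodes[idx].left = rec(items[:mid], idx)
            nodes[idx].right = rec(items[mid:], idx)
        return idx

    items = sorted(enumerate(intervals), key=lambda it: it[1][0] + it[1][1])
    if items:
        rec(items, -1)
    return nodes

def overlaps(n, qlo, qhi):
    return n.lo <= qhi and qlo <= n.hi

def traverse_stackless(nodes, qlo, qhi):
    """Return primitive indices whose bounds overlap [qlo, qhi], using no stack."""
    hits = []
    if not nodes or not overlaps(nodes[0], qlo, qhi):
        return hits
    node, bits = 0, 0
    while True:
        n = nodes[node]
        if n.prim < 0:
            hit_l = overlaps(nodes[n.left], qlo, qhi)
            hit_r = overlaps(nodes[n.right], qlo, qhi)
            if hit_l or hit_r:
                # Push one bit: 1 if the other child is still pending.
                bits = (bits << 1) | int(hit_l and hit_r)
                node = n.left if hit_l else n.right
                continue
        else:
            hits.append(n.prim)
        # Backtrack: climb via parent links until a pending sibling is found.
        while (bits & 1) == 0:
            if bits == 0:  # no pending sibling at any level: done
                return hits
            node = nodes[node].parent
            bits >>= 1
        p = nodes[nodes[node].parent]
        node = p.right if p.left == node else p.left  # jump to the sibling
        bits ^= 1  # that sibling is no longer pending
```

Per query, the traversal state is just the current node index plus one integer, which is the storage advantage the abstract refers to. A real implementation would use a fixed-width 32- or 64-bit mask and pick the nearer child first for ordered traversal; this sketch always descends into the left child first.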
(L) [2013/11/22] [tby toxie] [Stackless MBVH Traversal for CPU, MIC and GPU Ray Tracing] This is definitely one of the better papers on traversal in recent years!
Good read, with some interesting ideas (maybe not super groundbreaking, but still).
(L) [2013/11/22] [tby Dade] [Stackless MBVH Traversal for CPU, MIC and GPU Ray Tracing] It is a shame AMD has dropped its VLIW architecture; QBVH (aka MBVH) was very effective on that kind of GPU too. Now, I guess, this kind of research is mostly useful for CPUs and Xeon Phi.
(L) [2014/01/20] [tby cessen] [Stackless MBVH Traversal for CPU, MIC and GPU Ray Tracing] Just wanted to note that I implemented the algorithms from this paper in Psychopath, and it's been a huge benefit. Specifically, I was previously limited to a 2-arity BVH due to Psychopath's use of ray reordering. Using the algorithm in this paper for 4-arity BVH traversal, I was able to improve BVH performance by over 50%. And even using the 2-arity algorithm significantly simplified my code and provided a nice bump in performance compared to the algorithm I was previously using.
Thanks for the paper!
(L) [2014/01/21] [tby jbikker] [Stackless MBVH Traversal for CPU, MIC and GPU Ray Tracing] On the CPU, the fastest practical approach is straight MBVH traversal. For first-bounce diffuse rays, Tsakok's MBVH/RS is optimal. I presented a paper with an approach that outperforms both (by ~20%), but it's a complex algorithm:
[LINK http://arauna2.ompf2.com/files/cgf_article.pdf]