Ray Tracing is the Future and ever will be


(L) [2013/07/29] [tby toxie] [Ray Tracing is the Future and ever will be] Wayback!

All the stuff from SIGGRAPH is online by now:
[LINK https://sites.google.com/site/raytracingcourse/home]
(L) [2013/07/30] [tby Lo1] [Ray Tracing is the Future and ever will be] Wayback!

Thank you, just in time for me [SMILEY :-)]
(L) [2013/07/30] [tby Dade] [Ray Tracing is the Future and ever will be] Wayback!

>> [LINK https://sites.google.com/site/raytracingcourse/Improving%20Coherence%20for%20Path%20Tracing%20on%20GPUs.pptx?attredirects=0 Improving Coherence for Path Tracing]
Very interesting reading. Is it just my impression, or has NVIDIA done a 180-degree turn: wasn't the OptiX kernel written as a single huge kernel before? Not that I disagree with the content of the presentation; I have written path tracing kernels (mostly) in the same way for years.
Second question: doesn't the variable length of the work queues they are using imply Dynamic Parallelism (as introduced with Kepler GPUs and recently in OpenCL 2.0)? I use a fixed amount of produced work as a workaround to this problem (i.e. I always produce only a single ray to trace at each step instead of a variable number).
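A minimal sketch of what I mean by a fixed amount of work (my own illustration, all names invented): one thread per path, at most one ray traced per kernel launch, so the launch size never varies and no device-side work creation is needed.
[CODE]
// One path per thread; each step traces exactly one ray (or none, if
// the path has terminated), so the work per launch is fixed.
struct PathState { bool alive; /* throughput, pixel, RNG state, ... */ };

__device__ bool traceOneRay(PathState& p)
{
    // Trace one ray for this path and shade the hit; return false when
    // the path terminates. Stubbed out here.
    return false;
}

__global__ void advancePaths(PathState* paths, int numPaths)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPaths && paths[i].alive)
        paths[i].alive = traceOneRay(paths[i]);   // one ray per step
}
[/CODE]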
(L) [2013/07/30] [tby spectral] [Ray Tracing is the Future and ever will be] Wayback!

I don't see it as a real change...
OptiX is not written as one big kernel... as far as I remember it contains some kind of stack that allows the whole logic to be cut into several parts... even if this is hidden from the developer.
To my mind, this new paper is a logical continuation of previous work...
1) They advise keeping the kernels small (but not too small!!) to avoid register spilling, instruction cache thrashing...
2) They provide a generic mechanism for all kinds of operations, and also a way to keep memory accesses coalesced.
3) Instead of using sorting/scan/... they use a fixed-size queue and the __ballot instruction... which looks easy to implement even in OpenCL (see the sketch at the end of this post).
I really like their work: it is simple, efficient and looks promising!
With this, they seem to have a nice architecture that can be extended to out-of-core rendering and other kinds of algorithms...
I would just like to try it in practice myself [SMILEY ;-)] and also on other kinds of architectures (will a CPU suffer with this kind of processing, and what about ATI?)
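Something along these lines is what I have in mind for the __ballot queue append (my own sketch, not code from the paper; the work predicate and queue layout are invented, and Kepler-era CUDA intrinsics are assumed):
[CODE]
// Warp-aggregated queue append: one atomicAdd per warp instead of one
// per thread. Every thread holding a work item writes it into a
// fixed-size global queue.
__device__ int g_queueCount;             // number of items in the queue
__device__ int g_queue[1 << 20];         // fixed-size work queue

__global__ void enqueueActive(const int* work, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = (tid < n) && (work[tid] != 0);      // invented predicate

    unsigned int ballot = __ballot(active);           // mask of active lanes
    if (ballot == 0)
        return;                                       // whole warp is idle

    int lane   = threadIdx.x & 31;
    int leader = __ffs(ballot) - 1;                   // lowest active lane
    int rank   = __popc(ballot & ((1u << lane) - 1)); // my slot within the warp

    int base = 0;
    if (lane == leader)                               // one atomic per warp
        base = atomicAdd(&g_queueCount, __popc(ballot));
    base = __shfl(base, leader);                      // broadcast to the warp

    if (active)
        g_queue[base + rank] = work[tid];
}
[/CODE]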
(L) [2013/07/30] [tby graphicsMan] [Ray Tracing is the Future and ever will be] Wayback!

>> Dade wrote:[LINK https://sites.google.com/site/raytracingcourse/Improving%20Coherence%20for%20Path%20Tracing%20on%20GPUs.pptx?attredirects=0 Improving Coherence for Path Tracing]
Very interesting reading. Is it just my impression, or has NVIDIA done a 180-degree turn: wasn't the OptiX kernel written as a single huge kernel before? Not that I disagree with the content of the presentation; I have written path tracing kernels (mostly) in the same way for years.
Second question: doesn't the variable length of the work queues they are using imply Dynamic Parallelism (as introduced with Kepler GPUs and recently in OpenCL 2.0)? I use a fixed amount of produced work as a workaround to this problem (i.e. I always produce only a single ray to trace at each step instead of a variable number).
I think OptiX does compile into a huge kernel, but my info could be out of date.
Also, you would think Dynamic Parallelism would be useful here, but apparently it's quite expensive on the Kepler architecture. These guys make their queue information available to the CPU, and the CPU launches the kernels. Apparently this is slightly faster than launching via Dynamic Parallelism.
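A sketch of what that host-driven loop might look like (names and layout invented; not their actual code):
[CODE]
// The GPU stage fills g_queueCount; the CPU reads it back and sizes the
// next launch, instead of launching from the device via Dynamic Parallelism.
#include <cuda_runtime.h>

__device__ int g_queueCount;
__device__ int g_queue[1 << 20];

__global__ void shadeStage(int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count)
        return;
    int item = g_queue[i];   // consume one queued work item
    (void)item;              // real shading work would go here
}

void runWavefrontStep()
{
    int count = 0;
    // Small synchronous read-back: this is the CPU <=> GPU round trip.
    cudaMemcpyFromSymbol(&count, g_queueCount, sizeof(int));
    if (count == 0)
        return;

    int block = 256;
    int grid  = (count + block - 1) / block;
    shadeStage<<<grid, block>>>(count);   // launch exactly what is queued

    int zero = 0;
    cudaMemcpyToSymbol(g_queueCount, &zero, sizeof(int)); // reset for next pass
}
[/CODE]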
(L) [2013/08/06] [tby papaboo] [Ray Tracing is the Future and ever will be] Wayback!

I was doing some quick profiling of OptiX 3.0 last Friday and it does indeed seem to compile into a mega-kernel, since all I could find were calls to trace_1. However, I'm currently only compiling for devices of compute capability 1.0, and on such old devices it makes sense to generate a giant kernel, since the performance gains from sorting by ray or material would be completely lost in the overhead of doing the actual sorting, launching kernels, and fetching/storing data between kernel launches.
On newer devices, 2.x and upwards, it makes sense to break it down into smaller kernels, both to achieve better thread coherence by sorting on materials and to keep the register count low. I would assume/hope that NVIDIA does that when compiling for newer architectures, as it should be perfectly doable with the way they've structured OptiX.
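For illustration, the material-sorting step could be as simple as a key-value sort (a hedged sketch using Thrust, not OptiX internals; all names are invented):
[CODE]
// Group pending shading work by material id so that threads in a warp
// mostly run the same shader; contiguous runs of equal keys are then
// shaded by coherent groups of threads.
#include <thrust/device_vector.h>
#include <thrust/sort.h>

void sortHitsByMaterial(thrust::device_vector<int>& materialIds,
                        thrust::device_vector<int>& hitIndices)
{
    thrust::sort_by_key(materialIds.begin(), materialIds.end(),
                        hitIndices.begin());
}
[/CODE]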
(L) [2013/08/07] [tby friedlinguini] [Ray Tracing is the Future and ever will be] Wayback!

>> papaboo wrote:I was doing some quick profiling of OptiX 3.0 last Friday and it does indeed seem to compile into a mega-kernel, since all I could find were calls to trace_1. However, I'm currently only compiling for devices of compute capability 1.0, and on such old devices it makes sense to generate a giant kernel, since the performance gains from sorting by ray or material would be completely lost in the overhead of doing the actual sorting, launching kernels, and fetching/storing data between kernel launches.
On newer devices, 2.x and upwards, it makes sense to break it down into smaller kernels, both to achieve better thread coherence by sorting on materials and to keep the register count low. I would assume/hope that NVIDIA does that when compiling for newer architectures, as it should be perfectly doable with the way they've structured OptiX.
In one presentation at SIGGRAPH ([LINK http://on-demand.gputechconf.com/siggraph/2013/presentation/SG3106-Building-Ray-Tracing-Applications-OptiX.pdf]), NVIDIA implies that there is an upcoming version that might address this. One slide on enhancements in a future version of OptiX cites performance improvements in "ray tracing kernels [Aila and Laine 2009; Aila et al. 2012]". 2009 presumably refers to "Understanding the Efficiency of Ray Traversal on GPUs", but 2012 is unclear, since Timo Aila's home page cites only one paper from 2012, "Reconstructing the Indirect Light Field for Global Illumination", which doesn't really have anything to do with kernels. Since he's also one of the three authors of "Megakernels Considered Harmful", there is room for hope.
Disclaimer: I work for NVIDIA, but not on OptiX, so I have no relevant inside knowledge here.
(L) [2013/08/07] [tby Dade] [Ray Tracing is the Future and ever will be] Wayback!

>> friedlinguini wrote:In one presentation at SIGGRAPH ([LINK http://on-demand.gputechconf.com/siggraph/2013/presentation/SG3106-Building-Ray-Tracing-Applications-OptiX.pdf]), NVIDIA implies that there is an upcoming version that might address this. One slide on enhancements in a future version of OptiX cites performance improvements in "ray tracing kernels [Aila and Laine 2009; Aila et al. 2012]".
I bet OptiX is going to use this kind of "micro-kernel" solution. Maybe it is just not yet available in the released version.

I'm still a bit worried about the overhead of the CPU <=> GPU round trip and the atomics, but it is such an elegant and clean solution that it must work well.
(L) [2013/08/07] [tby McAce] [Ray Tracing is the Future and ever will be] Wayback!

>> friedlinguini wrote:2009 presumably refers to "Understanding the Efficiency of Ray Traversal on GPUs", but 2012 is unclear, since Timo Aila's home page cites only one paper from 2012 ...
Scroll down to Technical Reports; it refers to Understanding the Efficiency of Ray Traversal on GPUs -- Kepler and Fermi Addendum: [LINK https://mediatech.aalto.fi/~timo/publications/aila2012hpg_techrep.pdf]
(L) [2013/08/07] [tby dbz] [Ray Tracing is the Future and ever will be] Wayback!

>> friedlinguini wrote:In one presentation at SIGGRAPH ([LINK http://on-demand.gputechconf.com/siggraph/2013/presentation/SG3106-Building-Ray-Tracing-Applications-OptiX.pdf]), NVIDIA implies that there is an upcoming version that might address this.
Slide 5 from that presentation, called "Real Time Path Tracing", is interesting. It mentions that a 35x speedup over a GTX 680 is needed before path tracing can be done in real time. With GPU speed increasing by, say, 50% with each generation of 1.5 years, it would take about 13 years before path tracing can be done in real time on a GPU. However, that ignores increases in screen resolution (4K screens in the near future, maybe 16K screens by then) and in scene complexity.
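For reference, the arithmetic behind that figure (assuming a constant 1.5x speedup every 1.5 years): 1.5^n = 35 gives n = ln(35)/ln(1.5) ≈ 8.8 generations, and 8.8 × 1.5 years ≈ 13 years.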
