Re: Speed Issues with GPU BVH rendering
(L) [2013/08/23] [graphicsMan] [Re: Speed Issues with GPU BVH rendering] Is that 300 Mrays/s with the cheap BVH he's using?
(L) [2013/08/23] [jbikker] [Re: Speed Issues with GPU BVH rendering] Ehm no, good point. I believe HLBVH gets about 50% of the performance of SBVH, so he should expect to be able to do 150M.
(L) [2013/08/23] [ziu] [Re: Speed Issues with GPU BVH rendering] >> jbikker wrote:I'm afraid you're much further behind atm than 5x. On a single 670, it's quite possible to do 300Mrays/s for a scene like that and shading like that.
Sure. But I am talking about ray tracing without using bundles. Or specialized triangles. Or ray reordering. Or persistent threads. Not to mention that this SAH is suboptimal. NIH itself comes with a GPU BVH builder using Garanzha's binned grid SAH method which, in my tests, renders at least twice as fast as LBVH. It just takes 10x longer to construct. Plus you have more GPU memory bandwidth on your 670, and that does count for something with caches this small.
As for "with shading like that": I am computing the barycentric coordinates of the triangles. I am just not using them for anything yet.
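For readers wondering where those barycentric coordinates come from: the post does not say which intersection routine is used, so the following is only a minimal sketch assuming a Möller–Trumbore style ray/triangle test, which yields the coordinates (u, v) as a by-product of the hit test. The helper names `sub`, `dot`, `cross` and `intersect` are made up for this illustration.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(orig, direction, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore test (assumed routine, not necessarily the poster's).

    Returns (t, u, v) on a hit, where the hit point equals
    (1 - u - v) * v0 + u * v1 + v * v2, or None on a miss.
    """
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det     # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det    # distance along the ray
    if t < eps:                 # hit is behind the ray origin
        return None
    return t, u, v
```

The point is only that u and v cost nothing extra once the hit test has run, so computing them without using them should not distort the timing much.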
 >> jbikker wrote:Ehm no, good point. I believe HLBVH gets about 50% of the performance of SBVH, so he should expect to be able to do 150M.
Surely you jest:
[LINK https://research.nvidia.com/sites/default/files/publications/karras2013hpg_aux.pdf]
They get 161.4 Mrays/s on a GeForce GTX Titan on the Buddha using LBVH. These are published results from HPG 2013, presented earlier this month. That is a GK110 chip with twice the shader multiprocessor (SMX) units (14 vs 7) and hence roughly twice the FLOPS of either of our cards. To expect that on a GeForce GTX 660 Ti is unreasonable. It will be less than half that. Plus they are using the Timo Aila rendering loop and I'm not.
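The arithmetic behind "less than half that" can be written down as a back-of-the-envelope sketch. It assumes throughput scales linearly with the number of multiprocessors and ignores clock speed, memory bandwidth and cache differences, so it is an optimistic ceiling rather than a prediction. The function name `scale_by_sm_count` is invented for this example.

```python
def scale_by_sm_count(measured_mrays, measured_sms, target_sms):
    """Naive estimate: assume Mrays/s is proportional to multiprocessor count."""
    return measured_mrays * target_sms / measured_sms

# 161.4 Mrays/s on 14 units (GTX Titan), scaled down to 7 units.
estimate = scale_by_sm_count(161.4, 14, 7)
print(f"{estimate:.1f} Mrays/s")  # prints "80.7 Mrays/s"
```

Under that assumption the published figure translates to roughly 80 Mrays/s on a 7-unit card, before accounting for the different rendering loop.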
The other acceleration structure implementations I am comparing this with do not use either of those tricks, so "to be fair" I'm not going to use them with this BVH implementation. Not at this point at least.
(L) [2013/08/23] [jbikker] [Re: Speed Issues with GPU BVH rendering] It's hard to compare these things. Those results are from their Wavefront paper, and their code is designed to work with complex shaders. The number I was mentioning was for primary rays with very basic shading, since your shot is visualizing only traversal depth. Other things to take into consideration: the screen area occupied by your object (~30% in your shot?), and whether or not you take 'overhead' outside the traversal kernels into account. The Aila & Laine paper I mentioned earlier was just measuring kernel time, for instance, outperforming any practical ray tracer by 100% in practice.
I'm just trying to give you a realistic target; when I started, Ingo Wald had figures that outperformed my tracer by an order of magnitude (that was on the CPU, with packet traversal), and this proved to be a great motivator. Once I surpassed him, tbp (who set up the original ompf, amongst other great deeds) provided figures that always slightly exceeded mine. I need that. [SMILEY :)]
(L) [2013/08/23] [ziu] [Re: Speed Issues with GPU BVH rendering] >> jbikker wrote:It's hard to compare these things. Those results are from their Wavefront paper, and their code is designed to work with complex shaders. The number I was mentioning was for primary rays with very basic shading, since your shot is visualizing only traversal depth. Other things to take into consideration: the screen area occupied by your object (~30% in your shot?), and whether or not you take 'overhead' outside the traversal kernels into account. The Aila & Laine paper I mentioned earlier was just measuring kernel time, for instance, outperforming any practical ray tracer by 100% in practice.
I'm just trying to give you a realistic target; when I started, Ingo Wald had figures that outperformed my tracer by an order of magnitude (that was on the CPU, with packet traversal), and this proved to be a great motivator. Once I surpassed him, tbp (who set up the original ompf, amongst other great deeds) provided figures that always slightly exceeded mine. I need that.
The acceleration structure I am designing takes 16.91 ms construction time on Buddha and does 76.30 Mrays/s shaded and lit on that same scene on my system using no ray optimizations whatsoever. For primary rays you can precompute a lot of things and save time that way. You can also exploit ray coherence. I am not doing any of that (yet).
This mucking around with the LBVH is part of my comparison with other state-of-the-art GPU animation algorithms with fast rebuilds. NIH has state-of-the-art code for 2011-2012. AFAIK more recent work has no source code available at all. I just wanted to make sure I was exploiting its performance properly, without errors on my part.
I may get to the point where I use this RT technology in a game engine. When that happens I will necessarily have to do those special-case optimizations.
FWIW the rendering performance numbers in the original Lauterbach GPU BVH paper, which started the ball rolling on this back in 2009, are quite humble.
[LINK https://wwwx.cs.unc.edu/~geom/papers/documents/articles/2009/lauterbach09.pdf]
It is historically important not only because it shows you can do per frame BVH rebuilds on the GPU but also because of the construction technique involved which is, in my opinion, quite fascinating.
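For anyone who has not read the paper: the LBVH construction it describes orders the primitives along a Morton (Z-order) curve and derives the hierarchy from the bits of the sorted codes. Below is a minimal sketch of just the Morton code step, using the commonly seen 30-bit formulation (10 bits per axis) rather than the paper's own code. It assumes the primitive centroids have already been normalized to the unit cube; `expand_bits` and `morton3d` are names chosen for this illustration.

```python
def expand_bits(v):
    """Spread the low 10 bits of v so that two zero bits separate each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0xFF0000FF
    v = (v | (v << 8)) & 0x0F00F00F
    v = (v | (v << 4)) & 0xC30C30C3
    v = (v | (v << 2)) & 0x49249249
    return v

def morton3d(x, y, z):
    """30-bit Morton code for a point with coordinates in [0, 1]."""
    def quantize(f):
        # Map [0, 1] onto the integer range 0..1023, clamping the ends.
        return min(max(int(f * 1024.0), 0), 1023)
    return ((expand_bits(quantize(x)) << 2)
            | (expand_bits(quantize(y)) << 1)
            | expand_bits(quantize(z)))

# Sorting centroids by code places spatially nearby primitives close together.
centroids = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.5, 0.5, 0.5)]
ordered = sorted(centroids, key=lambda c: morton3d(*c))
```

On the GPU the sort would be a parallel radix sort over the codes; the hierarchy is then built by splitting the sorted range wherever the highest differing bit changes.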