Ray/tri: half-frustum early out

[2007/09/30] [Phantom] [Ray/tri: half-frustum early out]

Here's a little idea I had this morning, based on Xela's paper: avoid ray/tri tests for many rays at once.


For primary and shadow rays, Ingo Wald's BVH traversal approach traverses a large packet of rays (in my case: 16x16). At some point these rays are intersected with the triangles in a leaf. To speed this up, each triangle is first tested against the corner rays of the packet: if all corner rays lie on the outer side of any one triangle edge, the triangle doesn't need to be tested at all. But if even a single ray may hit the triangle, all rays are tested against it.
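To make the corner-ray test concrete, here is a minimal sketch. It assumes all rays in the packet share one origin, and the names (Vec3, packetMissesTriangle) are made up for this post, not taken from any particular tracer. Each edge, together with the origin, spans a plane; if all four corner directions are strictly on the outer side of one such plane, no ray in the packet can hit the triangle.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Returns true when the whole packet can skip the triangle.
// Assumption: every ray starts at 'origin' and its direction lies inside
// the cone spanned by the four corner directions.
bool packetMissesTriangle(const Vec3& origin,
                          const std::array<Vec3, 4>& cornerDirs,
                          const Vec3& v0, const Vec3& v1, const Vec3& v2)
{
    // Triangle vertices relative to the common origin.
    const Vec3 p[3] = { sub(v0, origin), sub(v1, origin), sub(v2, origin) };
    for (int e = 0; e < 3; ++e)
    {
        const Vec3& a = p[e];
        const Vec3& b = p[(e + 1) % 3];
        const Vec3& c = p[(e + 2) % 3];
        // Plane through the origin and edge (a, b).
        const Vec3 n = cross(a, b);
        // The sign of 'inside' tells on which side the triangle interior lies.
        const float inside = dot(n, c);
        if (inside == 0.0f) continue; // triangle seen edge-on: this edge cannot cull
        bool allOutside = true;
        for (const Vec3& d : cornerDirs)
        {
            // A corner ray on the inner side (or exactly on the plane) blocks the cull.
            if (dot(n, d) * inside >= 0.0f) { allOutside = false; break; }
        }
        if (allOutside) return true;
    }
    return false; // at least one ray may hit: test the full packet
}
```

The test is conservative: a 'false' result only means that the triangle could not be rejected, not that any ray actually hits it.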


So, suppose the packet frustum is subdivided before intersection takes place. Besides the four bounding planes, there is one extra plane (for starters) that splits the frustum in two equal halves. In my case, this means that the first 16x8 rays are on one side, the second 16x8 on the other side. Now suppose all three vertices of a triangle lie on one side of this plane. Then the half of the rays on the other side doesn't have to be tested against this triangle. The same goes the other way around.
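A rough sketch of the half-frustum check, under a few assumptions of mine: the rays share a common origin, the 16x16 packet is stored row by row, and the split plane goes through the origin with its normal pointing towards the second half (rows 8..15). The names RayRange and raysToTest are hypothetical.

```cpp
struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

constexpr int kPacketWidth  = 16;
constexpr int kPacketHeight = 16;
constexpr int kPacketSize   = kPacketWidth * kPacketHeight;

// Contiguous range of ray indices that still needs a ray/tri test.
struct RayRange { int first; int count; };

// Assumption: rays [0, kPacketSize / 2) lie on the negative side of the split
// plane (or on it), the remaining rays on the positive side (or on it).
RayRange raysToTest(const Vec3& origin, const Vec3& splitNormal,
                    const Vec3& v0, const Vec3& v1, const Vec3& v2)
{
    const float s0 = dot(splitNormal, sub(v0, origin));
    const float s1 = dot(splitNormal, sub(v1, origin));
    const float s2 = dot(splitNormal, sub(v2, origin));
    const int half = kPacketSize / 2;
    // Triangle strictly on the negative side: the second half cannot hit it.
    if (s0 < 0.0f && s1 < 0.0f && s2 < 0.0f) return { 0, half };
    // Triangle strictly on the positive side: the first half cannot hit it.
    if (s0 > 0.0f && s1 > 0.0f && s2 > 0.0f) return { half, half };
    // Triangle straddles (or touches) the plane: test every ray.
    return { 0, kPacketSize };
}
```

The intersection loop would then run over rays first .. first + count - 1 instead of the whole packet. The three dot products are paid once per triangle, while the saving is up to 128 ray/tri tests.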


With just this basic setup (one extra plane) I observed a few interesting things. First of all, speed goes up: Sponza rendered with the basic shading model (no textures) is 3.9% faster. Since this gain can only come from the intersection code, it is significant, especially if you would like to have many more rays in your packet later (for anti-aliasing). Two other quick observations: The relative speed improvement is even better when there are more triangles in the leaves (which matches Xela's observations). In other words: the code becomes less sensitive to shallower trees. And finally: The improvement is more pronounced when the view is 'complex', which is logical.


The next step is obviously further subdivision (at least two planes, to cut the frustum in four). Also, my implementation is just a quick hack right now. But still: Interesting stuff, and very easy to add to a BVH ray tracer.
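For the two-plane variant, something like the following might work. This is only a sketch of how it could look, not what my code does today: two split planes through the common origin cut the packet in four quadrants, and a 4-bit mask says which quadrants still need testing. The names and the quadrant numbering (bit 0 = positive side of plane A, bit 1 = positive side of plane B) are assumptions.

```cpp
#include <cstdint>

struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// -1: all vertices strictly on the negative side of the plane,
// +1: all strictly on the positive side, 0: straddling or touching.
static int triangleSide(const Vec3& origin, const Vec3& normal,
                        const Vec3& v0, const Vec3& v1, const Vec3& v2)
{
    const float s0 = dot(normal, sub(v0, origin));
    const float s1 = dot(normal, sub(v1, origin));
    const float s2 = dot(normal, sub(v2, origin));
    if (s0 < 0.0f && s1 < 0.0f && s2 < 0.0f) return -1;
    if (s0 > 0.0f && s1 > 0.0f && s2 > 0.0f) return +1;
    return 0;
}

// Bit q of the result is set when quadrant q must still be tested.
// Assumption: quadrant q holds the rays on the positive side of plane A
// if (q & 1), and on the positive side of plane B if (q & 2).
std::uint8_t quadrantsToTest(const Vec3& origin,
                             const Vec3& normalA, const Vec3& normalB,
                             const Vec3& v0, const Vec3& v1, const Vec3& v2)
{
    const int sideA = triangleSide(origin, normalA, v0, v1, v2);
    const int sideB = triangleSide(origin, normalB, v0, v1, v2);
    std::uint8_t mask = 0;
    for (int q = 0; q < 4; ++q)
    {
        const int quadA = (q & 1) ? +1 : -1;
        const int quadB = (q & 2) ? +1 : -1;
        // A quadrant is skipped when the triangle is entirely on the
        // opposite side of either split plane.
        const bool rejectedByA = (sideA != 0 && sideA != quadA);
        const bool rejectedByB = (sideB != 0 && sideB != quadB);
        if (!rejectedByA && !rejectedByB)
            mask = static_cast<std::uint8_t>(mask | (1u << q));
    }
    return mask;
}
```

A small triangle that sits entirely in one quadrant then costs only a quarter of the ray/tri tests, at the price of six dot products per triangle.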


Figures:


Sponza, in millions of rays per second:

6.730 (halfplane, 5pl) vs 6.480 (no halfplane, 5pl), 3.9% faster

6.776 (halfplane, 2pl) vs 6.578 (no halfplane, 2pl), 3.0% faster


'5pl' is '5 primitives per leaf'.


- Jacco.
_________________
--------------------------------------------------------------

Whatever
[2007/10/02] [ingenious] [Ray/tri: half-frustum early out]

That is interesting :) And logical. I think in the future we will be intersecting two hierarchies against each other...
[2007/10/02] [Phantom] [Ray/tri: half-frustum early out]

Well, in the end you may end up with a beamtree for leaf nodes. This is something I have been playing with (just in my mind though): Whenever you encounter a leaf, you could build a beamtree towards the 'common origin' (assuming there is one, and that's a big assumption), based on the triangle edges. Rays would then be classified using the beamtree. The tree has many advantages:

* Each ray is checked against the tree, not full triangles;

* Each edge is considered only once, not twice as virtually all other methods do;

* Full overlap of a prim and a (sub)beam can easily be detected.

The beamtree needs to be generated each time a leaf node is visited, but this can be cached (since nearby rays will also visit the same leaf). For secondary rays, it's nice to know that planes to the leaf boundaries quickly reject rays that miss the leaf. These rays will not be tested against any of the triangles in the leaf.


Well, I'm still not sure if this is worth it, but given an optimal balance between tree depth (i.e. leaf size), beamtree size and packet size, this might actually work.


But for now I'll just try that second plane and be done with it. :) Today I didn't work on it; instead I tried to convert my internal 128-bit floating point texture format to 32-bit, with on-the-fly conversion to 128-bit... Tricky to do fast.
_________________
--------------------------------------------------------------

Whatever
