SAH BIH experiment

(L) [2006/08/15] [Phantom] [SAH BIH experiment] Wayback!

I did some tests with a BIH generated using SAH. The idea is as follows:

- Evaluate the left and right side of each primitive as a split plane candidate

- For each candidate, a cost is calculated like this:

cost = TRAVCOST + INTRCOST * (left_area * left_count + right_area * right_count) / area_of_parent

where

INTRCOST = 3 * TRAVCOST;

left_area is the surface area of the left box that would result from a split at this position;

right_area is the surface area of the right box (note that these boxes may overlap);

left_count is the number of primitives that would end up in the left box;

right_count is the number of primitives that would end up in the right box.

a 'bestcost' is initialized with

bestcost = INTRCOST * primitives

All of this is identical to kd/sah calculations, only interesting thing is the fact that BIH child boxes may overlap, or may have empty space inbetween. Both cases are projected in the calculated cost.

- Next, the two special cases for empty space cutoff (left and right) are evaluated, and bestcost is updated if this improves matters.

- Finally, bvh2 is considered.

Before I post some results I need to fix some issues, as Legocar actually got slower, and Sponza gets in an infinite loop. [SMILEY Sad] Scene6 did get a 31% boost though.

Ultimate goal is to determine how much of SAH can be done in a neglegible amount of time.

EDIT: First results are in.

Scene6 (800)

raw BIH:  7578kray/s

BIH/SAH:  11200kray/s (47.8% faster)

Legocar (10k)

raw BIH:  7232krays/s

BIH/SAH: 8644kray/s (19.5% faster, that's at 50% screen coverage)

Cloister (81k)

raw BIH:  3071krays/s

BIH/SAH:  doesn't compile..

More fixes needed. But the point is clear: Regular BIH could use some optimizations. Or: Modifying an existing BIH instead of building one from scratch let's you play (at least initially) at a considerable higher frame rate.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/15] [Phantom] [SAH BIH experiment] Wayback!

BIH is great for some reasons: The hierarchy builds much faster than a kd/sah tree, and it uses far less memory (completely predictable, which is also nice). This makes it very suitable for ray tracing animated scenes, as it is fast enough to build the hierarchy *and* render it.

However, BIH is slower than kd/sah, for static scenes. Obviously, we would like the raw speed of kd/sah combined with the build speed of BIH.

So I wanted to know how much room for improvement there is in the BIH. Apparently, if build time is not an issue, the speed of kd/sah can almost be achieved. So now the question is: Is there a balance between build time and render time that is more optimal than the first versions of BIH?

My point with the BIH/SAH experiment is that there is considerable room for improvement. For that, I gave up the rapid build times completely.

So, it does not make sense to release any code, as it isn't good for anything. It's really just a testbed right now. I still intend to do a ray traced game in the coming months; this will use the fastest combination I can find and the source will be shared.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/15] [fpsunflower] [SAH BIH experiment] Wayback!

Thanks for checking that out Phantom! It was definitely on my TODO list but its encouraging to see these good results on a first try. From what you posted it seems that SAH BIH catches up to SAH kdtrees. Can you give us your old kd numbers to compare?

I think the lower number of nodes, and perfect left/right splitting that BIH does (no overlap) should allow SAH to be massively more optimized that it can be for kd-trees (although certainly slower than toxie's heuristic). From a perfect SAH implementation it will be interesting to see what kind of hybrid tradeoffs can be made between the two techniques.

(L) [2006/08/16] [Phantom] [SAH BIH experiment] Wayback!

@fpsunflower:

I removed shadows and compared the performance to my 'speedking' edition (perftest.zip, available somewhere on this site). Perftest uses only a single core and reaches 7.0fps on my rig for scene6. Arauna (my current BIH testbed) reaches 8.3fps, but it scales almost perfectly over the 2 cores, so I must assume that using a single core, it would do 4.7fps which is clearly considerably lower than the kd/sah scores...

But that's just a single scene, and also the 'worst case' scenario for BIH (several polygons that follow the planes of the scene bbox). This scene used to perform extremely poor. I need to get my BIH/SAH to work for more representative scenes, like Sponza or KitchenTransformed. As soon as I get it to work, I'll post. At this point I doubt that BIH will ever beat kd for static scenes. That's not mandatory for me though, although I would like to get as close as possible.

- Jacco.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/16] [toxie] [SAH BIH experiment] Wayback!

woah, this is really interesting!

can't wait to see some more numbers for bigger/complex scenes (like the bunny and buddah or the tbp-kitchen and bart-kitchen).

the speed-up for scene6 was a bit unexpected but if one thinks about it in detail it seems to be alright as the scene is so tiny and features these big wall-triangles (which the global heuristic should f*#k up after all). (the number of 11.2 Million rays per sec. still shocks me though ;)

i'm curious to see how other scenes will work out in direct comparison (so bring up them statistics, PLEASE ;))
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/08/16] [Phantom] [SAH BIH experiment] Wayback!

One interesting thing I just found while debugging:

SAH always forces a bvh2 node instead of a left/right empty space cut-off... In other words: a bvh2 node is always cheaper than a +/- infinity plane on either side, according to the SAH.

EDIT: Of course SAH prefers bvh2... A bvh2 node will have the same prim count, but always the same or smaller surface area.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/16] [Phantom] [SAH BIH experiment] Wayback!

Clown

raw BIH: 3856

BIH/SAH: 4620 (19.8% speedup)

Using the default camera viewpoint (as you provided) the speedup is only 17.6%, which means that BIH/SAH doesn't help the black pixels, just the actually found intersection points (as I suspected).

It's also clear that SAH helps those poor scenes with tons of axis aligned large polygons much more than scanned models. I thus expect a large speedup for Sponza, and something like 20% for Buddha.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/16] [Phantom] [SAH BIH experiment] Wayback!

tbp: I know, for that kind of scenes, kd does a great job. Perhaps even so great that it will outperform BIH if you have complex shading with tons of rays. However, my BIH compiler is not optimized, single core, not using bucket-like optimizations, and it already does legocar in 4ms.

What I am really curious about is how this iterative bih performs that was described in that hardware paper. If the static scenery is essentially free, even huge scenes with 50k or so moving triangles might be possible at interactive rates. There's no way kd/sah can beat that.

Nevertheless, your 5 second builds of Buddha are completely amazing! I do wonder though what you could do to a BIH compiler... Current attempts completely lack the level of sophistication you achieved in your kd tree compiler.

New figures:

Sponza (74k triangles)

raw BIH:  2421kray/s

BIH/SAH:  2875kray/s (18.6% improvement)

optimized scene (124k triangles):  2946kray/s

optimized scene with SAH:  3771kray/s (28% improvement, or 55.8% over the original set and raw BIH)

Cloister (81k triangles)

raw BIH:  3133kray/s

BIH/SAH:  4662kray/s (48.8% improvement)

So for Sponza, that's not entirely what I expected. Unless I made an error in my SAH computations, BIH can't beat kd/sah for Sponza, not by a long shot. It's interesting to see that SAH actually does a better job if I first split large triangles: The combined gain of splitting and SAH is quite spectacular.

Cloister benefits big time though. Didn't try what pre-splitting helps.

By the way the BIH/SAH builder was working all along, it just takes an insane amount of time to build the scenes. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/16] [tbp] [SAH BIH experiment] Wayback!

Well, toxie wasn't specific about the stats he wanted and i made wild claims & much hand waving in another thread (bouncy sphere demo) where i've speculated about legocar's total time to render... What a relief [SMILEY Wink]

So beside the pun & face saving aspect, you're most definitely right: that's not the place for SAH/kd-tree. I mean, BIHfication is faster than radix-sorting the damn thing.

The buddha scene is a poor metric, the tree is mostly balanced, there's no geometrical or precision issues etc... I wish we had interesting not-scanned semi-large data sets. Snif.

I say semi-large because even if i once thought about garbage-collecting stuff along the way, it would be a major pain in the behindment to reconcile that and parallelization. Re-snif.

I'm a bit surprised by the relative slowliness of a SAH/BIH builder, when compared to what i remember of my pathetic SAH/BVH builder; but then what do i know about BIH eh. Soon i'll have no excuse left for not trying i guess [SMILEY Smile]

And it's not like those in-place, refinable, out-of-core, faster-than-sort properties aren't sexy in their own right.

PS: i still need to tweak & clean the code, but i don't expect Buddha to be affected.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]

(L) [2006/08/17] [toxie] [SAH BIH experiment] Wayback!

so far the results weren't too shocking. ;)

except for the two scenes (cloister & scene6) where the speedup is really worth doing SAH, the other results are somehow in the range that i expected.

so now the question is: how can we extend the global heuristic -easily- to have some speedup for evil scenes like scene6/cloister?!

(remember that -always- the easy algorithms have survived over time, not the complicated ones! :)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/08/17] [Phantom] [SAH BIH experiment] Wayback!

Perhaps SAH can be used to find the best option among these:

- The split candidate proposed by the global heuristic from the BIH paper;

- A median split along the longest axis;

- Empty space cut-off along the longest axis;

- A bvh2 node.

This can be implemented rather cheap I suppose; the most expensive part is the left/right count calculation (just like SAH) but this can be added to the 'partition' stage of Quicksort, and it can easily be done for several candidate positions in the same loop.

Problem is that I noticed that SAH is very sensitive to approximations. Doing SAH over the longest axis only completely spoils the fun (makes it slower than the global heuristic) so I suspect that doing SAH over a very limited set of candidates will be even worse.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/17] [Michael77] [SAH BIH experiment] Wayback!

Just a thought: Maybe adding some sort of penalty for overlapping cells will help, because these are likely to be evil, especially in scenes with large and with small triangles. I just hat such a case where a single large triangle boosted up my number of traversal steps from 60 to 265, just because the cells were overlapping too much so I found an intersection far behind the nearest intersection first.

(L) [2006/08/17] [Phantom] [SAH BIH experiment] Wayback!

I agreee, but the problem is finding an alternative. Using Toxie's original heuristic, there's only one candidate, so you don't have to walk a list of primitives to calculate it's cost. Since partitioning is by far the most expensive part of BIH construction, doing this multiple times to determine a better candidate will always hurt performance badly.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/08/17] [funky] [SAH BIH experiment] Wayback!

One idea I'm thinking of: Sorting the triangles in a (one-dimensional) regular grid (10 cells?) to find emty space. If we don't find emty space (emty cells) then we cut in the middle (only the Triangles overlapping the middle cells have to be sorted again). If we find one or more emty cells, then the triangles are already sorted (and we cut there).

(L) [2006/08/17] [Phantom] [SAH BIH experiment] Wayback!

I just found out that writing a simple global variable (Engine::m_RaysCast, a statistics counter) from a thread reduces overall rendering performance by no less than 8%... Removed that, and now Scene6 reaches 12.002M ray/s on my box. [SMILEY Smile]

Also, I had this pluecker vs barycentric issue on my todo list, which I sorted out today. And guess what? When I equip my ray tracer with an ugly hacked together plucker test implementation, the tracer runs 10% faster than using an optimized-to-the-metal barycentric test, while at the same time saving 48 bytes per triangle and reducing build time for dynamic geometry. Add to that that the barycentric test is less numerically stable, and we have a clear winner. I don't see why anyone should use the barycentric test at this point.
_________________
--------------------------------------------------------------

Whatever

SAH BIH experiment back