BIH / BkdTrees ... are they really what we want ?

(L) [2006/06/26] [Shadow007] [BIH / BkdTrees ... are they really what we want ?] Wayback!

I've read (not experienced) the two papers, and I ask myself a question :

These two data-structures help tremendously with the real-time structure construction, but what effect do they have on FULL raytracing with all these pretty shades/ recursive ray-traces ?

It seems to ma a trade-off between speed and quality, is it not ?

I know I'm in the General Development forum (not the Not Quite Realtime one), but is there some kind of measurement (a number of rays or something), that tells at which point it's better to build fast and trace "slowly", and at which point it would be better to build "slowly" and trace "fast" ?

Not wanting to disminish the work of people who are far wiser/experienced/better than me in this, but that's just a question ...

(L) [2006/06/26] [Phantom] [BIH / BkdTrees ... are they really what we want ?] Wayback!

So far BIH gives me trees that render at 65% of the performance of kd/sah, so it's not that bad. Toxie on the other hand reports almost identical performance. We're looking into this difference. In the meantime, compiling Buddha in less than a second is awesome.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/06/26] [Shadow007] [BIH / BkdTrees ... are they really what we want ?] Wayback!

If the ray-trace perfs are equivalent ... I should'nt have posted [SMILEY Sad]

(L) [2006/06/27] [Phantom] [BIH / BkdTrees ... are they really what we want ?] Wayback!

That would mess up packet traversal...

I am considering to try this BIH/SAH approach with incremental updates. Perhaps I could even turn a well-built kd/sah into a BIH?

On a related note: Would it be possible to (ab)use some bits of the mantissa of a floating point value to store some custom bits?
_________________
--------------------------------------------------------------

Whatever

(L) [2006/06/27] [toxie] [BIH / BkdTrees ... are they really what we want ?] Wayback!

I can only report my own results on the "random" ray topic: There is this Audi A6 in the BIH paper and this uses only single rays and pretty advanced shaders, so not much coherence there. Nevertheless the render speed is almost exactly the same as in our kD.

On the other hand, phantoms kD speed is a bit faster than ours, so maybe this is why he still experiences a 30% slower BIH traversal?!
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/06/27] [Phantom] [BIH / BkdTrees ... are they really what we want ?] Wayback!

I'm sorry, but it can't be just my more efficient kd-tree traversal. First, Toxie optimized his traversal also (although I have 4x4 packets), but more importantly: Tree sizes and number of intersections don't match as far as I can see. Although for Buddha it all seems reasonable. Perhaps I'll whip up a 2x2 traversal today and post the traversal loop so you guys can compare.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/06/27] [Phantom] [BIH / BkdTrees ... are they really what we want ?] Wayback!

I just noticed (I mean, read it in Toxie's paper) that the'global heuristic' can also be applied to the construction of kd-trees. That might help to check where the problem lies. If possible, I'll change my kd-tree compiler to use this heuristic, just to see how the trees compare to sah.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/06/27] [playmesumch00ns] [BIH / BkdTrees ... are they really what we want ?] Wayback!

PBRT stores split-axis (I think) info in the lowest two bits of the mantissa of the split position.

This does cause artifacts in some cases - rays ignoring triangles when they should hit.

(L) [2006/06/27] [Phantom] [BIH / BkdTrees ... are they really what we want ?] Wayback!

Never mind, I wanted to store much more. The BVH/KD paper mentioned nodes with 4 floats, a split axis and a pointer to a child node. Besides the 4 floats I would thus need about 20 bits. I guess there's no reasonable way to store those nodes, without dedicated hardware.
_________________
--------------------------------------------------------------

Whatever

(L) [2006/07/07] [toxie] [BIH / BkdTrees ... are they really what we want ?] Wayback!

P4 2.8 GHz (no HT used)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/07/07] [tbp] [BIH / BkdTrees ... are they really what we want ?] Wayback!

Then i really can't say i'm impressed.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]

(L) [2006/07/07] [toxie] [BIH / BkdTrees ... are they really what we want ?] Wayback!

No one said that my code is the best optimized of the world. :)

I just provided it to see that it can compete with the kD for random rays.

What are you numbers for the bunny (kD)?
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/07/07] [tbp] [BIH / BkdTrees ... are they really what we want ?] Wayback!

"Thierry Berger-Perrin reports preliminary results of 1.67M rays/sec on a 2GHz Opteron, using bounded volume hierarchies (BVH)."

That was many moons ago with a fat BVH and my first try at SAH, if i remember correctly my kd was back then at more than 2M+/s but nick never posted that result.

Anyway, i'm kinda surprised that with hardware in the same league and a nimbler but somewhat similar structure you don't beat silly that crummy result of mine.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]

(L) [2006/07/07] [toxie] [BIH / BkdTrees ... are they really what we want ?] Wayback!

2M+ is impressive indeed! But isn't too unexpected for me. SAH was designed for raw traversal speed, so...

Also you have to understand that both my versions (kD and BIH) are NOT optimized to the max like yours & phantom's

are, as the ray tracer is designed to run in a production environment, so it has to function under all circumstances (=f*#ked up data, rays, etc.) and also has to avoid epsilons at any price.

Nevertheless: Newest number for the BIH (including the "trick" posted in the Links&Papers-Thread): 1,73 Mill.Rays/sec
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/07/07] [tbp] [BIH / BkdTrees ... are they really what we want ?] Wayback!

Err, i was comparing the BVH & BIH results because of similarities and the fact that you can't screw up as much on SAH with a BVH as with a kd-tree (again those results are like real old and i was quite clueless about the whole SAH deal).

Really i would have expected BIH to shine if only for it's tighter footprint compared to my bloated old-school bvh with no early exit.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]

(L) [2006/07/07] [toxie] [BIH / BkdTrees ... are they really what we want ?] Wayback!

Hmmm.. .. You're right. Cause using random rays the memory footprint should make a difference.

Especially as the bunny does not fit into L2-cache.

On the other hand, it might not matter at all for such a case, as the P4 loads 64 (actually 128) bytes (Athlon: 64b if i remember correctly)

at a time and if the part of the model is not in cache, this is becoming the bottleneck for everything. So it shouldn't matter if a fat BBox has to be loaded

or a slim kD/BIH node. Both should need equal numbers of cycles (memory transfer of 1(2) cachelines).
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/07/07] [tbp] [BIH / BkdTrees ... are they really what we want ?] Wayback!

Hmmk, that's kinda plausible... even if i think the cache-line size is 32 bytes for P4 - now i'm not quite sure it's true on all revisions; k7 & k8 always had a 64 bytes cache-line. So i waste a quarter of cache line loading an aabb on my k8 and you waste some loading your couple of planes.

But i still had to load more, and my bvh didn't partition at all [SMILEY Smile]

edit: Ok, seems that i'm a bit full of it and/or ignorant about whatever global warmer Intel produce nowdays; your cache (re)load argument holds.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]

(L) [2006/07/07] [toxie] [BIH / BkdTrees ... are they really what we want ?] Wayback!

32 bytes was PIII i think.

64 bytes is P4. But as there is some design-issue, it always loads TWO cachelines in sequel (so overall 128 bytes).

Pentium M is different i believe.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?

(L) [2006/07/07] [tbp] [BIH / BkdTrees ... are they really what we want ?] Wayback!

I'm pretty sure at least some P4 had a 32byte cache line; anyway as

a) i never took note of the load-them-by-2 feature

b) all not-so recent P4 have gone with 64 bytes cache lines

... thusly my point was moot to say the least.

Add on top of that, given that K8 have a much better latency at cache refilling than P4, the handicap is clearly on you - even if that's a bit counter intuitive of a result [SMILEY Wink]
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]

BIH / BkdTrees ... are they really what we want ? back