bih / bkd implementations back

(L) [2006/06/23] [fpsunflower] [bih / bkd implementations] Wayback!

Starting a new thread for those of us trying out these new hybrid schemes.



My initial implementation of the BIH is a bit dissapointing, half the speed of my SAH kdtree at best (primary ray only rendering time). But the build time _is_ dramatically faster (that was the whole point wasn't it [SMILEY Wink]). Only spent an evening on it so far though so plenty of room for improvement.



At this point, most of my questions revolve around the build routine. toxie, feel free to enlighten us if possible.


* What are the termination criteria for the recursive build? All my scenes seem to have at least a handfull of leaves that go all the way down to max-depth, even though they only have like 2 or 3 prims in them (which I assume are not seperable - haven't debugged graphically yet). How do you detect these cases efficiently? The limit case is N copies of the same triangle - right now the build routine I have can't do anything good with that case. Should the code just try to backtrack these useless "splits"?


* What do you set the clip value to when the corresponding side is empty ? Right now I use +/- Inf which seems to work. Is that robust during traversal though? (NaN safe traversal round 2)


* Not storing empty leaves. This assumes your traversal is perfectly robust (see point above). I assume you just trick the children pointer when the empty node ends up on the left side? ie: store index-3 so you end up indexing into the right place without storing one of the children?


* Storing compressed leaves (2x4 bytes vs. 3x4 bytes for inner nodes). Same questions as above really. When there is a mismatch of types (a leaf being sibling of a node for example) doesn't this pose a problem for indexing during traversal ? Or do you only compress leaf nodes when they won't impact their sibling ?


* Any suggestions for correctly multithreading the on-demand build? Did you use OpenMP ? How close can you stay to 100% cpu usage when all procs are competing for a write lock to such a central data structure?



And a more open question:


The BKD paper is indeed the same thing as BIH with full intervals (mentioned in the BIH paper). They do an SAH build though and get dynamic speed from incremental updates. Hardware considerations aside, what are you guys' thoughts on the two methods ? Surely doing SAH for these schemes is going to be much faster than the clipping kdtree compilers we've done. If it can be optimized enough, could SAH become worthwhile ? or is the heuristic close enough to optimal that the difference will be minor ?



Well thats all for now. Chances are I'll be able to answer some of these myself after spending a bit more time thinking about them, but I figured we could start a discussion anyway.


These papers have opened up some great new possibilities =)
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

Are we allowed to discuss the BIH paper? I thought it was private and secret.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/23] [tbp] [bih / bkd implementations] Wayback!

Please stay calm and no harm will happen.

I've plonked that thread under the carpet and blessed fpsunflower's account until NDA issues are resolved.


I really don't like to have to do that.  Toxie, what's the current status?
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

From monday on it's okay to talk about it in public. ;)

But unfortunately it's still not okay to put the paper on a website. :(


Actually i don't know why the other guys are putting their stuff out

(http://myweb.hinet.net/home7/hks/Papers2006/egsr2006Papers.htm)

but we had to sign a paper that basically says "your not allowed to put that paper

on your website for a year, only EG is allowed to".

Spreading it via mail to "colleagues, etc." is allowed though.


Nevertheless we (Alex and me) just discussed it and the paper will be available on our site

(http://graphics.uni-ulm.de) from monday on (or maybe tuesday).


btw: The closed discussion here is of course okay for now, as i've sent the paper to all you guys

to have fun with. ;)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

Now for the more fun stuff:
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

Ok then I'll address one of fpsunflower's questions:


You need to add a check to see if a potential split is valid. This test reads something like:


if ((left_maximum != right_minimum) || (left_count && right_count)) ALL_OK; else SKIP_CANDIDATE;


This got me from 50% performance (bih vs. kd/sah) to 60% performance.


Maybe you are running into the same problems as I did: Can you verify what kinds of node counts and other stats you get for Sponza and Buddha? Sponza is problematic for me, as it has many axis-aligned triangles. Buddha is better, but even that one doesn't come close to my kd/sah implementation. Although the insane build times, even with a naive hierarchy builder, are certainly nice. [SMILEY Wink]


Also, I had some problems grasping the concept of keeping track of two bbox'es during construction. Brick state. Took some explanation from Toxie to sort that out.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

Oh and another thing that just popped to my mind:


I've got an additional explanation why you guys don't reach our performance (yet)!

Could it be that the triangle test is slower than ours? Cause i know that most of

you don't use Badouel's/Wald's test anymore, but Pluecker (or something).

This may be okay for the kD, as the number of triangles to test with is rather small.

With the BIH (or any kind of BVH) the triangle testing gets more dominant due to overlaps

and larger node-volumes. My triangle-test is optimized to the max (even quiet some

bit faster then Badouel's/Wald's original version) so maybe this is also a "problem"?!
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

Not in my case. I simply get (far) more nodes than you report, or, when I increase the mimimum prims per leaf count, I get (far) more ray/tri intersections. The reason I use pluecker at the moment is that this test is cheaper than Wald's when the ray misses a triangle. Besides, it's more stable, numerically. You can check out the details in Carsten Benthin's thesis on the OpenRT ray tracer:

[LINK http://graphics.cs.uni-sb.de/~benthin/final.pdf]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/23] [tbp] [bih / bkd implementations] Wayback!

Another administrative note:

Hmmk, toxie, it would be simpler for me if you provided a list of people that already have da paper so i can grant them access to that section thusly so i won't have to move threads around [SMILEY Smile]
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

tbp: greenhybrid, Ho Ho, playmesumch00ns, fpsunflower, Shadow007 all mailed me after someone posted about the BIH paper and they all received it by now.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

phantom: Of course i've read Benthin's thesis, but all my experiments with different tri-intersection-algo's

and my own variant ALWAYS were slower with the kD. I even wonder why one would want to include Pluecker

into the kD as most of the time a ray HITS the triangle in a leaf. In a BVH it's more the opposite. Chances are

good that you'll miss the triangle or the hitpoint is behind an already found one!

So all i want to say: Maybe give the old barycentric test a try if it is faster for a BVH?!  :)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

sorry.. yet another post..


tbp: and lycium also.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

Here's some pseudo code, fpsunflower can you compare, Toxie can you verify?
(L) [2006/06/23] [toxie] [bih / bkd implementations] Wayback!

Looks wonderful!

(Actually looks far better than mine ;)

So i don't get it why this produces much more nodes than my code. :(
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

O great.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/23] [tbp] [bih / bkd implementations] Wayback!

In my eyes the winning argument for Pluecker is behaviour wrt aliasing.


PS: graced them all.
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

I got a small fix that improves matters:


Add these tests:


If (leftcount == 0) && (minright == nodebox.min) retry with right half of scenebox

If (rightcount == 0) && (maxleft == nodebox.max) retry with left half of scenebox


Or in other words: Using these tests, one could discard a split plane candidate after partitioning.


Here's my current full code:
(L) [2006/06/23] [Phantom] [bih / bkd implementations] Wayback!

fpsunflower:


If you can compile Sponza, could you give me some stats to compare?


I'm subdividing if the prim count is 3 or more; max depth is 100 (although with the just proposed tests I never reach that anymore, thankfully).

For Sponza this gives me a very high number of ray/tri intersections (like 45 or more). Node counts seem reasonable now: 129665 total nodes, 64833 leafs, 64832 interior nodes, 26480 empty nodes (that's for 76155 prims).


By the way, I'm starting to suspect that this whole issue has something to do with the fact that Toxie is holding on to 'volume elements' up to a certain depth (i.e., as long as a tree node contains more than a single grid voxel). By treating voxel elements as a whole, splitting grid elements is discouraged. This might be good, as the scene box matches the grid due to the way it is split in two each time. I skipped implementing this part of the paper, as a per-primitive approach seemed simpler to start with. But as far as I can see, this is the only major difference with Toxie's approach right now. I fail to see how this could explain my 45 ints/tri in Sponza vs. Toxie's 12-13... Int/tri for Buddha seems OK, so it's gotta have something to do with all those axis aligned polies.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/24] [Phantom] [bih / bkd implementations] Wayback!

If I remove the test for split candidates that are outside the node bbox, Buddha runs at 75% of kd/sah... So for me, that's an improvement. Also, tree size doesn't grow tremendously that way. Sponza is still pure crap though.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/24] [greenhybrid] [bih / bkd implementations] Wayback!

What's this all about? Did I sleep to long? Someone teared out the time?

Tell me tell me tell me please!
_________________
[LINK http://greenhybrid.net/] | [LINK mailto:root@greenhybrid.net root@greenhybrid.net]
(L) [2006/06/24] [Phantom] [bih / bkd implementations] Wayback!

If you're here, you must have read the papers. Well so did we, and we have working implementations. They just need some... tuning. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/26] [fpsunflower] [bih / bkd implementations] Wayback!

Just got my subversion repository going. The relevant file is:


[LINK http://svn.sourceforge.net/viewcvs.cgi/sunflow/sunflow/src/org/sunflow/core/accel/BoundingIntervalHierarchy.java?view=log]


Its a bit obfuscated by the use of Java, but I tried to put comments around the important phases.


I decided to set maxprims/leaf = 2 as this gave me numbers closest to the paper memory usage wise (taking into account non-compressed leaves).


I don't bother storing the object bounding boxes ahead of time. The build is so fast, the memory savings are well worth the extra max3/min3 operations inside the splitting loop. This allows me to build the thai statue quite comfortably.



Phatom, toxie, I'm curious to hear from you if I got all the traversal cases right. It appears to be robust, but its also quite slow (still around half the speed of the kd-tree one). The build code doesn't seem to get stuck on any scene anymore and the output stats seem halfway decent.



Regarding the compression of leaves I was asking about earlier, the only ways I can think of to do this right now are to either store an extra bit on each leaf (is the left child a leaf?). Or to prefetch the left side during the traversal. Both cases require some additional bit-trickery in the traversal to make it go fast. Anyone have any better ideas? Does this even matter that much? Lets say 30% of the leaves are empty (what I typically see), then you are left with a bout a 10% memory savings from compressing the leaves. 16% at best. So unless there is a really clever way of doing the indexing, I would say its probably not really worthwile. toxie, am I missing something?
(L) [2006/06/26] [toxie] [bih / bkd implementations] Wayback!

Hey dudes. Have to check your posts in detail later.

But just wanted to say that you can now open a public discussion on the BIH!
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/27] [Phantom] [bih / bkd implementations] Wayback!

(moved this tread since it's not secret anymore; I think it contains some good info)
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/28] [beason] [bih / bkd implementations] Wayback!

Toxie, are there unpublished tricks which you specifically know about but are not allowed to discuss, which will prevent us from matching your published results? Sorry, I just don't understand your comment about not being able to discuss details.


Phantom and fpsunflower, are the slowdowns you are seeing in scenes which are reportedly faster (according to the paper)? I thought there was one or two cases where BIH was slower so I'm just checking.
(L) [2006/06/28] [toxie] [bih / bkd implementations] Wayback!

If you take a look at the people/companies we thank, than this should become clear.

So i cannot deliver any suggestions or help that goes beyond the released paper (unfortunately).
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/28] [Phantom] [bih / bkd implementations] Wayback!

I'll check, thanks! Quite obscure hint btw. [SMILEY Wink]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/28] [toxie] [bih / bkd implementations] Wayback!

@phantom: real sorry, but......!   (see above)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/28] [Phantom] [bih / bkd implementations] Wayback!

OK it helps, but only marginally. Just to make sure:


You where hinting at shortening the ray segment in case you only traverse the near node or only the far node, right? If you skip the near node, tnear should become the intersection point of the ray with the 'far' split plane, and if you skip the far node, tfar should be the intersection point of the ray with the 'near' split plane.


You know, I once accused Wald of deliberately keeping away certain details. And look what has become of him. [SMILEY Wink]


EDIT: Just checked Buddha again, looks like it is *very* close to kd/sah now, all of a sudden. Legocar still at 70%, Sponza & transformed kitchen suck completely, but Buddha rockz. Yay. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/28] [toxie] [bih / bkd implementations] Wayback!

Okay. For the sake of not being compared to Wald:  )

In the case you traverse the near one: far = MIN(far,clip[0]);

In the case you traverse the far one: near = MAX(near,clip[1]);


Of course clip[0]/[1] are reversed when the ray direction on that axis is negative.

Also you have to keep the original far value on the stack if you need to go into both.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/28] [Phantom] [bih / bkd implementations] Wayback!

Thanks, yes that's what I added. Seems to help quite a lot for Buddha.


Buddha is now at ~90% (that's within 'bad implementation tolerance' limits);

Fairy Forest at ~75%;

Legocar at 70%.


I think a more careful implementation would improve that further (we need tbp to the rescue here). I don't compress leafs yet (not sure yet how to do that efficiently, just added code to skip empty nodes, but even that takes conditional code as far as I can see) and the traversal is condition hell at the moment. I'm quite sure I should be able to improve that.


Looks to me like it's time to ditch kd/sah. BIH is more fault-tolerant, and a lot simpler to implement / maintain. Traversal is spagetti though. Traversing 4x4 rays doesn't help that of course. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/28] [toxie] [bih / bkd implementations] Wayback!

yay! great implementation then!

looking forward when you reach the 100%!  =)


btw: some guys here at the EGSR 06 still believe that the SAH is the holy grail for everything!

and i'm not talking about newbies, actually these guys were all into the GFX business for 15+ years!

(but a pixar guy actually liked the BIH paper, so i don't need to care 'bout the others ;))


and something for all guys that will be on a conference sometime soon:

if you participate in some public discussion, don't forget to start your critics

with one of these buzzwords: "great paper", "nice talk" or "i really liked this work".

quality of the paper doesn't matter, btw. ;)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/06/28] [Phantom] [bih / bkd implementations] Wayback!

Well I can imagine that people don't like to drop their kd-tree compilers for a 100-lines-in-C BIH compiler just that easy. [SMILEY Smile] It feels too easy. I'm not too sure that kd/sah will be gone by the way; the best implementations of kd/sah roughly take 10 times as much time as BIH (perhaps that's a bit optimistic, but still): That means that if you start rendering very complex scenes, the extra quality of kd/sah might actually become worthwhile again. Just imagine 8x8 samples per pixel, area lights with 64 samples, tons of recursive reflections and so on. If you toss enough rays at it, 9 seconds extra tree build time might not be a problem.


But for real-time rt in games, kd/sah should be a thing of the past for a while. I'm not so sure about full tree rebuilds though. Still feels like a waste.


Great paper btw. [SMILEY Wink]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/28] [Phantom] [bih / bkd implementations] Wayback!

Buddha: ~92. [SMILEY Smile]


Need...to...focus on......REAL LIFE... or I'll get fired. [SMILEY Smile]


Ow btw some raw figures: 92% means I'm shooting 2336k rays/s at the Buddha (1024x1024, reference camera view), on a single core of a 1.66 Core Duo. That's 2.2fps.


Stats for other scenes:

Fairy forest: 1652k rays/s, ~1.6fps;

Scene 6: 3003k rays/s (!), 3fps;

Transformed kitchen: 422k rays/s (!), 0.4fps.


I obviously have a problem with those axis aligned scenes. In those scenes, it doesn't matter if I stop splitting at 2, 3 or 4 polygons, and it doesn't matter wether or not I refine split candidates that are outside the node box. So obviously I have a sub-optimal hierarchy builder.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/06/29] [toxi] [bih / bkd implementations] Wayback!

I agree that the kD is of course by no means dead. It can still amortize (as you said).

But we already work on speeding up scenes where the BIH fails, so....  :)


What i felt pissed off at (see above) that these guys actually didn't believe a word that other heuristics

could deliver reasonable performance. Just like i told: SAH = holy grail for -anything-.


And the problem with the axis aligned scenes could be a problem because of -large- axis aligned triangles?!

It's also a problem within my implementation (as you can tell from some of the statistics actually :(

Cause without the largeness "factor" it wouldn't make any sense as these triangles should/can be enclosed in flat voxels without any problem.


For the update of the tree:

Nobody did say that updates are not possible. In fact it should be fairly easy?!

(identify the leafs where stuff happens and update the splits recursively until you reach the nearest father of all these leafs (or nothing changes anymore))
(L) [2006/07/08] [Phantom] [bih / bkd implementations] Wayback!

I took the plunge and decided to modify my tracer so that it builds a tree for each frame.


Results, on a meagre 1.4 Ghz Pentium-M, single core:


Legocar (10k polies), running at 10fps @ 800x500 pixels (less than 100ms for TTI);

Fairy Forest (174k polies), 800ms for TTI (full rebuild + render, same res).


That's pretty amazing. Raytracing 10k triangle scenes, completely dynamic, at 20fps or more on a decent dual core machine is just awesome. I'll try to spent some time to build a basic ray traced game, with plenty of scene changes.


By the way, my data structures are very easy on the input side right now: The only thing that needs to be 'precalculated' per triangle is the normal. Toxie, did you account for that? I mean, Wald's triaccel structure is not easy to fill, and could add considerably to the frame time if you have to update it for 1M triangles.


EDIT: For the record, the timing for the legocar means that I'm shooting 4.7M rays/s through a (potentially) completely random and dynamic 10k triangle soup. I still can't believe it.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/07/09] [macdonag] [bih / bkd implementations] Wayback!

Hi all,

I'm another new guy here, been following the devmaster/flipcode raytracing tutorial off & on while working on my own (basic so far) ray tracer.  I've got to say this is an nice interesting, firendly forum.  Very impressed with what I've seen, especially lately.  Now the mention of 10fps for a dynamic scene of a 1.4 pentium is fantastic news!  I'd love to see what could be done on a PS3.... [SMILEY Wink]  Some day!


Anyway, I'l  maybe stick a WIP image of my own once I've got something less derivative, but in the meantime, I'd love to hear what your thoughts are for a little raytraced game.


Cheers,

good luck,

Graham
(L) [2006/07/09] [Phantom] [bih / bkd implementations] Wayback!

Hey MacDonag,


Welcome to the forum! Looking forward to those WIP images.


About ray tracing & games: Since the tutorials on flipcode/devmaster some of us focussed on fast ray tracing of static scenes, using kd-trees and the sah heuristic. Recently, Toxie wrote a paper on 'bounding interval hierarchies', which seem to be a bit less optimal for static scenes, but they can be build incredibly fast (less than a second for the 1 million triangle happy buddha model). This makes it possible to build a tree in 100ms or less for scenes consisting of 10k-100k triangles, which is easily sufficient for a nice looking ray traced game. So in short, there has been quite a breakthrough which took ray tracing from static scenes to dynamic scenes.


About this game: I have some small ideas, but it doesn't really matter; perhaps a 'Pang' clone, since I could use a ton of reflecting spheres for that. Most of the scenery would be dynamic, which is also something I want to show. But in case someone has a good idea (game must be simple, this is not really a game project) let me know.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/07/09] [Phantom] [bih / bkd implementations] Wayback!

Did some optimizations to the build process, inspired by the recent merge of build and render:


Legocar (10k triangles) builds in 7-10ms, so I now cast 5.5M rays/s through that scene (could be more if I improve res of course, res is now 800x500).

Fairy Forest (174k triangles) builds in ~192ms, so it runs at 2fps @ 800x500 on my low-end machine. [SMILEY Smile]


This machine has only 512Mb so I don't think building Buddha is fair, I'll wait till I have my dual core back.


EDIT: Sunday coding, build time for Fairy Forest now reduced to ~182ms.


EDIT2: Sunday night coding, build time for Fairy Forest reduced to ~175ms. And dude did that WK match suck. Initially I hoped France would win, just because of tbp, until Zidane got aggressive. I can cope with that, but not with the *very* childish behaviour of the crowd after that. Made me realize once again we're surrounded by fools. Sorry that was completely off-topic.


EDIT3: Toxie, in your paper you have stats for Fairy Forest, but you are using 'full shading'. In the screenshot I only see some texturing, no bilerp apparently, and no shadows. Is that correct? If so, I think I beat your scores. [SMILEY Wink]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/07/11] [toxie] [bih / bkd implementations] Wayback!

Yay dude! Cool numbers!

Do you also use an adaptive/lazy tree setup by now?!


For the Fairy: Bilerp is there, but due to the INSANE texture resolutions one should indeed use at least trilerp (also alpha maps and bump maps are missing). Shadows aren't there as there is no light source in the .obj files.


For the triangle precalc: The construction is directly bundled with the tri-precalc, so all numbers consider this.


For the WCup: I almost smashed by beer into the television set when Italy had won. ;)
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/11] [tbp] [bih / bkd implementations] Wayback!

WCup silly comment: if you completly dominate for 1 half + prolong without scoring you deserve to lose. People booed because there was no replay on site of the headbutt thing; Zidane certainly did that because of some provocation, but it was absolutely idiotic to get caught on retaliation [SMILEY Wink]

On the other hand i'm not sure Holland vs Portugal was a demonstration of fair play either [SMILEY Razz]

All in all i'm glad we're done with that football madness for more 4 years.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]
(L) [2006/07/12] [andy] [bih / bkd implementations] Wayback!

hello there.


i've got a question about this bih stuff:

referring to phantom's psuedo-code, when building the bih, why is it necessary to keep a separate splitbox for generating the split candidates, why not just split at the centre of the nodebox?
(L) [2006/07/12] [toxie] [bih / bkd implementations] Wayback!

@andy: Cause the resulting tree is a bit faster to traverse, about the same size, but as simple to build as the straight forward heuristic.

My opinion why this works better: The nodes are kept as cubic as possible and this enables empty volume

to be cut off in large blocks (and also automatically).
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/12] [toxie] [bih / bkd implementations] Wayback!

and some more optimization hints: As i already mentioned in the links & papers section it helps to include an additional splitplane-order-case in the traversal/setup code. For details check Havran's latest paper on the "H-Tree" where he published the same idea as the "BVH2"-case. Including this case increased my render performances by 10-20% but at the same time decreased my number of nodes by 5-10%! As a result of the decreased nodes the setup time is also decreased of course.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/12] [andy] [bih / bkd implementations] Wayback!

ok, thanks.

one more question while i'm here. i assume that the bih has the same problem as kd-trees regarding ray bundles with differently signed rays. i.e you have to split them up into separate rays. is that right?


back to coding...
(L) [2006/07/12] [toxie] [bih / bkd implementations] Wayback!

Nope. In principle you can traverse the bih with totally different rays in a bundle as the bih is a subset of a bvh after all.

All intersected nodes are traversed and so the order which one (left or right) of the children is traversed first doesn't influence the result. Nevertheless you shouldn't try to include rays into a package that point in totally different directions and additionally start from different points as this harms performance!
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/12] [Phantom] [bih / bkd implementations] Wayback!

I didn't realize that! That means I can make my traversal simpler / faster. Nice.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/07/13] [toxie] [bih / bkd implementations] Wayback!

@Phantom: It works in principle, yes. BUT: If you want to get rid of the same-sign restriction your traversal will become more complicated! So (in my case at least) one cannot gain anything from that. :((((
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/13] [Phantom] [bih / bkd implementations] Wayback!

No I mean that I don't have to process left and right in a fixed order, as I will have to traverse them both anyway. Saves a lookup or two.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/07/13] [toxie] [bih / bkd implementations] Wayback!

Not really. Cause you loose the advantage of traversing the nearer node first and thus no possible early exit (as soon as some intersection(s) is found). So in the worst case you would really need to traverse EVERY node intersected by the ray (of the whole tree!) then!
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/13] [Phantom] [bih / bkd implementations] Wayback!

Yep. Did some experiments, and it can't be done.


I did however find something else: I am tracing 4x4 packets, and this requires a huge test to verify that all rays stay in the left or right child (or inbetween). So I reorganized my packets: The first 4 rays now represent the corners of the 4x4 packet (shaft). This way, as long as the traversal is coherent, I only need to test these four rays. Coherency is lost as soon as I need to traverse left and right. By adding another 'if' and a flag 'iscoherent' (the ugliest possible solution) I gain ~10% performance. That brings the gains of 4x4 packet tracing to at least 20%, so you may want to implement that. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/07/13] [toxie] [bih / bkd implementations] Wayback!

Hmmm. Sounds interesting indeed! And should also be possible for a kD?!

Nice idea!
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/20] [Guest] [bih / bkd implementations] Wayback!

Hi everybody,


I just started implementing the BIH Algorithm and while I have a working version right now, it is still at least a factor of 2 from my kd-tree implementation. So I started thinking about the paper for a while and am acutally wondering, why one uses two separate boxes for splitting and for the object bounds?! While it makes sence for overlapping cells, separated cells don´t seem to be a good candidate for this since it will probably create some empty leafs you don´t need when doing a proper traversal since the empty space between such cells is already cut off.


Am I missing something here?


Michael
(L) [2006/07/21] [toxie] [bih / bkd implementations] Wayback!

I don't really understand how you interpreted the paper.

Actually you only store two splitplanes per node. What do you mean by the two separate boxes?
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/21] [toxie] [bih / bkd implementations] Wayback!

Ah okay. You are right that it isn't 100% efficient to need to discard some of the proposed splitplanes.


But actually this scheme delivered the "best" trees (=fastest traversal) while keeping the build time as low

as just choosing the middle of the longest side of an axis of the current node-bbox.

But it's of course no problem at all, if you just implement this even simpler splitting scheme (middle of longest side) and later (if you like) add the global heuristic.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/21] [Michael77] [bih / bkd implementations] Wayback!

Thanks, I just tested it in it doesn´t seem to make a huge difference on my scenes. However, I am still wondering why I get so many triangle intersections in some situations. It seems like a ray going just above a sphere will create over 100 intersectiontest for this ray without hitting a single triangle (while the average number of intersections stay quite low aroung 7-10). I guess the bounding planes are not fitted correctly so this happens.


When creating leaves, do you adjust the node bounding box first and create some additional splits to avoid these problems? Otherwise, I think it may happen in some situations. Imagine the following Situation when I create leafs with 1 primitive:
(L) [2006/07/21] [toxie] [bih / bkd implementations] Wayback!

I inserted a very simple additional empty-volume-cutoff into my code.

It works kinda like this: Any time i've picked a (valid) splitplane and doing the

left/right sort and splitplanel/r calculations, i also calculate the "real" bounding interval

of ALL the triangles (costs almost nothing). If this interval is rather small

(compared to the "actual" node-interval as it is implicitly stored in the tree) i decide

to put splitplanes "around" the triangles, so that the node-interval on that axis becomes

equal to the triangle-interval.


This is the proposed split:
(L) [2006/07/22] [Michael77] [bih / bkd implementations] Wayback!

Ah, that sounds good. I will try it. Hopefully it will help my performance quite a bit (still only reach about 50% of the performance of my kdtree but I think this has something to do with my traversal code right now (need to fix that)) since the tree looks quite ok.


Michael
(L) [2006/07/23] [calamari] [bih / bkd implementations] Wayback!

Hi there!


I'm new to this great forum. Thank you so much for sharing your code and ideas!
(L) [2006/07/27] [Michael77] [bih / bkd implementations] Wayback!

Hi again,


I am still working on my BIH Implementation and it still seems like I am unable to get anywhere near my kdtree performance. Since my tree seems to be ok (at least it produces a reasonable number of nodes and depending on scene size I get between 2 and 10 intersection tests on average (with some bad rays doing many more tests thought because the implicit bounding boxes are not tightly fit to the leafs),  I suspect that there might be something bad in my traversal code. Maybe someone can give me a hint here. Currently, the traversal looks something like this:
(L) [2006/07/31] [toxie] [bih / bkd implementations] Wayback!

On a first look this seems to be fine.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/08/02] [Phantom] [bih / bkd implementations] Wayback!

I'm back.


During my vacation I have been studying Havran's H-tree paper, and I have been thinking about BIH a bit more. About this whole split plane candidate selection: It still feels a bit like you're using a random plane, which appears to give better results than a median split. How about a slightly different approach:


- The optimal split plane position according to SAH is between the spatial median and the geometric median.

- The cost function is horizontal at the optimum, and doesn't change too much near the optimum.


Ergo: An exact split plane calculation is not necessary (Havran uses discretization and that also works fine, for the same reason). Wouldn't a logical conclusion be that the exact average of the spatial median and the geometric median is a much better approximation of a perfect split plane candidate than some random global bbox subdivision?


Finding the geometric median is not trivial for tons of primitives, obviously, but I think there are ways to calculate it fast. Real problem might be the 'special cases', i.e. empty space cutoff.


Main problem is that I am not convinced that this 'new global heuristic' is any good. I suspect that picking random positions will work just as well.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/02] [Phantom] [bih / bkd implementations] Wayback!

But can you explain why the 'global heuristic' would work? I can't come up with any kind of a logical explanation. I don't understand why it beats a (admittedly crude) version of SAH.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/02] [toxie] [bih / bkd implementations] Wayback!

You did include a SAH setup to BIH?!

And it's slower?!

You're kidding!?  )


Anyways.. I cannot give any "real" explanation why it works so fine. The only stuff i had in mind i already posted somewhere here.

And that any additional trick i tried to include into the setup failed performance-wise (except for the empty volume cutoff i mentioned above in combination with the additional "BVH2-case"). Though i never implemented a SAH setup for BIH, so i cannot tell if at least this would result in faster trees.


Sorry. ;(
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/08/02] [Phantom] [bih / bkd implementations] Wayback!

No I didn't include SAH in BIH, I just hoped that the average of the spatial median and the geometric median would be close enough to the position SAH would come up with, since it is between these values anyway and SAH doesn't change alot close to the optimum. It's a crude approximation, but I hoped it would be better than something that almost feels 'random' (the global heuristic). Apparently, the global heuristic is not so random, but I find it disturbing that there's no explanation for that. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/03] [toxie] [bih / bkd implementations] Wayback!

exactly. you need an additional bit unfortunately.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/08/07] [Phantom] [bih / bkd implementations] Wayback!

I implemented this this morning (just the single-axis empty space cut-off) and it does help. I would like to compare some implementation details:


- In the BIH tree node struct, I use 1 bit in the node ptr to store the flag for this situation;

- a 'ClipNode' is generated if the size of the node box over the current axis is 1.3x the range occupied by the primitives (or more);

- During traversal, I need to test for this flag (ugly if-statement);

- If the bit is set, I adjust near and far:
(L) [2006/08/07] [Phantom] [bih / bkd implementations] Wayback!

Tree nodes are 8 byte aligned, indeed. I use bit 0 & 1 for the axis (3=leaf), bit 2 for clipnode. I am not sure yet how I will indentify the 6-plane clip nodes, perhaps I can store that as value 7, i.e. clipnode flag set, axis set to 'leaf' (3). That would mean however that a leaf should only be identified as a leaf when the clipnode flag is NOT set, but that should work.


The things you can do with 3 bits. [SMILEY Smile]
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/07] [Michael77] [bih / bkd implementations] Wayback!

Strange, for me it doesn´t help at all. It seems like the additional if in the traversal code hurts performance more than the saving of 1 node helps. But maybe my build code is bad in this case. The case is currently implemented like this:
(L) [2006/08/07] [Phantom] [bih / bkd implementations] Wayback!

I did not check your code in it's entirity, but one thing I noticed right away is that you only cut empty space if the first primitive on the left side is not the first primitive of the list. I would like to note that there can be empty space, even if there's no primitive to the left side of the split plane candidate.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/07] [Michael77] [bih / bkd implementations] Wayback!

You´re right. This may be the problem, I will check that [SMILEY Smile]
(L) [2006/08/07] [Phantom] [bih / bkd implementations] Wayback!

- I don't have BVH2 implemented yet, so it's the 'clipnode' or nothing. Comparisons between clipnodes and BVH2 nodes should be possible by the end of the week.

- SAH is hardly possible with my BIH compiler, as it requires extensive analysis of the primitive list versus an (almost) free split candidate in the current implementation. I would like to try full sah someday anyway just to see how bad or how good bih is compared to something (more) optimal.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/07] [calamari] [bih / bkd implementations] Wayback!

With the 1.3 factor I'm getting about a 10% speedup (that's single ray traversal only, I didn't implement a SIMD version yet).


About 20% of my total number of tree nodes are clip nodes, is that similar to your results?


For those having the same problem with 4 byte aligned nodes: I'm storing the child of the clip node right after the clip node in memory, so I can use the children pointer of the clip node completely for storing the axis / indicating a clip node.
(L) [2006/08/10] [Phantom] [bih / bkd implementations] Wayback!

@Toxie: I added bvh6 nodes, but I have a problem, which I may be able to solve but perhaps you can help:


A bvh2 node takes just as much space as a regular node (it needs a child ptr and two clip values). However, a bvh6 node takes more space, as it needs six planes. This spoils the fun, as this bvh6 node is often the left child of a node that also has a right child, so the right child address no longer is 16 bytes further in memory... How did you implement this? As a temporary hack I made all nodes 32 bytes, so I could at least test the bvh2/bhv6 combo.


By the way, the way I do it right now is like this:


- I construct bvh2 nodes as I described earlier (1.3 empty space / occupied space ratio);

- If it's the second bvh2 node in a row, the first one is replaced by a bvh6 node.


This takes no extra time in the build code, just a simple flag check.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/10] [toxie] [bih / bkd implementations] Wayback!

I didn't include the BVH6 case, only the BVH2.

But why do you need a second pointer? In the BVH6-case, you just clip away empty volume, so the second pointer points to an empty leaf = could be discarded anyway?! And as you need a separate traversal case also, you can directly skip the right pointer. And even better: As you seem to have all BIH nodes = 16bytes, you could simply put the left and right child together into one?!

Or did i miss something?
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/08/10] [Phantom] [bih / bkd implementations] Wayback!

The problem is that the bvh6 case is someone's child. And that someone has another child, which is supposed to be 16 bytes further. Now if the 'left' child suddenly requires 32 bytes, it overwrites it's sibling.


Once I am processing the bvh6 node, there is no problem: This node indeed has a single child, just like a bvh2 node.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/10] [Phantom] [bih / bkd implementations] Wayback!

After some more experimenting I decided it's not worth it. I hoped that collapsing several bvh2 nodes into a single bvh6 node would help, but gains are going to be below 2%, at the expensive of a considerable increase in code complexity (especially in the traversal code).


So right now I have regular BIH nodes and BVH2 nodes. For the legocar scene, I'm now stuck at about 70% of the performance of kd/sah.


I didn't use buckets in the compiler btw; these caused a further drop in rendering performance, plus I couldn't really measure any performance in build times.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/10] [Michael77] [bih / bkd implementations] Wayback!

[quote="Phantom"]
(L) [2006/08/10] [Phantom] [bih / bkd implementations] Wayback!

Well I would assume that if a ray misses the specified interval, min will be greater than max and the ray will be invalidated.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/08/10] [Michael77] [bih / bkd implementations] Wayback!

Mmmhh, you are right, haven´t thought of this this way.

back