adaptive packet sizes

(L) [2006/07/20] [andy] [adaptive packet sizes]

i've been playing around with some traversal techniques and i thought maybe it was time to share some preliminary results.


after reading wald's bvh paper ([LINK http://www.sci.utah.edu/~wald/Publications/webgen/2006/BVH/download//togbvh.pdf]) it occurred to me that the coherency of a packet of rays really depends on the size of the bounding boxes you're currently intersecting, and consequently that a ray packet is more coherent when you're traversing the top of the tree than it is near the leaves. to exploit this i devised the following traversal algorithm, where a ray packet can be split in half along a bisecting plane (alternately horizontal and vertical) if it's too big.


in very simple form:
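(the original snippet didn't survive the archive, so here's a minimal sketch of just the splitting part — the `Tile` struct, the fixed `maxRays` budget and the split criterion are my own simplifications; in the real traversal the "too big" test would compare the packet's extent against the current node's bounds:)

```cpp
#include <vector>

// a packet is a rectangular tile of rays addressed by corner + size
// (the rays themselves would live in a z-order array elsewhere)
struct Tile { int x, y, w, h; };

// recursively bisect a tile, alternating vertical/horizontal split planes,
// until each piece holds at most maxRays rays
void splitTile(const Tile& t, int maxRays, bool vertical,
               std::vector<Tile>& out)
{
    if (t.w * t.h <= maxRays) { out.push_back(t); return; }
    if (vertical && t.w > 1) {
        int half = t.w / 2;
        splitTile({t.x,        t.y, half,       t.h}, maxRays, false, out);
        splitTile({t.x + half, t.y, t.w - half, t.h}, maxRays, false, out);
    } else if (t.h > 1) {
        int half = t.h / 2;
        splitTile({t.x, t.y,        t.w, half      }, maxRays, true, out);
        splitTile({t.x, t.y + half, t.w, t.h - half}, maxRays, true, out);
    } else {
        out.push_back(t); // 1x1, cannot split further
    }
}
```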
(L) [2006/07/21] [toxie] [adaptive packet sizes]

Funny. I also did something like that (but for general ray arrays, not only z-order), but didn't get this immense speed-up.
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
(L) [2006/07/21] [andy] [adaptive packet sizes]

i'd be interested to know what your conclusions were.

what packet sizes did you try?

where were the main bottlenecks?


atm it's the sheer number of ray/bound intersections that's holding mine up. as in wald's paper, i only use the packet/bound test to reject bounds; if that test passes, i then test the individual rays directly against the bound. if i could find an efficient way to update the max t value for the whole packet i could probably get rid of the individual tests.
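(for anyone following along: the conservative packet/bound reject can be sketched with interval arithmetic over the packet. this is purely my illustration, not andy's actual code — it assumes all rays share direction signs, as the primary rays of a tile do, and the names are made up:)

```cpp
#include <algorithm>

// interval bounds over a whole packet of rays (all direction components
// positive here, so the reciprocal directions are positive too)
struct PacketBounds {
    float oMin[3], oMax[3];   // range of ray origins
    float rdMin[3], rdMax[3]; // range of reciprocal directions
};

// conservative packet-vs-AABB test: returns false only when NO ray in the
// packet can possibly hit the box.  on true, the individual rays still
// have to be tested against the bound.
bool packetMayHit(const PacketBounds& p,
                  const float bmin[3], const float bmax[3], float tmax)
{
    float t0 = 0.0f, t1 = tmax;
    for (int a = 0; a < 3; ++a) {
        float c = bmin[a] - p.oMax[a]; // smallest slab-entry numerator
        float d = bmax[a] - p.oMin[a]; // largest slab-exit numerator
        float lo = std::min(c * p.rdMin[a], c * p.rdMax[a]);
        float hi = std::max(d * p.rdMin[a], d * p.rdMax[a]);
        t0 = std::max(t0, lo);
        t1 = std::min(t1, hi);
    }
    return t0 <= t1;
}
```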


i'll try to keep you informed if i make any astounding progress.
(L) [2006/07/21] [toxie] [adaptive packet sizes]

my own experiments used packet sizes of 1024 and 512 rays, both about equally fast.

as i didn't impose any restrictions on them (like the z-order/Peano curves) the code wasn't as elegant as yours, so unfortunately the overall performance (=fps) was about equal to tracing "standard" 2x2 bundles.


as you said, testing the planes against the ray-packets should be the bottleneck, but it's still better than testing the individual rays, so no solution for that. :(

in my case i suppose i did too much work on the individual rays, which means a lot of "useless" calculations.


but one thing confuses me: when you test which rays (in the "slow" non-reject case) go to the left and/or to the right, how do you manage the individual rays for the two children? do you just de-activate/mask the rays that are not intersecting "the other" child, or do you sort them into a left and a right array and traverse the tree further with these separate (actually they can be overlapping) parts?!
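for concreteness, by the masking variant i mean something like this (just an illustration, names made up — the hit flags stand in for real ray/AABB slab tests):

```cpp
#include <cstdint>

// the "de-activate/mask" variant: the packet stays in place and each child
// inherits the parent's mask ANDed with the per-ray box test result
uint64_t childMask(uint64_t parentMask, const bool* rayHitsChild, int n)
{
    uint64_t m = 0;
    for (int i = 0; i < n; ++i)
        if (((parentMask >> i) & 1) && rayHitsChild[i])
            m |= uint64_t(1) << i;
    return m;
}
// a child is traversed only if its mask is non-zero; no sorting or copying
// of rays, at the cost of dragging inactive lanes along
```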

and is the updating of the max t value really necessary?
_________________
what do you expect to do if you don't know what to do when you've got nothing to do?
