Memory efficient BIH implementation

(L) [2007/08/19] [gfyffe] [Memory efficient BIH implementation] Wayback!

Hey everybody.  I just had a jazzy idea: applying the in-place acceleration data structure paradigm to BIH trees.  Check it out:

Consider an array of triangles, stored as triplets of vertices (36 bytes) plus one triangle pointer (4 bytes) with the low 2 bits of the pointer being used to store the split axis (x, y, z, or leaf if we're a leaf).  That fits into a nice 40-byte structure.  Then you have one global triangle pointer with 2 bits for the split axis for the root.  These 48-byte structures are ordered in a special way, such that BIH traversal can be performed on them with no additional storage, which I will make a poor attempt at describing here.

The triangle structure is roughly this:

struct Triangle

{

    float vertex[3][3];

    Triangle * split;

    // plus accessors / mutators to deal with stuffing the axis (x, y, z, or leaf) into the low 2 bits of the split pointer

};

Start with a triangle * called "split" that you initialize to the root triangle pointer, as well as the root split_axis, mentioned above.  Also start with array bounds set to 0..n-1.  Then visualize this array:

[t_0, t_1, t_2, ... t_split, t_split+1, ... t_n-1]

So we're looking at the array within our current bounds, bound_low..bound_high, which is currently 0..n-1.

Here's the trick:  Our BIH left plane value is stored in t_split->vertex[0][split_axis] and our BIH right plane value is stored in (t_split+1)->vertex[0][split_axis].  How is this possible?  Well, first of all, we have cleverly placed at t_split the triangle with the greatest value along the split axis among the triangles in t_0..t_split, and likewise we have placed at t_split+1 the triangle with the least value along the split axis among the triangles in t_split+1..t_n-1.  Furthermore, we can reorder the vertices inside a triangle during our tree construction so long as we keep the same cyclical ordering, which means we can force vertex[0] to be the vertex of the triangle having the greatest (or least) value along the splitting axis where appropriate.  Note that you can still use triangle representations which only keep one vertex as x,y,z and then use 6 other values for some other kind of triangle testing, since we can convert our 3-vertex triangles to this representation as soon as they are used as split triangles within the tree construction.

Like in an ordinary BIH tree, we use the BIH planes to determine which children we need to visit.  If we need to visit the left child, we set our bounds to (bound_low, split) and we set split to t_split->split, and set split_axis to t_split->split_axis.  To visit the right child we set our bounds to (split+1, bound_hi) and we set split to (t_split+1)->split, and set split_axis to (t_split+1)->split_axis.  Note that split and split_axis are both stored within the same 4 bytes.  We use a stack to deal with visiting both children, just like normal BIH trees.

Of course, before we test the planes and such, we first check if the split_axis is set to "leaf", in which case we simply test all the triangles from bound_low to bound_high.

Modifications to the BIH construction algorithm should be pretty straightforward to deduce from the above traversal description.  Note that we can still use non-sorting-based SAH code and we only actually need to swap the two triangles split and split+1 into place, which means we only need to perform 2N triangle swaps during the entire build.

Possible disadvantages I can see here include:

1) increased memory bandwidth consumption during traversal.  Basically we have larger BIH nodes, and they are not nicely aligned to cache lines [SMILEY Razz]

2) I have no idea how to accelerate this using SSE.

3) BIH2-style nodes may be impossible to implement.  Or maybe we could do that, by increasing the size of the structure, but that kind of defeats the purpose.

4) I have not yet implemented it [SMILEY Very Happy]

The only real advantages I can think of:

1) It only takes 4 additional bytes per triangle over just storing triangles.  But maybe we can get ordinary BIH trees that small anyway if we force a greater number of leaves.

2) If we use object-median instead of SAH, we don't need to store the split pointer, and if we use largest-bound-axis, we don't need to store the split axis, which would hurt performance but get the data structure down to 0 bytes.  Then we just have a sorted array of triangles that can be traversed like BIH [SMILEY Very Happy]

What do you all think?
_________________
- Graham Fyffe

Memory efficient BIH implementation back