(L) [2013/11/02] [chsu] [Aila Laine traversal kernel: memory layout wasteful?] Wayback!In noticed that in the purportedly fast Aila and Laine traversal code uses a triangle index look-up buffer with a consistent memory layout as it's leaf-buffer... it seems wasteful:
leaf buffer entry (48 bytes)
{
    float4 (woop transform z)
    float4 (woop transform x)
    float4 (woop transform y)
}
index buffer entry (12 bytes)
{
    int (triangle_index)
    int unused
    int unused
}
Since the leaf buffer is interpersed with terminating "-0" to signify the end of a leaf's triangles, this ensures that the leaf buffer index (leafAddr in their code) can be used to map to global triangle index.
I guess this is a question of: is this worth it? Is it worse to add in the triangle index into the leaf-buffer?
I'm also wondering why not use the unused space in the node entry to store the number of leaf primitives belonging to the node? This would remove the need for terminating values, and the need for index padding.
node buffer
{
    float4 child0 bounding_box xy
    float4 child1 bounding_box xy
    float4 child01 bounding_box z
    int4 child indices (z, w unused)
}
Well I guess I'll implement this way and find out [SMILEY :)]
(L) [2013/11/04] [mpeterson] [Aila Laine traversal kernel: memory layout wasteful?] Wayback!hallo, i have seen also some strange/nonsense code sequences there. please
report improvements in terms of perf. and/or code-layout, mp.
(L) [2013/11/04] [chsu] [Aila Laine traversal kernel: memory layout wasteful?] Wayback!I realize there's some reasoning behind using terminating sequences instead of storing child counts, and thus using a consistent memory layout with padding.
The reason is for speculative traversal and to save registers. Speculative traversal means that leaf intersection tests are postponed until each thread finds a leaf in the BVH, this is reported by their paper to increase SIMD coherency and thus performance by 5-10%. Postponing means saving the 2 leaf addresses (i.e. leafAddr and nodeAddr in their code). Using terminating sequences in global memory means not having to store 2 additional leaf-counts in the register.
(L) [2013/11/05] [jbikker] [Aila Laine traversal kernel: memory layout wasteful?] Wayback!I am using the Aila and Laine method for a few years now, but instead of null-terminating the triangle list, I multiply each triangle index by 2 and use the lowest bit to mark the last triangle in a leaf. Same effect, but without extra data.
(L) [2013/11/06] [jbikker] [Aila Laine traversal kernel: memory layout wasteful?] Wayback!Yes, bvh leafs point to an entry in a list of triangle indices; every index is shifted to the left by 1 so the least significant bit can be used as a null terminator. That way a leaf only needs to store the index of the first index. Additionally, an index is always inverted (so it is always negative), which lets me mix child node addresses and indices; if a 'child address' is negative, it's the index of the first entry in the index list, otherwise it's a node address. Benefit of this is that leaf contents can be placed on the stack for later processing.