(L) [2019/02/07] [ost
by olliej] [Performance issues with Wald&havran's nlogn kdtree construction] Wayback!I've been attempting to implement Wald and Havran's nlogn kdtree construction for a photon map implementation, and it consistently has vastly worse construction perf, and I'm not sure how much is due to splitting semantics.
If I implement the naive nlog^2n algorithm using a median split as suggested by Jensen then the split logic (assuming sorted list) is (pseudo code)
Code: [LINK # Select all]split_axis = {longest axis}
List<Photons> the_photons = {list of photons sorted along split axis}
split_index = the_photons.len
split_plane = the_photons[split_index].position[split_axis]
left = the_photons[0..split_index)
right = the_photons[split_index..the_photons.len)
This fairly obviously results in a guaranteed max time complexity of nlog^2n, and performance (by measurements) seems to grow at approximately that rate.
When I try to use the W&H approach the constant factors seem to be vastly higher, and the tree ends up being much worse. I believe the issue is how I'm splitting.
Here's what I'm doing (pseudo code again)
Code: [LINK # Select all]
 initial setup:
   events = [];
   for photon in photons {
      for axis in {x,y,z} {
         events.push({axis:x, photon:&photon}) // Actual data structure is saner, but functionally equivalent
      }
   }
   sorted_events = sort events by position
 
 splitting:
   events = sorted list of events for this node
   split_index = the_photons.len
   split_axis = {longest axis}
   left_count = events.len / 3  / 2 // we have 3 events per photon
   
   // find the actual split point
   split_event = {}
   max_axis_count = 0 // actual variable name is also terrible :D
   for event in events {
      if (event.axis != split_axis)
        continue
      max_axis_count++;
      if (max_axis_count != left_count)
        continue
      split_event = event
      break
   }
   ...
 
At this point I have to actually split around Code: [LINK # Select all]split_event, but I can't work out how to do so efficiently.
The reason my current code goes wrong is because I split on Code: [LINK # Select all]event.position[split_axis] < split_event.position[split_axis].
This trivially ends up breaking the invariant that left and right nodes are split in the middle of the photons along the split axis.
The problem I can't work out is how to do the split given that the events along the left and right of the split may include events belonging to photons that belong in the other half. What we logically want is something like:
Code: [LINK # Select all]   ...
   left_events = set{}
   for event in events {
      if (event.axis != split_axis)
        continue
      left_events.add(event.photon)
      max_axis_count++;
      if (max_axis_count != left_count)
        continue
      split_event = event
      break
   }
then our split logic easily becomes a matter of splitting on Code: [LINK # Select all]left_events.contains(event.photon)
But that introduces a hash table or similar which both uses a large amount of memory and is expensive to build.
The alternative method I can think of it to essentially store left/right in the photon structure itself, and then during the split search update as appropriate, doing that seems kind of ugly, but seems like it would be the computationally fastest way to do it?