Thoughts on spatial subdivision for very large scenes
(L) [2006/02/15] [playmesumch00ns] [Thoughts on spatial subdivision for very large scenes] Wayback!This is probably slightly OT since my focus is on production rendering rather than RTRT.
One of the core features of the design of my renderer is the ability to handle arbitrarily large scenes without requiring a correspondingly arbitrary amount of RAM in order to render.
This naturally entails caching parts of the scene to disk and reloading them as required.
My first thought on how to do this is to split the scene into "bins" of triangles according to some maximum bin-size limit. Each bin is a kd-tree of its own, so the overall scene structure essentially looks like a lot of boxes (each containing a tree), themselves held in some spatial subdivision scheme.
The question is, what to do with objects that don't fit entirely in one bin. Should I just chop the object along some axis, thereby possibly creating lots of new triangles along the split, or should I perhaps use some spatial structure that allows for overlapping bounds? What spatial structure would be the best for holding the bins?
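For concreteness, the sort of layout I'm imagining for the bins is roughly this (just a sketch, all names invented; the top-level structure over the bins is exactly the open question):

    #include <cstddef>
    #include <string>
    #include <vector>

    struct Bounds3 { float min[3], max[3]; };
    struct KdTree  { /* per-bin triangle kd-tree, details omitted */ };

    // One bin of triangles: the bounds stay resident, the tree itself can be
    // dropped under memory pressure and rebuilt/reloaded from its cache file.
    struct TriangleBin {
        static const std::size_t kMaxTriangles = 1u << 20;  // the bin-size cap
        Bounds3     bounds;
        std::string cacheFile;   // triangles paged out to disk here
        KdTree*     tree;        // null while the bin is paged out
    };

    struct Scene {
        std::vector<TriangleBin> bins;
        // plus some top-level spatial structure over bins[i].bounds;
        // a ray pages in any bin it enters before traversing that bin's tree
    };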
(L) [2006/02/15] [fpsunflower] [Thoughts on spatial subdivision for very large scenes] Wayback!I've thought about (and written) stuff along these lines.
I think the best granularity of cache management for huge scene rendering is the object level. You create a two level tree (or whatever structure you're using). One level for the object bboxes (which can be built without knowing anything about the object, just knowing where it is in space). Then, per object a tree with just the triangles for that object. This also lets you do instancing for free (even OpenRT works like this actually).
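Roughly like this (a sketch only, names made up): the top level needs nothing but the bboxes, and instancing is just a transform plus a pointer to a shared per-object tree.

    #include <vector>

    struct AABB      { float min[3], max[3]; };
    struct Transform { float m[4][4]; };
    struct MeshTree  { /* per-object tree over just that object's triangles */ };

    // One top-level entry: a world-space bbox plus a handle to the object's own
    // tree. The bbox is enough to build the top level without ever loading the
    // triangles; the per-object tree is loaded or built on demand.
    struct ObjectInstance {
        AABB      worldBounds;
        Transform objectToWorld;   // instancing for free: many instances
        MeshTree* mesh;            // can point at the same shared MeshTree
    };

    struct TwoLevelScene {
        std::vector<ObjectInstance> objects;
        // top-level tree built over objects[i].worldBounds goes here; traversal
        // enters an instance, transforms the ray to object space, walks mesh
    };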
It's totally possible to take the other approach (have you checked out the Kilauea papers?) and treat your scene as a massive triangle soup, but you're gonna have to deal with objects being split apart and generally need to shuffle quite a bit more data around (think about the per-vertex/face/face-vertex attributes and how those would have to be split apart and duplicated). With the object-level approach, you're guaranteed to have all the data you need around in close to optimal form.
There's probably an extra level of acceleration structures at the triangle/patch level if you need to do stuff like displacements and/or on-demand nurbs/subds.
If you are working on this seriously, I recommend looking at thomas' old posts on the highend3d renderer mailing list (which sadly seems to have died). mental ray has probably the most extreme approach to caching/on-demand stuff as they apply it to everything the renderer can produce/load (geo, textures, photon maps, shadow maps, etc...) in a unified fashion.
(L) [2006/02/16] [playmesumch00ns] [Thoughts on spatial subdivision for very large scenes] Wayback!Thanks for the advice, I've been following the progress of your renderer for some time now and it's looking very good indeed.
I'm a bit wary of taking mental ray as an example: we're using it in production here and it's a bit flaky, to say the least [SMILEY Smile]
I agree that the cleanest way to do it would be at the object level. I suppose that not supporting super-heavy meshes is a limitation I'm prepared to accept, since in a real-world situation (for my applications at least) base geometry tends to be on the order of 10k-100k triangles per object, which are then smoothed as subdivision surfaces.
Which I guess leads me on to another question: dynamically tessellated geometry for smooth surfaces and micro-triangle displacement. Is this something you've implemented? The obvious way would be to have triangle primitives that might themselves be another (regenerable on demand) kd-tree holding the finely tessellated geometry. The way PBRT approaches primitive intersection seems to be the cleanest way of handling this from a design point of view: using inheritance to allow many different types of surfaces. However I'm wary of having virtual function calls (or even switch statements) inside the core ray intersection code.
I guess the only thing to do is write a test program that intersects triangles directly, and another that intersects polymorphic primitives, and see what the performance hit is for invoking the virtual functions.
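The kind of interface I have in mind for the test is something like this (a sketch of the idea only, not code from PBRT or any other renderer; all names invented):

    // PBRT-style abstract primitive: the tree leaf calls intersect() through a
    // virtual, so a "primitive" can be a plain triangle or a whole lazily
    // regenerated kd-tree of tessellated / displaced micro-triangles.
    struct Ray { float o[3], d[3], tMax; };
    struct Hit { float t, u, v; };

    struct Primitive {
        virtual ~Primitive() {}
        virtual bool intersect(const Ray& r, Hit& h) const = 0;
    };

    struct TrianglePrim : public Primitive {
        float v0[3], v1[3], v2[3];
        virtual bool intersect(const Ray& r, Hit& h) const;  // plain triangle test
    };

    struct TessellatedPatch : public Primitive {
        // rebuilds its internal kd-tree of micro-triangles on demand,
        // then traverses it inside intersect()
        virtual bool intersect(const Ray& r, Hit& h) const;
    };

    // The benchmark would then time a tight loop calling TrianglePrim::intersect
    // directly against the same loop going through a Primitive* instead.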
(L) [2006/02/17] [playmesumch00ns] [Thoughts on spatial subdivision for very large scenes] Wayback!Hmm, I thought a virtual function call was even worse than a switch? My low-level C++ knowledge needs some work!
Surely I can't use templates to accomplish this, since the type of object will only be known at runtime: even if I template/inline the ray intersection code, I'll still have to do some kind of redirection somewhere to handle different geometry types separately.
I agree that just binning each object separately is certainly the way to go, but I'm thinking it might be worth combining bins. For instance, consider an object that's exported from a 3D app with multiple shaders attached to different faces. The simplest way to handle this would be to export each shading group (group of faces with the same shader) as a separate object, but then you'd want to treat these separate groups as a single object again when it came to the top-level spatial subdivision scheme. It may also be beneficial to merge lots of clustered small objects (e.g. polygonal hair) together.
Whatever I do, it's likely that the bounding boxes for objects will overlap. Is there an elegant way of handling this in a kd-tree?
(L) [2006/02/17] [Lynx] [Thoughts on spatial subdivision for very large scenes] Wayback!Virtual functions slower than a switch? I've always read it's the other way round...
But I haven't found really good information about the real-world penalty yet. I found one research paper back from 1996, when a Pentium Pro was the latest and greatest you could get in the x86 world. And they simulated all their results, so I'm not sure if it's really worth anything (anymore)... CPUs have also changed quite a bit in the last 10 years...
Maybe in a few weeks I can tell you how much of a difference it makes to change the primitive triangle intersection to a virtual function (because I'll have to do so too, for instancing, support of primitives other than triangles, etc...).
And I don't see how templates would help you there either.
Overlapping subtrees, well that's exactly the reason why yafray goes from a 2-level acceleration structure back to a global kd-tree: basically performance suffers badly (forget your virtual function penalty there...) on, say, an interior rendering where people have the room as one object, the sofa as another, the pillow on the sofa as yet another etc., and they all overlap => easily a tripling of render time, I've tried it.
I would also say that joining and/or splitting objects to minimize overlap seems like no bad idea if they contain their own acceleration structure. Otherwise your upper-level tree would have to find splits between the bound extents of your objects where one object has no geometry, and I don't think that's easy. You can probably use the object tree efficiently to check whether it has geometry in a bound that partially overlaps its own bound, but you still have to find candidates first... and even then you'd traverse the upper tree to a depth where it starts resembling the object tree, only to start again at the root of the object tree.
(L) [2006/02/17] [tbp] [Thoughts on spatial subdivision for very large scenes] Wayback!I will just comment on the virtualness cost, and please take what I say with the proverbial grain of salt because I haven't seriously checked that in a year.
Of all the forms of branching (I'm talking about x86), indirect register-based calls are the worst, because basically there's little a CPU can do to predict where they will land.
Virtualness is implemented, at least on gcc/icc/msvc, through vtables and the typical sequence for a call is something like:
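(register names and the vtable offset below are just illustrative and depend on compiler and ABI; the point is the dependent loads followed by an indirect call, shown here as comments on the equivalent C++)

    struct Prim { virtual bool intersect() const = 0; };

    bool hit(const Prim* p) {
        // p->intersect() compiles to roughly (32-bit MSVC-style pseudo-asm):
        //   mov  eax, [ecx]       ; load the vtable pointer stored in the object
        //   call [eax + OFFSET]   ; indirect call through that vtable slot,
        //                         ; which is the part the branch predictor
        //                         ; has the hardest time with
        return p->intersect();
    }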
(L) [2006/02/17] [Lynx] [Thoughts on spatial subdivision for very large scenes] Wayback!Well, I just tried changing all triangle functions of the "old" yafray to virtual. Result: less than a 2% increase in render time. So clearly there are a lot of other bottlenecks... granted, it's not the fastest renderer, but I still think you'll have to concentrate a lot of effort on other places before this really becomes a noticeable performance drawback.
Though you should be aware of one side effect: classes with virtual functions are larger (by the size of one pointer per object), so that means dealing with even more megabytes; but then again we're talking about scenes that will not fit in RAM anyway, so maybe that's still acceptable...
fpsunflower:
Ah, now I understand: templates for reusing the kd-tree code for objects with different primitives. But then you can only have one type of primitive in each object, though the virtual call to intersect a possibly complex object should really be negligible.
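i.e. something along these lines (a rough sketch, all names invented): the kd-tree code gets stamped out once per primitive type, and the only virtual call left is the one per object the ray actually enters.

    #include <cstddef>
    #include <vector>

    struct Ray { float o[3], d[3], tMax; };
    struct Hit { float t; };

    // One kd-tree implementation, reused for any primitive type that has an
    // intersect(ray, hit) method; no virtual call inside the leaf loop.
    template <class Prim>
    struct KdTree {
        std::vector<Prim> prims;
        bool intersect(const Ray& r, Hit& h) const {
            bool found = false;                            // stand-in for real traversal
            for (std::size_t i = 0; i < prims.size(); ++i)
                if (prims[i].intersect(r, h)) found = true;
            return found;
        }
    };

    // The only virtual dispatch left: one call per object the ray enters.
    struct ObjectBase {
        virtual ~ObjectBase() {}
        virtual bool intersect(const Ray& r, Hit& h) const = 0;
    };

    template <class Prim>
    struct Object : public ObjectBase {
        KdTree<Prim> tree;
        virtual bool intersect(const Ray& r, Hit& h) const { return tree.intersect(r, h); }
    };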
But I don't understand how mailboxing will help you there: if two object bounds overlap, you have to test both for intersection... I don't see a way to tell which (if any) will have the closer hit...
About virtual vs. switch statements, I don't know what to believe: read 10 pages about it and 5 will say a) is better and the other 5 say b) is better...
That's a case for myth busters [SMILEY Very Happy]