MLRTA revisited back
(L) [2006/09/03] [Phantom] [MLRTA revisited] Wayback!I just implemented MLRTA (blatanty copying code from Carsten Benthin's thesis, I admit). Quick results:
- I implemented it for 4x4 packets.
- Very slight (but undeniable) speed gain over my 4x4 kd packet tracer (couple of percents) on Sponza.
- Small speed loss on Legocar, indicating that you need a reasonably complex scene to benefit.
- Main speed loss is coming from the leaf processing: At this stage, you have to compute near and far values for all rays against the leaf bbox. This overhead is only acceptable if the number of traversal steps is sufficiently high, which explains the poor legocar performance.
- I have a stupid remaining artifact (does anyone else have the impression that over 50% of dev time is going to debugging artifacts?): Some packets (usually located where two large polygons meet) return all black pixels.
So... A quick first implementation beats my kd-tree traversal for the type of scene that I'm aiming for (~100k triangles). Not bad. Perhaps I'll revisit the common start node thingy from the MLRTA paper too to see if that helps. But first I need to optimize and squash the ugly artifacts.
_________________
--------------------------------------------------------------
Whatever
(L) [2006/09/03] [Phantom] [MLRTA revisited] Wayback!Update: Gains for Sponza are 2.5%. Artifact gone.
Sadly, MLRTA can't be applied to shadow rays: The traversal code does not keep track of distances along the ray, so you can't early-out if you're past the primitive that contained your intersection point (I always trace from the light to the intersection point, because that way the shadow rays have a common origin).
So: For the moment, MLRTA seems promissing. It fits in nicely with my architecture, so I'll leave it in there. But the gains are disappointing to me. Perhaps I'm still doing something wrong.
EDIT: Small fix to leaf node ray clipping code: Gains up to 4.5%. If I would remove shading and secondary rays, I guess MLRTA would perform about 20% above the classic approaches which is in line with Reshetov's claims for scenes with low occlusion.
_________________
--------------------------------------------------------------
Whatever
(L) [2006/09/05] [Nemesis] [MLRTA revisited] Wayback!....
So: For the moment, MLRTA seems promissing. It fits in nicely with my architecture, so I'll leave it in there. But the gains are disappointing to me. Perhaps I'm still doing something wrong.
...
Hi everyone (I think it's my first post to this great forum),
just some quick note to the MLRTA implementation from my thesis. I haven't read all posts in this forum,
so if you already using these techniques ignore my comments:
In the case you are using the ray segment algorithm with a single 'near' and 'far', you can achieve some speedup
by removing the horizontal operations (e.g. sseHorizontalMin), which are extremely costly for such a tiny loop. Here,
plain C code without SSE can even be faster than a version with SSE....
It's very important to avoid unnecessary intersection operations by perfoming some kind of a hit-leaf-voxel test for each four-ray packet.
If a four-ray packet misses the leaf-voxel skip it for all remaining intersection tests.
Best regards,
C
(L) [2006/09/05] [Phantom] [MLRTA revisited] Wayback!Hi Carsten, and welcome!
I tried the first MLRTA approach from your paper, the other one is currently between '#if 0' and '#endif' because I wasn't sure how to implement that sseHorizontal, so it's good that you mentioned it. [SMILEY Smile]
I was hoping that this alternative beam traversal would be good for secondary rays; I didn't succeed in applying MLRTA to those so far. All I could get is a small speedup using a regular core for 4x4 rays, and keeping track of coherent bundles. As long as the four packets visit the same nodes, you can skip some instructions; making these conditional actually helps (about 5-10%). It's ugly code though, obviously.
For secondary rays, I currently use plain vanilla 4x4 packet tracing, but since MLRTA performs better for primary rays, I would love to add it to those as well.
One other question about your thesis: Yesterday I added the hash/mailbox thing you mentioned, but in my code, it just degrades performance. Perhaps my scene is unsuitable, but the mailboxing doesn't help at all; number of intersections doesn't vary noticeably. How did this perform for you?
_________________
--------------------------------------------------------------
Whatever
(L) [2006/09/07] [Phantom] [MLRTA revisited] Wayback!I got the second type of MLRTA traversal working, thanks for the sseHorizontalMin snippet.
To get rid of that horizontal operation, I suppose one could simply figure out (per axis) which ray will always represent the minimum and which one the maximum, right? I'll try it to see what happens.
The problem with the shadow rays is the termination criterium which you didn't mention in your paper. I think I figured out a way to do it properly; will try it tonight. So far I only used MLRTA for actual traversal; entry point search is not in yet. But why didn't you use MLRTA traversal combined with EP search?
About BVH traversal: Currently, my tracer supports two approaches: A scene can either be represented by a 'BIH' (Toxie's paper), or a sah/kd-tree (based on the latest Havran paper). The BIH structure can be build in very little time, but perfoms at about 70% the speed of sah/kd. Kd-trees take a while to build but deliver superior performance (10M rays in Sponza (78k), with texturing & bilinear filtering, Schlick's model & a light source, on a Core2Duo @ 1.66Ghz). Recently a new paper on streaming construction of sah/kd trees was released that promises to build ~100k trees in 30ms or so, on a single core. Once I get that working properly, kd/sah seems to be the optimal combination again, unless memory is a problem (Tai statue).
About the mailboxing: I used to have no mailboxing at all. So, testing all encountered triangles vs. doing the hashed mailbox gives me the same performance figures and apparently the same number of primitives. I did run a debug build to see if any triangles get skipped at all thanks to the mailbox, and it appears to happen, just very seldomly. I'll try to do some better statistics.
_________________
--------------------------------------------------------------
Whatever
(L) [2006/09/08] [Phantom] [MLRTA revisited] Wayback!Carsten,
The problem with the shadow rays turned out to be quite nasty: Some 4x4 packets hit primitives that are really far away (big depth difference). These cases account for a lot of the intersections. I added a test so that MLRTA is only used for shadow rays towards intersection points that are close together, and this time I got the expected speedup over normal 4x4 packet tracing. Sponza speed is now up to 11.3M rays/s, so that's a solid 13% improvement - or, since only the shadow rays are affected, at least a 26% faster handling of shadow rays, if I ignore 'overhead' like shading.
The 10M figure from my previous post and the new 11.3M figure indeed includes shadow rays (from a single light source), but also all overhead like shading, converting to 32bit color, ray generation and so on. It's the total frame time divided by the number of rays spawned.
(L) [2006/09/08] [Phantom] [MLRTA revisited] Wayback!Sponza (at least the one I use) has normals. I'll upload a demo of my stuff including Sponza in obj file format (with textures) somewhere this weekend. I'm really curious what your results are with this set, and I'm also interested in the performance of my current code on Conroe; previous experiments showed huge gaines on this platform. I might just beat the 10M per core boundary, that would be very cool.
(L) [2006/09/09] [Xela] [MLRTA revisited] Wayback!Phantom/Nemesis,
I was reading the thread for a while and then decided to crush the party not for any particular reason except that today is Friday and I dont want to do anything else [SMILEY Smile] .
These are my 2 cents:
There are many ways to do packet traversal, which roughly could be split into 2 big groups:
1.    Down-to-Earth honest traversal of all rays in packet
2.    Cheating
Last one includes all forms of frustum traversal and interval traversal. Inverse frustum traversal, as described in MLRTA paper, could be used again for 2 separate purposes:
1.    EP search
2.    2nd type of the traversal above
EP is good in situations which are good even without it, like architecture. So, lets forget about it for a moment.
With any form of cheating during the traversal you may end up in leaves which are actually not intersecting the beam. So it is very very important to do culling. What Im doing is excluding all rows in 4x4 packet not intersecting with BB of objects in the leaf (has to be stored in the leaf node). This is also great for all other types of traversal approaches anyway. It can be done very cheaply for coherent packets (you dont want to be fancy with incoherent packets anyway).
Performance-wise, inverse frustum traversal by itself beats slightly honest 4x4 traversal, but trails the interval traversal (one which is described at the end of my paper).
Sad reality is that it all works well only with tight coherent packets. So, Im using MLRTA approach for these rays only:
-    coherent primary
-    coherent shadow if primary rays hit flat surface
-    coherent reflections if primary rays hit flat surface
With inverse frustum youre actually traversing the tree on behalf of all potential rays inside the frustum (similar for interval traversal), so when frustum becomes wide you will do a lot of unnecessary work better avoid it. There could still be potentially viable approaches, like splitting frustum or intervals as suggested by somebody in this forum , but I never tried it.
With shadow rays it also helps to check against the last found occluder before engaging any actual traversal.
As for mailboxing, I was using the simplest form possible storing the last found intersector but then abandoned it as performance gains were not that great anyway.
For rays after multiple bounces nothing actually works well, at least IMHO. Thats what I actually going to say in Utah on RT06. I could not post the paper online, as most other guys do here in Santa Clara we cannot control our own web pages. Mine was not updated for over 5 years. Still, I could email the paper if anybody is interested (it has nice charts [SMILEY Smile] ).
--Xela
_________________
[Xela]
(L) [2006/09/09] [Phantom] [MLRTA revisited] Wayback!Alexander,
Good to see you're still around. I finally grasped MLRTA, after all the confusion. [SMILEY Smile]
It's fun to see what you use MLRTA for: It's ~exactly what I use it for, including the 'flat surface test': I calculate the min and max distance of all intersections of a 4x4 packet and demand that they are 'close'. I guess there are advantages to your approach: It doesn't depend on scene size, for example. On the other hand, I can still use MLRTA in case of curved geometry, like the ceiling in Sponza.
Mailboxing (the one Carsten described) helps for shadow rays, but not for primaries.
In any case I found that various approaches work well for different scenes: Plain 4x4 (no cheating, as you put it) seems best for scanned stuff and 'solo objects', but for a 'game world' (full screen coverage, low local poly count) MLRTA is great. I guess ideally you would want to cast some probe packets to see what would perform best for a particular frame, or perhaps even for a particular tile of pixels. I could imagine that this would improve a situation where a complex model is placed in an architectural model.
About your paper: I would love to see that! If you wish, I can store it on my webserver, so we can conveniently link to it from this forum.
Greets
Jacco.
P.S. How would Sponza with a single light and textures and all perform in your tracer? I'll gladly pass you the speedking crown. [SMILEY Wink]
_________________
--------------------------------------------------------------
Whatever
(L) [2006/09/09] [tbp] [MLRTA revisited] Wayback!Xela, i have that secret passion for fancy charts, perhaps you could help me fulfill it?
Personally i wish there was a bit more attention paied to robustness issues. Even in a careful implementation of a "Down-to-Earth honest traversal of all rays in packet" i'm seeing what i'd call noise, it happens for pretty fundamental reasons so it's hard to eradicate and it certainly doesn't help with performance.
PS: Jacco, why would you waste time probing the scene when you can use temporal coherence.
_________________
May you live in interesting times.
[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]
back