Memless RT

(L) [2012/04/25] [ost by stefan] [Memless RT] Wayback!

Is it just me or is this (and [LINK http://www.uni-ulm.de/in/mi/forschung/graphics/rayes.html Rayes]) somehow moving towards stochastic rasterization, except in world space as opposed to camera space? Where REYES is sorting primitives into screen space buckets before doing 2D point-in-triangle tests for each pixel, this, if I understand it correctly, is sorting primitives into 3D bounding boxes before doing 3D ray/triangle tests. Together with [LINK http://www.kunzhou.net/2010/mptracing.pdf micro polygon ray tracing], I can see this all moving towards a generalized method in the middle.

Anyhow, I wonder if this would allow for some nice on-demand tessellations.

(L) [2012/04/26] [ost by DTRendering] [Memless RT] Wayback!

>> davepermen wrote:I really like that approach (always like to be 100% dynamic on the fly). Hope they'll soon find a proper solution to scale with multiple cores. that's quite limiting right now. then again, core count seems to stagnate currently. still, they need to be fed
once that's found, it'll be great.
Well, I am not sure that there is a real scaling problem. You may want to have a look at the original paper ([LINK http://dl.acm.org/citation.cfm?id=2019636]) to get more details on the algorithm, results, and the scaling according to cores multicore (~3.5x with 4 threads ).
This TOG paper will be presented at SIGGRAPH 2012, so stop by if you are around and want to know more.
The EG paper has some interest, but by swapping rays instead of just swapping indices, the authors end up with poor scaling due to some bandwidth constraints.
>> Don't forget also that this doesn't pay off if you need to trace many rays through the scene. Also, requires to trace rays in batches, so can pose some restrictions on the system, like integrator interruption and resuming, etc.
Yep, large batches are needed so it depends on your circumstances. I am not sure why "this doesn't pay off if you need to trace many rays through the scene" though. [SMILEY ;-)]
(!advertisement! [SMILEY ;-)]) you can get your own opinion by using the library available at [LINK http://www.directtrace.org/ www.directtrace.org] .Some kind of update to the lib is now overdue though. If you want a quick overview of the programming paradigm for the lib, there is also the HPG'11 poster available at:
[LINK http://www.highperformancegraphics.org/previous/www_2011/media/Posters/HPG2011_Posters_Mora_abstract.pdf]
>> maybe within a few years it can be made competitive with more traditional approaches.
It depends what we mean by competitive. I am pretty sure that obtaining several millions of purely random rays per second (not ambient occlusion rays for instance) is quite competitive, and do not forget that a prior construction step is not needed. Maybe I am wrong, but I would expect that in many cases you can get 75% to 95% of the performances of a state-of-the-art ray-tracer with such an approach (again, I refer you to the TOG paper results).

Finally, what Toxie said previously is right, things like tessellation is fun and possibly easier with such a paradigm.

(L) [2012/04/27] [ost by ingenious] [Memless RT] Wayback!

>> DTRendering wrote:Don't forget also that this doesn't pay off if you need to trace many rays through the scene. Also, requires to trace rays in batches, so can pose some restrictions on the system, like integrator interruption and resuming, etc.
Yep, large batches are needed so it depends on your circumstances. I am not sure why "this doesn't pay off if you need to trace many rays through the scene" though.
Well, because every batch intersection operation effectively builds a new acceleration structure, which is then thrown away. For multi-bounce global illumination you need to perform many such iterations. Actually, you could cache the built structure and maybe even refine it over iterations. That's an interesting direction to investigate. But with caching the parallelization issue should become worse.

(L) [2012/04/27] [ost by DTRendering] [Memless RT] Wayback!

>> because every batch intersection operation effectively builds a new acceleration structure
I would say that it actually builds parts of a new acceleration structure, with subtle differences as well (I'll emphasize that in the SIGGRAPH talk). This is quite important as the building percentage is not so high. So it may require several batches (2 or 4 or 10 or 100?) before precomputing a spatial subdivision data structure is a real advantage. I would estimate it between 4 and 8 batches, but it really depends on the circumstances.
The algorithm itself can also solve problems in other areas. For instance, if you are using ray-tracing to do collision detections only, then only one batch is needed.
The EG short paper compares their results with precomputed data-structures, but did not discuss the construction times of their data-structures either. It could be interesting if the authors could post informal results here.

Finally, you do not throw away your results. At the end of your batch processing, you end up with a list of shuffled triangles, and clearly there is some sort of coherence in the order of appearance [SMILEY :)] . Anyone interested in writing a paper on that?

(L) [2012/04/27] [ost by voxelium] [Memless RT] Wayback!

>> DTRendering wrote:Well, I am not sure that there is a real scaling problem. You may want to have a look at the original paper ([LINK http://dl.acm.org/citation.cfm?id=2019636]) to get more details on the algorithm, results, and the scaling according to cores multicore (~3.5x with 4 threads ).
Unfortunately, there is definitely a scaling problem for incoherent rays. The scaling strongly depends on the scene and the CPU architecture. For example, I've achieved almost the same scaling as you did for the Conference scene on a very similar CPU (Bloomfield), but it can be worse with Sandy Bridge or other scenes. That's why I did tests using both CPUs.
>> DTRendering wrote:The EG paper has some interest, but by swapping rays instead of just swapping indices, the authors end up with poor scaling due to some bandwidth constraints.
Swapping indices works well for coherent rays, but performs consistently worse for incoherent rays because of the poor cache utilization.

Memless RT back

Board: Home Board index Raytracing Links & Papers