(L) [2007/08/28] [Stereo] [My tracer is going realtime ...]
Hi guys,
I've been following your discussions and threads in this forum for quite some time now and learned quite a lot. To end my presence as a lurker I'd like to show you some of the work I've done in the last few months.
As my final year of the Master's programme at the Johannes Kepler University of Linz is about to begin, I had to decide on a topic for my diploma thesis. Having already implemented a raytracer for my Bachelor thesis, I wanted to build upon that knowledge and make it realtime.
In fact, I had already experimented with realtime raytracing last summer, using a new piece of shader model 3.0 hardware to assist the cpu in tracing rays. The idea was to use the cpu to evaluate complex shaders, which could spawn new rays that would in turn be processed by the gpu.
A big problem with that approach is that the system is only efficient when rays are sent to the gpu in large batches. On the other hand, batching means that tracing results aren't available immediately.
I developed a solution to this problem, which I presented during the poster session at RT06: it uses multiple user-level threads, commonly called coroutines or fibers. Basically, each fiber renders a single pixel. Whenever a fiber requests rays to be traced, the rays are inserted into a queue and control is handed over to the next fiber. This way, rays are collected into larger batches for gpu processing without affecting the way you write surface shaders - they only need to take care of a single pixel, and the parallelism is hidden.
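The batching idea can be sketched with POSIX `ucontext` fibers. This is an assumption: the post doesn't say which fiber API was used, and `makecontext`/`swapcontext` were dropped from POSIX.1-2008, although glibc on Linux still provides them. All names (`traceRay`, `traceBatch`, `renderFrame`) and the batch size of 4 are hypothetical, and `traceBatch` only writes dummy results where the gpu tracer would run. Each fiber shades one pixel and calls `traceRay` as if it were blocking; the scheduler traces a batch whenever enough rays are queued and flushes the rest after each sweep.

```cpp
#include <ucontext.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one fiber per pixel, tracing a ray enqueues it and yields.
struct Ray { int pixel; float dir[3]; };

constexpr int kPixels = 8;                      // one fiber per pixel
constexpr std::size_t kBatchSize = 4;           // hypothetical batch threshold
constexpr std::size_t kStackSize = 64 * 1024;

static ucontext_t g_scheduler;                  // context of the scheduling loop
static ucontext_t g_fiber[kPixels];
static std::vector<char> g_stack[kPixels];
static bool g_done[kPixels];
static int g_current = -1;                      // index of the running fiber
static std::vector<Ray> g_queue;                // rays waiting to be traced
static float g_hit[kPixels];                    // per-pixel trace result
static float g_image[kPixels];
static int g_batchesTraced = 0;

// Stand-in for the batched (gpu) tracer: resolves every queued ray at once.
static void traceBatch() {
    for (const Ray& r : g_queue) g_hit[r.pixel] = 1.0f + r.pixel;  // dummy hit distance
    g_queue.clear();
    ++g_batchesTraced;
}

// Called by a surface shader: looks blocking, but enqueues the ray and yields.
static float traceRay(const Ray& r) {
    g_queue.push_back(r);
    int self = g_current;
    swapcontext(&g_fiber[self], &g_scheduler);  // resumed after the batch is traced
    return g_hit[r.pixel];
}

// Body of every fiber: shade exactly one pixel with single-ray code.
static void shadePixel() {
    int p = g_current;
    g_image[p] = traceRay(Ray{p, {0.0f, 0.0f, 1.0f}});
    g_done[p] = true;  // returning switches to uc_link, i.e. the scheduler
}

// Round-robin scheduler: runs each fiber until it yields, traces full batches,
// and flushes leftovers after every sweep so no fiber waits forever.
void renderFrame() {
    for (int i = 0; i < kPixels; ++i) {
        g_stack[i].assign(kStackSize, 0);
        g_done[i] = false;
        getcontext(&g_fiber[i]);
        g_fiber[i].uc_stack.ss_sp = g_stack[i].data();
        g_fiber[i].uc_stack.ss_size = g_stack[i].size();
        g_fiber[i].uc_link = &g_scheduler;
        makecontext(&g_fiber[i], shadePixel, 0);
    }
    g_batchesTraced = 0;
    for (bool pending = true; pending;) {
        pending = false;
        for (int i = 0; i < kPixels; ++i) {
            if (g_done[i]) continue;
            pending = true;
            g_current = i;
            swapcontext(&g_scheduler, &g_fiber[i]);
            if (g_queue.size() >= kBatchSize) traceBatch();
        }
        if (!g_queue.empty()) traceBatch();
    }
}
```

With 8 pixels and a batch size of 4, `renderFrame()` traces two batches, while each shader still only deals with its own pixel.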
The big problem with gpu raytracing is retrieving the results: this was the bottleneck of my application and limited the speed to about 900 kRays/s for a sphereflake consisting of about 7000 spheres. Therefore I've decided to drop gpu support for my diploma thesis and concentrate on a cpu-only solution.
Quick overview of the current state:
- O(n log n) kd-tree builder à la Wald.
- Kensler-Shirley ray-triangle intersection test.
- mono tracer + optimized simd tracer for variable packet sizes (multiples of four).
- ray sorting and batching based on my fiber architecture - more about that to come!
- I also have support for BVHs, but they're consistently outperformed by the kd-tree implementation.
- multi-threaded implementation
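As a rough illustration of the sweep behind an O(n log n) builder in the style of Wald and Havran, here is a hypothetical single-axis SAH evaluation: split candidates are sorted once and then swept linearly while counting the primitives on each side. The cost constants, the rule that puts planar primitives on the left, and all names are assumptions; a full builder also sweeps all three axes and keeps the event lists sorted while recursing instead of re-sorting per node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr float kTraversalCost = 1.0f;   // hypothetical SAH constants
constexpr float kIntersectCost = 1.5f;

struct Box { float lo[3], hi[3]; };

// At equal positions, ends sort before planars, planars before starts.
enum EventType { kEnd = 0, kPlanar = 1, kStart = 2 };
struct Event { float pos; EventType type; };
struct Split { float pos; float cost; bool valid; };

static float surfaceArea(const Box& b) {
    float dx = b.hi[0] - b.lo[0], dy = b.hi[1] - b.lo[1], dz = b.hi[2] - b.lo[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Cheapest split plane along `axis` for the primitives overlapping `node`.
// Planar primitives always go to the left child (a simplification).
Split bestSplitSAH(const std::vector<Box>& prims, const Box& node, int axis) {
    std::vector<Event> events;
    for (const Box& b : prims) {
        float lo = std::max(b.lo[axis], node.lo[axis]);
        float hi = std::min(b.hi[axis], node.hi[axis]);
        if (lo == hi) {
            events.push_back({lo, kPlanar});
        } else {
            events.push_back({lo, kStart});
            events.push_back({hi, kEnd});
        }
    }
    // Sort once; the sweep below is linear in the number of events.
    std::sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
        return a.pos < b.pos || (a.pos == b.pos && a.type < b.type);
    });

    const float invArea = 1.0f / surfaceArea(node);
    // Cost of making this node a leaf; a split must beat it to be valid.
    Split best{0.0f, kIntersectCost * static_cast<float>(prims.size()), false};
    int nLeft = 0, nRight = static_cast<int>(prims.size());
    for (std::size_t i = 0; i < events.size();) {
        const float p = events[i].pos;
        int ending = 0, planar = 0, starting = 0;
        while (i < events.size() && events[i].pos == p && events[i].type == kEnd) { ++ending; ++i; }
        while (i < events.size() && events[i].pos == p && events[i].type == kPlanar) { ++planar; ++i; }
        while (i < events.size() && events[i].pos == p && events[i].type == kStart) { ++starting; ++i; }

        nRight -= planar + ending;   // these no longer overlap the right child
        Box left = node, right = node;
        left.hi[axis] = p;
        right.lo[axis] = p;
        const float cost = kTraversalCost + kIntersectCost *
            (surfaceArea(left) * invArea * (nLeft + planar) + surfaceArea(right) * invArea * nRight);
        if (cost < best.cost) best = {p, cost, true};
        nLeft += starting + planar;  // these overlap the left child of later planes
    }
    return best;
}
```

For two boxes clustered at x in [0,1] and one at x in [9,10], the sweep picks the plane x = 1, which cuts off the dense cluster and leaves the empty middle to the cheaper side.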
Because the fiber architecture worked out so well, I kept it in my cpu-only tracer. It is now used to sort rays by the signs of their direction vectors. When the active fiber posts rays for tracing, they're inserted into a kind of flat octree indexed by the direction signs. As soon as a leaf holds enough rays to form a packet, it is processed immediately by the simd tracer. If not all posted rays could be traced in this step, control is handed over to the next fiber. Especially at the end of a frame, when only fibers waiting for their results remain, the mono tracer takes care of the remaining rays.
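The direction-sign binning can be sketched as eight bins, one per octant: a bin is handed to the packet tracer as soon as it holds a full packet, and leftovers go to the mono tracer at the end of the frame. This is a hypothetical sketch: `RayBinner`, `octantOf`, the fixed packet size of 4 and the counters standing in for the two tracers are assumptions, and the hand-off of results back to the waiting fibers is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Ray { float org[3], dir[3]; };

constexpr std::size_t kPacketSize = 4;   // assumption: one 4-wide simd packet

// Octant index: bit k is set when dir[k] is negative.
inline int octantOf(const Ray& r) {
    return (r.dir[0] < 0.0f ? 1 : 0) | (r.dir[1] < 0.0f ? 2 : 0) | (r.dir[2] < 0.0f ? 4 : 0);
}

struct RayBinner {
    std::array<std::vector<Ray>, 8> bins;  // the "flat octree": one bin per sign combination
    int packetsTraced = 0;
    int monoRaysTraced = 0;

    // Stand-ins for the simd packet tracer and the single-ray (mono) tracer.
    void tracePacket(const Ray* rays, std::size_t n) { (void)rays; (void)n; ++packetsTraced; }
    void traceMono(const Ray& r) { (void)r; ++monoRaysTraced; }

    // Called when a fiber posts rays: bin them and trace any bin that
    // reaches packet size. Every packet therefore has uniform direction signs.
    void post(const std::vector<Ray>& rays) {
        for (const Ray& r : rays) {
            std::vector<Ray>& bin = bins[octantOf(r)];
            bin.push_back(r);
            if (bin.size() == kPacketSize) {
                tracePacket(bin.data(), bin.size());
                bin.clear();
            }
        }
    }

    // End of frame: the remaining rays can't fill a packet, so trace them one by one.
    void flush() {
        for (std::vector<Ray>& bin : bins) {
            for (const Ray& r : bin) traceMono(r);
            bin.clear();
        }
    }
};
```

Posting five rays with all-positive directions and two with all-negative directions traces one packet right away; `flush()` then sends the three leftover rays through the mono tracer.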
Using this system, the kd-tree simd tracer never has to mask out rays in a packet: packets are only built when there are enough rays with identical direction signs. In the future I'll try some more sorting and batching techniques to increase the coherence of packets.
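Why identical direction signs remove the need for masks shows up in the child-selection step of a packet kd-tree traversal: every ray has the same sign along the split axis, so all rays agree on which child is near, and one decision per node covers the whole packet. A hypothetical 4-wide sketch using plain loops instead of SSE intrinsics; `RayPacket`, `classify` and the SoA layout are assumptions.

```cpp
#include <cassert>

constexpr int kLanes = 4;

struct RayPacket {
    float org[3][kLanes];     // structure-of-arrays layout, as a simd tracer would use
    float invDir[3][kLanes];  // 1 / direction per axis; signs identical across lanes
};

enum Visit { kNearOnly, kFarOnly, kBoth };

// tMin/tMax: each ray's active segment inside the current node.
Visit classify(const RayPacket& p, int axis, float split,
               const float tMin[kLanes], const float tMax[kLanes], bool& nearIsLeft) {
    // Same sign in every lane, so lane 0 decides the near/far order for all.
    nearIsLeft = p.invDir[axis][0] >= 0.0f;
    bool anyNear = false, anyFar = false;
    for (int i = 0; i < kLanes; ++i) {
        float tSplit = (split - p.org[axis][i]) * p.invDir[axis][i];
        if (tSplit > tMin[i]) anyNear = true;  // segment starts before the plane
        if (tSplit < tMax[i]) anyFar = true;   // segment continues past the plane
    }
    if (anyNear && anyFar) return kBoth;
    return anyFar ? kFarOnly : kNearOnly;
}
```

With mixed signs, some lanes would treat the left child as near and others the right one, which is exactly where per-ray masks would become necessary.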
Now for some screenshots, measured rates and code snippets - if you're still with me, sorry for the long post!!
Snippet from the queuing function: