Arauna GI experiments

(L) [2007/03/24] [Phantom] [Arauna GI experiments] Wayback!

I did some tests with various GI approaches over the last week. Here are some results:

[IMG #1 ]

First image: GI, 1 bounce, 512 samples per pixel. Took ages to render.

[IMG #2 ]

Second image: Ambient occlusion, 25 samples per pixel. Faster. [SMILEY Smile]

[IMG #3 ]

Third image: My new wallpaper. This is what you get if you take out the randomness of the hemisphere sampler. I thought it looked cool. [SMILEY Smile]

This week I will try to apply Greg Wards irradiance cache to ambient occlusion; I hope this will make it real-time again.
_________________
--------------------------------------------------------------

Whatever

[IMG #1]:Not scraped: https://web.archive.org/web/20071029233009im_/http://www.bik5.com/sponza_gi.jpg
[IMG #2]:Not scraped: https://web.archive.org/web/20071029233009im_/http://www.bik5.com/sponza_ao.jpg
[IMG #3]:Not scraped: https://web.archive.org/web/20071029233009im_/http://www.bik5.com/sponza_wp.jpg

(L) [2007/03/25] [lycium] [Arauna GI experiments] Wayback!

you wrote "realtime ray tracing" on an image using 25 samples per pixel and many shadow rays? (btw, 25 samples per pixel ought to look reaaaally smooth for normal ray tracing)

btw, since i guess you haven't looked at irradiance caching, it's definitely not something for realtime use since you dynamically construct an octree as you render the image... no idea how you plan to distribute that across your 8 cores, and even when it's built i'd hardly call the resulting scheme "fast" (let alone realtime) unless you're comparing to brute path tracing.

(L) [2007/03/26] [Phantom] [Arauna GI experiments] Wayback!

It's false advertising. I'm doing 25 samples over the hemisphere for each pixel, so it's more like 3 seconds / frame. And that's just for the ambient occlusion; full GI is even slower. It did however produce an interesting image, and I wanted a wallpaper, so I added the ray tracer title screen. [SMILEY Smile]

About this being realtime: I don't see what the problem is with the irradiance cache and the octree? For most pixels I'm just reading the tree, and occasionally I write to it. I think a 10x improvement over naive sampling is possible; for lower resolutions and crude approximations this should give me 10fps on the 8-core.

I did read the papers on irradiance caching (several times in fact); Ward is mentioning temporal coherence, and I was indeed hoping not to construct the whole tree each frame.
_________________
--------------------------------------------------------------

Whatever

(L) [2007/03/26] [Phantom] [Arauna GI experiments] Wayback!

Ben,

I tried the VPL approach, and indeed it gives good results, even with few VPLs. I did not yet try the discontinuity buffer yet though.

I'm worried about two things, and that's why I am also investigating some other options:

1. I need to use quite some VPLs, because my students are working on a game level. Wald is suggesting ways to ignore lights that will not contribute, but that's a complication.

2. In Wald's thesis, interleaved sampling shows severe artifacts on smaller details (well, not so small actually, the round edge of the table in the office scene).

3. The discontinuity filter must be applied to the final image and will benefit little from multiple cores (I encountered this problem when doing HDR filters in software). On the 8-core, this means that 7 cores will be idling.

So I did some quick tests to compare results of 1-bounce GI and ambient occlusion. In my framework, these are easily exchangable, so I could even support both (depending on the platform). Ambient occlusion is close to 1fps on my system (by limiting the size of the hemisphere and the number of samples), so if Ward is right this could be interactive with the irradiance cache (which maps nicely to all sorts of hemisphere sampling, including ambient occlusion).

Another option that I only just encountered is to gather surfaces by doing a kd-tree proximity query using the hemisphere (just like a photon lookup, only with half the sphere); this quickly yields a set of 'interesting' nearby surfaces that could also be used to approximate ambient occlusion.

I'm in exploration mode as you can see. [SMILEY Smile]

- Jacco.
_________________
--------------------------------------------------------------

Whatever

(L) [2007/03/26] [bouliiii] [Arauna GI experiments] Wayback!

Et hop!

with Interleaved Sampling 4x4 + Discontinuity Buffer + 128 VPLs --> about 1 f/s on my coreduo (no MLRTA just SSE SIMD shading and intersections)

The VPLs are found with a Metropolis Sampler but if your scenes are simple (for variance problems I mean), it is just sufficient to use the method by Keller.

You can see that the pictures are not converged --> it is really true for cabin where the small room is not lit (the sampler did not found VPLs which illuminate this room *yet*, more VPLs are needed)

[IMG #1 ]

[IMG #2 ]

[IMG #3 ]

Ben

[IMG #1]:Not scraped: https://web.archive.org/web/20071029233009im_/http://bat710.univ-lyon1.fr/~bsegovia/yacort/screenshots/office_HQ.jpg
[IMG #2]:Not scraped: https://web.archive.org/web/20071029233009im_/http://bat710.univ-lyon1.fr/~bsegovia/yacort/screenshots/direct_mtmh.jpg
[IMG #3]:Not scraped: https://web.archive.org/web/20071029233009im_/http://bat710.univ-lyon1.fr/~bsegovia/yacort/screenshots/cabin_HQ.jpg

(L) [2007/03/26] [lycium] [Arauna GI experiments] Wayback!

you will find irradiance caching can't be threaded (if you somehow do, write a paper for rt07 since you're the keynote speaker!), whereas the awesomely-named discobuffer splits work exactly evenly. if that's not enough motivation on 8 cores, then it will be with more (besides this irradiance caching, especially with gradients to make it not-ugly-as-sin, involves significant mathematics).

btw bouliiii, those are very impressive shots for 1fps! i have to wonder though, if they are perhaps "cherry picked" since 128 VPLs leave a lot of room for big aliasing error. the key statistic is how the framerate varies with virtual light count, which, after some point, i would imagine to be sub-linear; any info in that regard would be really awesome :)

(L) [2007/03/26] [Phantom] [Arauna GI experiments] Wayback!

Just implemented the discontinuity buffer. Lycium, I don't see how you could make it multithreaded? I followed the description from 'Interactive Global Illumination', [LINK http://graphics.cs.uni-sb.de/~wald/Publications/2002_IGI/2002_IGI.pdf,] that states (if I interpret correctly) that the discobuffer is simply a NxN box filter where each point is only included if it's normal & distance matches that of the sample point. I can see how a GPU could do this much better than what I have now though. [SMILEY Smile]

Anyway, I tried it on crude ambient occlusion (no time yet to dive in the other approaches in-depth); the effect is that I get a virtually noise-less image now with just 16 rays (per pixel). Still not interactive, but the 8-core at least renders several frames per second.

I hope to have some time to revisit IGI later this week.
_________________
--------------------------------------------------------------

Whatever

(L) [2007/03/26] [lycium] [Arauna GI experiments] Wayback!

i imagine your N threads generate the to-be-interleaved samples, then you split the image into segments and process them independently. at least i hope this is correct (and don't see why not), i've never implemented discobuffering (damn i love that term).

with irradiance caching you'd have to lock the octree each time a thread wants to update, and especially near the top of the tree in the beginning, that's probably brutal.

(L) [2007/03/27] [bouliiii] [Arauna GI experiments] Wayback!

Hello Jacco,

there is no specific problem for multi-threading, our own implementation is completely multithreaded (damn, I must do this release !!)

The idea is to use deferred shading.

1/ You store normal, pos, dir intersection data (triangle id, instance id and intersection depth) --> easy to multi-thread

SYNCHRO

2/ With the buffers, you compute a discontinuity buffer --> if(pixel == BLACK) no disco otherwise discontinuity

    --> this must be made simply and quickly by checking the previous buffers and making the test between the pixel and its three "upper" neighbors (the right one, the top onje and the right top one)

    --> easy to multi thread

SYNCHRO

3/ Shading

    --> you fill two buffers where you can put the irradiance and the self color. Instead of generating the primary rays, just fetch the data in the previous buffers.

    --> which is a bit more technical is to assign for each ray its pool of light source. The best thing to do is to enumerate the light sources during shading and to use a stride equal to the size of the interleaved pattern for each interleaved ray pack.

stg like

for(i = ray->index; i < light_n; i += pattern_size) {

    make shading stuffs

}

where 0 =<ray>index <pattern_size (each ray inside a given interleaved pattern has a different index)

SYNCHRO

4/you perform a two pass box filter. The idea: you filter horizontally (resp. vertically) until you find a discontinuity. At the same time, you can blend the filtered irradiance and the self color of the objects.

Et hop!

I hope it may help.

Ben

(the multi-thread can be simply done with tiles for each step. For the shading part, you can make stg different by assigning to each thread a pattern. I did not do that because I think it is either too slow (if each thread fill a entire regular grid (I am refering to the original paper by Keller), the final synchro point may be very expensive since you do not control the granularity of the data)) or too complicated (if you assign for each thread one piece of regular grid then you merge everything ) The good thing is I think, to have once again tiles to give to each thread with a size proportional to the pattern size. Otherwise, it is still easy to mask the inactive rays which are outside the tile (it is what I finally did to let the user choose the tile size AND the pattern)

Arauna GI experiments back