feb 20, new compiler shots

(L) [2006/02/20] [Phantom] [feb 20, new compiler shots] Wayback!

OK, my kd-tree compiler is now producing stable trees (no gaps and other misery), that perform better than my previous compiler. So I would like to finally show some images:



Sponza in all its glory. The odd blur is a fake HDR blur effect that I once implemented.


[IMG #1 ]



Sponza at top speed. At 512x384, Sponza now runs at 4.0-4.8fps.


[IMG #2 ]



The tres cliche kd-vision. [SMILEY Smile]


[IMG #3 ]


Sadly MLRTA is still not performing well (barely covering its own cost). I'm going to investigate that tomorrow.
_________________
--------------------------------------------------------------

Whatever
[IMG #1]:Not scraped: https://web.archive.org/web/20061004012928im_/http://www.bik5.com/gkvhouten/images/frontpage/sponza-hq.png
[IMG #2]:Not scraped: https://web.archive.org/web/20061004012928im_/http://www.bik5.com/gkvhouten/images/frontpage/sponza-lq.png
[IMG #3]:Not scraped: https://web.archive.org/web/20061004012928im_/http://www.bik5.com/gkvhouten/images/frontpage/sponza-kd.png
(L) [2006/02/21] [tbp] [feb 20, new compiler shots] Wayback!

Grats.


I just wish you'd enlighten us with some details, eh.


I haven't re-coded that yet (waving hands mode activated), but i suspect that taking out empty space before scoring (as opposed to integrating that in the scoring process) is only a win for MLRTA. What's your take on the topic?


Anyway it would be nice if we could exchange binary tree dumps this time.


PS: take note that i won't abuse my mad administrative privileges unlike others, but that will be 'très cliché' for you.
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

OK, some details then:


- I am building a straightforward SAH kd-tree, nothing special: no straddling primitive penalty, virtually no limit on tree depth (max is 60 atm, dunno if that's ever reached), virtually no limit on minimal prim count per leaf (set to 1), so subdivision continues till SAH thinks it's better to stop.

- Before SAH kicks in, I first check all six sides of the current node for empty space. If empty space exceeds 33% along any axis, I cut off the largest chunk (in terms of surface area, not volume, makes sense if you give it some thought). Doing it this way instead of via scoring is just cheaper, as these splits require virtually no processing.

- I am keeping track of fully sorted lists now: not only is every node position guaranteed to be equal to or smaller than the next, it's also guaranteed that maxima come before minima in the list.

- Using this setup I get trees that are typically 40% larger than those from my first compiler; ~1MB for the legocar, ~6MB for Sponza.

- I get an average of 5 intersections per ray in Sponza, 3.1 in legocar, but that's probably because of some empty screen space.

- My tracer doesn't handle flat cells, so empty space is cut off only if the resulting cell is at least 5 * EPSILON wide. I just realized that in my code, flat cells won't have their empty space cut off at all; need to fix this so it generates cells with minimal dimensions in that case. Should be a small win.

- There is also a small potential win in the 'cherry picking': Right now, primitives that end at the optimal split pos go to the left branch, primitives that start there go to the right branch and primitives completely on the split plane are used to balance (so either left or right). This is wrong; these primitives should be used to match the optimal leftcount/rightcount as determined by SAH (this could mean a different optimal balance than 50/50).

- Sponza is very fast when the camera is looking down because that results in very few traversal steps (see demo).

- MLRTA doesn't help atm: it never starts at the root node, but it never starts deep either, so it just covers its own overhead, nothing more.

- My subsampling is badly optimized: 10% subsamples overall result in a 50% frame rate drop. Since packet traversal gives a 2.5x speed boost compared to single ray tracing, I would expect a 25% speed drop at most; obviously I'm missing something.
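The empty-space cut-off described in the list above can be sketched roughly as follows. This is only an illustration built from the description in the post, not the actual compiler code: the names `AABB`, `EmptyCut`, `findEmptySpaceCut` and `kEpsilon` are hypothetical, the value of EPSILON is an assumption, and `prims` is assumed to be the tight bounds of the node's primitives clipped to the node.

```cpp
#include <cassert>

// Axis-aligned box; min/max hold x, y, z.
struct AABB {
    float min[3];
    float max[3];
    float extent(int axis) const { return max[axis] - min[axis]; }
    float surfaceArea() const {
        const float x = extent(0), y = extent(1), z = extent(2);
        return 2.0f * (x * y + y * z + z * x);
    }
};

// Result of the check; `found` is false when no side qualifies.
struct EmptyCut {
    bool  found;
    int   axis;            // 0 = x, 1 = y, 2 = z
    float position;        // split plane position along `axis`
    bool  emptyOnLowSide;  // true if the empty chunk is below the plane
};

const float kEpsilon = 1e-4f;  // assumed value, not taken from the post

// node:  bounds of the current kd-tree node.
// prims: tight bounds of the primitives in that node (assumed clipped to node).
EmptyCut findEmptySpaceCut(const AABB& node, const AABB& prims,
                           float minFraction = 0.33f,
                           float minFilledWidth = 5.0f * kEpsilon)
{
    EmptyCut best = { false, -1, 0.0f, false };
    float bestArea = 0.0f;
    for (int axis = 0; axis < 3; ++axis) {
        const float width = node.extent(axis);
        assert(width >= 0.0f);
        // Empty gap on the low side and on the high side of this axis.
        const float gap[2] = { prims.min[axis] - node.min[axis],
                               node.max[axis] - prims.max[axis] };
        for (int side = 0; side < 2; ++side) {
            if (gap[side] <= minFraction * width) continue;       // not enough empty space
            if (width - gap[side] < minFilledWidth) continue;     // filled cell would be too flat
            AABB empty = node;                                    // box of the chunk to cut off
            if (side == 0) empty.max[axis] = prims.min[axis];
            else           empty.min[axis] = prims.max[axis];
            const float area = empty.surfaceArea();
            if (area > bestArea) {                                // keep largest chunk by surface area
                bestArea = area;
                best.found = true;
                best.axis = axis;
                best.position = (side == 0) ? prims.min[axis] : prims.max[axis];
                best.emptyOnLowSide = (side == 0);
            }
        }
    }
    return best;
}
```

Note that with this check a perfectly flat set of primitives never gets its empty space cut off, which matches the limitation mentioned in the list.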


I think that's pretty much all there is to say. If you need other info, let me know.


I uploaded a demo of Sponza: [LINK http://www.bik5.com/rtNexGen.rar] , 16MB. There's a file in this package, scene.txt, that specifies which scene is used; the default is Sponza. See the first line of the file; valid alternatives are the office scene, the legocar and perhaps some others. Please let me know how Sponza performs on your machine.


If anyone is interested, I could post my full kd-tree compiler source code. I am considering releasing the complete source code for my ray tracer, but right now I wouldn't have a problem at all sharing the compiler. Just let me know if you're interested.


- Jacco.
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

These shots have been submitted for the 'image of the day' gallery at DevMaster.net. Perhaps it will attract some extra minds. [SMILEY Smile]
(L) [2006/02/21] [tbp] [feb 20, new compiler shots] Wayback!

Is there a way to disable AA?

I'm not too sure you have any AA going on with the lo-fi version, rtNexGen_512_lq, but it's getting in the way of reproducible results: i've tried to match your rendering using your sponza .obj and your scene.txt (one light + camera pos) but your damn camera doesn't start there and keeps moving.


So i've taken a screenshot and tried to mimic the view by hand. Not very scientific...

On my machine, on xp, you're running at ~7.0 fps in the bottom part. In the frozen view i've picked you're at 7.0 fps, 2553.47K ray/s and 5 intersection/ray, with my best match (which is certainly not perfect) i get: 8.37fps, 3290Kray/s, 3.4 intersection/ray.
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

It always says 'aa', sorry. The lq version does not use aa at all.

About the camera: I will compile a version without camera movement, that actually looks from the specified coord to the specified coord. This one does that also more or less, however it circles around the 'start' coord (on a circle with the specified radius) and increases its y by 0.05 per frame.
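For anyone trying to reproduce the moving view, the camera motion described above could look something like this. It is a guess based only on that description: the function name, the orbit plane (assumed to be x/z) and the angular step per frame are hypothetical, only the 0.05 y increment per frame comes from the post.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Camera position for a given frame: a circle of `radius` around `start`
// in the x/z plane (assumed), rising by `yStep` every frame.
// `angularStep` (radians per frame) is a made-up value for illustration.
Vec3 orbitCameraPosition(const Vec3& start, float radius, int frame,
                         float angularStep = 0.01f, float yStep = 0.05f)
{
    assert(frame >= 0);
    const float angle = angularStep * static_cast<float>(frame);
    Vec3 p;
    p.x = start.x + radius * std::cos(angle);
    p.y = start.y + yStep * static_cast<float>(frame);
    p.z = start.z + radius * std::sin(angle);
    return p;
}
```

Because the position depends on the frame number, two runs are only comparable at the same frame, which is why a fixed-camera build is easier to benchmark against.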


I have submitted the sponza rendering to the original forum where we got the model from. [SMILEY Smile]
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

[LINK http://www.bik5.com/rtNexGen.exe] , this is a version that renders at 800x600, no aa, from the camera location specified in the scene.txt file to the target specified in the file. Renders in 0.450s on my side.
(L) [2006/02/21] [tbp] [feb 20, new compiler shots] Wayback!

Thanks, that will help.


Ok, so you don't have AA in the lo-fi version and we're on par feature wise.

Let's pretend i've matched the view, it doesn't make much difference in that part of the mesh anyway either for you or me, and lighting etc... i'm running between 15 and 20% faster as is.

There are some clues, from the intersection/ray stat, that it's related to the kd-tree. Another step to confirm that would be for you to display some nodes-traversed-per-ray stats (for rays that hit the scene, that is). That would be really handy.

Of course it would be simpler if we could trade those trees [SMILEY Wink]


What does your binary dump look like? (geez you could have kept the tree, triangle ids & shading data separated).


PS: That was all done with msvc8 builds as i haven't back-backported changes to gcc/nix.

PPS: Ah, you've posted behind my back. Dling.
(L) [2006/02/21] [tbp] [feb 20, new compiler shots] Wayback!

rtNexGen 800x600: 0.3125sec, 2866.66K ray/s, 4.9 int/ray.


Ah forgot to ask, what's your horizontal fov like? (camera radius tells me nothing [SMILEY stick out tongue] )


... trying to match it.


edit: better match this time, beside the precise fov issue


quadrille: 0.2601sec (3.85fps), 3697.4K ray/s, 3.1 int/ray, 38.2 nodes/ray (<-- heh, that's with the old compiler).


Sadly i can't run it against the new compiler for rendering as that mesh is one of those that exhibits a remaining bug, where bits of triangles disappear.
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

Are you using your 64-bit version? That would explain the 15-20% speed difference.


About the tree: It's a really simple format, you should be able to read it. Here's the tree loading code:
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

Perhaps you can send me a dump of your kd-tree, so I can test it here? Plus a description of the file, of course. I suppose you use more or less the same data.
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

The worst you've got to offer uh? I noticed that the intel compiler performs roughly 10% better than vs2005, so I've got some work to do, obviously. It would be nice though if I could at least determine where the problem is: kd-tree or renderer. I suspect my renderer is not very optimal (outside pure traversal), the kd-tree should be pretty good.
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

In my setup, the TriAccel structure points to the original primitive data, and that's where I get my material from. Here's the TriAccel:
(L) [2006/02/21] [tbp] [feb 20, new compiler shots] Wayback!

I remember we've discussed that point.

Your stuff-it-all-in-the-same-basket approach is certainly a win on data sets that are small enough, or if the shading part is accessed enough etc. And you save some pointer arithmetic.

On the other hand if you keep things separated you have better chances to keep the cache filled with relevant data once your data set is too large to be cached. Plus you don't always access shading data (shadow rays etc).


Here's my triangle acceleration structure. There's some waste, and i remember telling myself to use those freaking 8 tail bytes for something. At least it's nicely aligned and ordered by access within the intersection code.

(comments suffered from code rot)

edit: Ok, i'm on crack. Triangle's id really comes from that structure, so there's only 4 bytes wasted (well i guess one could compress the axis too).
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

DevMaster IOTD is up. These guys are really starving for new stuff apparently. [SMILEY Smile]
(L) [2006/02/21] [Phantom] [feb 20, new compiler shots] Wayback!

Good point, moved to TODO list.
(L) [2006/02/21] [Lynx] [feb 20, new compiler shots] Wayback!

meh...needink computerr with SSE2 for runnink program, da?

will have to try my notebook then...one Banias instead of two Palominos...heh...


Oh and i'd like to peek into the code too...but will not have time until friday anyway.
(L) [2006/02/22] [Phantom] [feb 20, new compiler shots] Wayback!

I implemented the various ideas that popped up yesterday:


- Empty space cut-off now properly grows flat cells instead of ignoring them;

- TriAccel structure size decreased from 64 bytes to 48 bytes by moving primitive data to a separate structure;

- Some other things I forgot.


Got a 4% speed boost.
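The TriAccel change mentioned above (keeping only intersection data in the hot structure and moving the primitive data elsewhere) might look roughly like this. This is a sketch and not the actual layout from either tracer: all field names, the contents of `PrimShading`, and the helper `shadingFor` are hypothetical; only the 48-byte size is taken from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Hot" data: everything a ray/triangle test touches, and nothing else.
// Twelve 4-byte fields give the 48 bytes mentioned in the post.
struct TriAccel {
    float         plane[3];   // precomputed plane terms
    std::uint32_t axis;       // projection axis
    float         edgeB[3];   // precomputed terms for one edge
    std::uint32_t primIndex;  // index into the cold array below
    float         edgeC[3];   // precomputed terms for the other edge
    std::uint32_t pad;        // unused, keeps the size at 48 bytes
};
static_assert(sizeof(TriAccel) == 48, "TriAccel is expected to be 48 bytes");

// "Cold" data: only read once a hit actually gets shaded.
struct PrimShading {
    float         normals[3][3];  // per-vertex normals
    float         uvs[3][2];      // per-vertex texture coordinates
    std::uint32_t materialId;
};

// Shading data is fetched through the index, so shadow rays (which only
// need a hit/no-hit answer) never pull it into the cache.
const PrimShading& shadingFor(const TriAccel& tri,
                              const std::vector<PrimShading>& cold)
{
    assert(static_cast<std::size_t>(tri.primIndex) < cold.size());
    return cold[tri.primIndex];
}
```

The extra indirection costs a little on every shaded hit, which is the trade-off discussed earlier in the thread for small versus large data sets.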


Some things I noticed:

- MLRTA is flawed. It's showing intersection points outside the frustum, so that's definitely not good. It's also still finding filled leafs where it shouldn't be.

- Currently I calculate diffuse and specular for each and every ray. As I never put lights close to primitives so far, this is probably overkill. I would like to test what happens when I do these calculations for one ray of the packet and use the result for all four rays.
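The "shade one ray, reuse for all four" idea from the list above can be tried in isolation with something like the following. It is a toy sketch: only the diffuse term is shown (specular is left out), and every name in it (`Hit`, `diffuseTerm`, `shadePerRay`, `shadeOncePerPacket`) is hypothetical rather than taken from the tracer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Hit {
    Vec3 position;
    Vec3 normal;  // assumed to be unit length
};

// Plain Lambert term for a point light.
float diffuseTerm(const Hit& hit, const Vec3& lightPos)
{
    const Vec3 toLight = sub(lightPos, hit.position);
    const float len = std::sqrt(dot(toLight, toLight));
    assert(len > 0.0f);
    return std::max(0.0f, dot(hit.normal, toLight) / len);
}

// Reference: one lighting calculation per ray of the 2x2 packet.
std::array<float, 4> shadePerRay(const std::array<Hit, 4>& hits, const Vec3& lightPos)
{
    std::array<float, 4> out = {{ 0.0f, 0.0f, 0.0f, 0.0f }};
    for (int i = 0; i < 4; ++i) out[i] = diffuseTerm(hits[i], lightPos);
    return out;
}

// Approximation: evaluate the first ray only and reuse it for the packet.
std::array<float, 4> shadeOncePerPacket(const std::array<Hit, 4>& hits, const Vec3& lightPos)
{
    const float d = diffuseTerm(hits[0], lightPos);
    return {{ d, d, d, d }};
}
```

With a distant light the two versions agree closely for neighbouring hits on the same surface; with a light very close to the surface they diverge, which is the case the post expects to be rare. Packets that straddle two differently oriented surfaces would also differ, so this is something to measure rather than assume.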


But I'm still lagging behind, obviously.


Also the crowd is screaming for multiproc support... Need to work on that too. [SMILEY Smile]
(L) [2006/02/22] [tbp] [feb 20, new compiler shots] Wayback!

I've made a post where you'll find a dump from my tree (old compiler) for sponza_clean, as requested.
(L) [2006/02/22] [Phantom] [feb 20, new compiler shots] Wayback!

Noticed, I'll give it a try.
(L) [2006/02/22] [tbp] [feb 20, new compiler shots] Wayback!

If you get it going, i could dump some more. Or you could dump some of yours [SMILEY Wink]
(L) [2006/02/23] [tbp] [feb 20, new compiler shots] Wayback!

Same comparison as previously, but not rushed. Made sure to boost both processes and seclude each on its own cpu.

I had to nudge the cam a bit while trying to match fov.

Click.

[IMG #1 ]


edit: Was staring at it when i realised we don't map textures the same way.

edit²: <further staring later> One of us has troubles with shadows. The light seems to be more or less at the same place and most shadows are similar, except on pillars on the lowest level for example. And as much as i think you got texture mappage right, i think your shadows (or lack of) aren't.
[IMG #1]:Not scraped: https://web.archive.org/web/20061004012928im_/http://ompf.org/ray/wip/pix/20060223-sidebyside-small.jpg
(L) [2006/02/23] [Phantom] [feb 20, new compiler shots] Wayback!

That's my 'shadow hack' that's not properly configured. I offset shadow rays by a small amount to avoid traversing the kd-tree near the intersection point twice. For the pillars, this is not such a good idea.
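One plausible way such an offset loses shadows can be shown with a toy example. Everything here is an assumption for illustration: a single sphere stands in for a pillar, and `shadowRayBlocked` and its parameters are hypothetical names, not code from the tracer.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere {
    Vec3  center;
    float radius;
};

// Returns true if `blocker` lies between the shaded point and the light,
// ignoring anything closer to the shaded point than `startOffset`.
bool shadowRayBlocked(const Vec3& hitPoint, const Vec3& lightPos,
                      const Sphere& blocker, float startOffset)
{
    const Vec3 toLight = sub(lightPos, hitPoint);
    const float dist = std::sqrt(dot(toLight, toLight));
    assert(dist > 0.0f);
    const Vec3 dir = { toLight.x / dist, toLight.y / dist, toLight.z / dist };

    // Standard ray/sphere intersection with a unit-length direction.
    const Vec3 oc = sub(hitPoint, blocker.center);
    const float b = dot(oc, dir);
    const float c = dot(oc, oc) - blocker.radius * blocker.radius;
    const float disc = b * b - c;
    if (disc < 0.0f) return false;  // ray misses the sphere entirely
    const float s = std::sqrt(disc);
    const float t0 = -b - s;
    const float t1 = -b + s;

    // Only intersections inside (startOffset, dist) count as occlusion.
    const bool hit0 = t0 > startOffset && t0 < dist;
    const bool hit1 = t1 > startOffset && t1 < dist;
    return hit0 || hit1;
}
```

With the blocker sitting 0.1 to 0.3 units from the shaded point, an offset of 0.001 still reports a shadow, while an offset of 0.5 starts the ray beyond the blocker and the shadow disappears.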