Started again back

(L) [2006/02/02] [Phantom] [Started again] Wayback!

I did it. I resumed work on my ray tracer. Basic inverted frustum clipping from the Intel paper is in, EP search is next (but that looked easy, so it might be working tonight). I'm really curious how much it will help the kitchen scene.

BTW we need an archive here with some typical scenes, so others can join in.

As soon as I have the full implementation, dude, you're toast. 4x4 packets are next. [SMILEY Smile]

(L) [2006/02/03] [tbp] [Started again] Wayback!

Punk.

I was peacefully starting to work again on the fundamentals, that is the kd-tree horror... and what do you do? You jump right at the throat.

Unfair.

Anyway, please post more details as you move along.

There's no upload option for that damn board, anyway i bet it's better if it's such a repository is handled by a human operator.

I can't find my partially textured kitchen scene, but i've re-installed some toolchain on my box by now. So it would be nice of you to forward your scenes my way.

I'll throw it all together on that host, there's some space left.

In that damn paper they, yet again, use those damn Erw6/Conference/Soda Hall scenes.

Could you bribe them to get a copy thru your Intel connection?

That way we'll have a straight way to benchmark.

(L) [2006/02/03] [Guest] [Started again] Wayback!

I've begged some on the non-commercial OpenRT mailing list (which is desperatly lacking any traffic).

Just to test the water.

(L) [2006/02/03] [tbp] [Started again] Wayback!

I've stumbled yet again on the conference.nff file on the web, so i've dusted off my silly nff parser and [LINK http://ompf.org/forum/viewtopic.php?t=16&start=0&postdays=0&postorder=asc&highlight= got that].

OpenRT gets 1.93 fps on a 2.5ghz p4, but it seems to have more lights etc...

Anyway i'd really like to have the proper scene handy. We'll see.

(L) [2006/02/03] [Phantom] [Started again] Wayback!

Hm the Intel paper is really hard to implement. I am struggling with the odd algo for building the ibuproficalutation stack, and it's not working yet. Should make some progress today though.

(L) [2006/02/03] [tbp] [Started again] Wayback!

ibuproficalutation, eh?

You've said the inverted clipping was in, but is that already hooked into the rendering path somehow?

I'll see if i can give it a quick try this afternoon.

(L) [2006/02/03] [Phantom] [Started again] Wayback!

The inverse frustum clipping is used to find kdtree nodes that are guaranteed to be outside the frustum, so it's only used in the pre-traversal stage (EP search). Basically, what I do is this:

- Build a frustum: I use four corner rays and the origin.

- Traverse the kd-tree; For each node:

  * Keep track of the node extends (incrementally updating a bounding box by splitting it at the split plane position);

  * Project the four rays on the split plane: this gives four intersection points;

  * See if the bounding rectangle for these points overlaps the aabb of the node (projected on the splitplane);

  * If it doesn't, the frustum does not intersect the node.

- This is used to determine wether only the near or far or both nodes need to be traversed.

- In case of both: The node is pushed on a stack and processing continues with the left branch.

- When a leaf is encountered, a node is popped from the stack, and this node is now the best candidate for the EP. Processing continues with the right branch of this node.

- When the last node is popped from the stack, the current candidate EP is returned.

But this is not working yet. [SMILEY Smile]

(L) [2006/02/03] [tbp] [Started again] Wayback!

Heh, i've said i was gonna give it a try now but i got diverted into fixing my lua bindings (and i'm not yet done, geez).

You got it sorted out yet?

(L) [2006/02/03] [Phantom] [Started again] Wayback!

No I'm working for my boss at the moment. [SMILEY Smile] I did reread the section on EP searching and I think this time I grasped it. Problem with the thing is that it doesn't allow for nice recursive processing, as you need to skip branches in some cases. Besides that, I forgot how I handled empty nodes. I'm not sure if I lable those as 'leaf' or just let them have null pointers instead of siblings. Do you consider an empty node a leaf?

(L) [2006/02/03] [Phantom] [Started again] Wayback!

Hm just checked: Each newly created node is a leaf, and remains a leaf until it is split. So my empty nodes at the end of a path are leafs. Good to know. [SMILEY Smile]

(L) [2006/02/03] [tbp] [Started again] Wayback!

Yep, same approach here. Every node is a leaf unless splitting succeed.

From the tone of your post, i'm not sure (and i don't remember what you told me), but the tree i use for traversal/rendering isn't the tree used while compiling; there's a complete transcription as the final stage.

That allows for my award winning 'triangle-id-compression' feature that is never used (because basically it sucks).

PS: i've printed that paper, but it doesn't help much with those damn colored figures.

(L) [2006/02/03] [Phantom] [Started again] Wayback!

Wait till you get to that 'robust clipping' figure (that isn't robust at all). It's incomprehensible.

(L) [2006/02/03] [tbp] [Started again] Wayback!

I only remember it was that good old slab test or some derivative, at least as described [SMILEY Smile]

Bah, i'll see when i'll get there.

(L) [2006/02/03] [tbp] [Started again] Wayback!

I'm trying to figure out that inverse frustum clipping test for real atm.

(L) [2006/02/03] [Guest] [Started again] Wayback!

Hi guys,

I'm the developer of the Sunflow renderer ([LINK http://sunflow.sf.net/]). I'm glad to see some people are working hard on duplicating the results of the OpenRT/Intel folks =) I have some basic KDTree stuff in my engine, of course its nowhere near realtime though.

Are you planning to release any code at some point ?

Here are a couple things I learned about the MLRTA paper while at siggraph last year. You guys may know this already if you were there, but other people reading this forum might still be interested in hearing this. This is from memory only, so forgive me for any inexactitudes.

I spoke to Ingo Wald and asked him about the performance numbers in the MLRTA paper and what he thought of the algorithm. Basically he said that all that fancy clipping and searching is just a way to let you start your traversal further down the tree. This is not a new idea, and Vlastimil Havran had suggested this before either in his thesis or a subsequent paper. The OpenRT guys actually did implement this technique and they say it gives around a 10-15% improvement in performance. Ingo continued to say that the Intel folks got most of their performance from insanely optimized low level coding. They had a bunch of russian hackers working on this for a year, spending all their time counting cycles and reordering sse code by hand. End result: their inner traversal loop is something like 5 cycles per node whereas the OpenRT one is 15-20 cycles. He pointed out that a single sqrt per pixel in a loop over the image gets you something like 40fps. They claim 100fps ray tracing the soda hall model on the same hardware.

This was further confirmed during the paper presentation where the authors basically apologized for misrepresenting Vlastimil's previous work who had in fact already implemented a very similar (if not the same) technique. After the presentation Vlastimil commented that while the actual algorithm they were presenting was nothing new, the amount of optimization they had achieved was simply amazing and truly trend setting for research in this field.

Anyway, I hope I remembered all that right. If someone else is more informed, feel free to clarify what I've just said.

Looking forward to seeing more RTRT stuff from you guys!

btw: are your tree compilers O(N log N) ? Mine is O(N log N log N) and its way too slow. I'm planning to go with something like [LINK http://www.cgg.cvut.cz/~havran/DISSVH/kdtreeconstruction.txt this] for my rewrite. Any extra thoughts?

(L) [2006/02/03] [tbp] [Started again] Wayback!

Hello guest [SMILEY Wink]

Historically speaking, we _were_ working hard duplicating Wald/havran's work a year ago. Both our projects went dormant.

Somehow that MLRTA paper revived our friendly competition.

Personnaly, my plan was to GPL the code at some point but because of misplaced pride and code too horrible to see the light, err net, i never did (and that's a shame because i've cried for clues sometimes).

Jacco & i have exchanged some code & ideas along the way, and making that process more visible in an open fashion (via that forum among other things) was a first step to get things rolling.

So i'd dare to say yes.

I cannot thank you enough for sharing that siggraph gossip.

It's gonna save us much desperation & assorted headaches now that we know what to really expect.

But that kind of low level hackery doesn't surprise me, after all they're sponsored by Intel.

Btw i've just tested and i get 40fps (ok, 39.28fps / 41M ray/s, 1024x1024) when pointing the camera outside the kd-tree bbox [SMILEY Smile]

So basically their tricky culling gives about the same improvment as what Jacco had when he implemented the longest common traversal thing (Havran), modulo the fact that it may be easier to optimize for a given platform.

Well i guess we'll see what it gives.

I remember checking your sunflow project, but that was way back then and it has obviously grown up.

I'm going to correct that.

Now for the tree compiler... we (jacco & i) have some divergences, but our implementations are fundamentally equivalent.

I don't think i could charaterize mine as o(n log n) or o(n log n log n) because there's so much variation depending on the geometry/topology. And then there's the quality of the output.

We both have a full SAH going, i have some additionnal heuristics.

I can sometimes compile a complex 275k tri model in 25s and then there that odd scene with 3 triangles duelling in a corner that takes ages.

I think i'm at the point where it's reasonably fast and spits a good enough tree, yet another complete rewrital is due... so who knows.

The thing is, that kind of code is very branchy and i got a ~2x speedup just by carefully auditing paths and hinting them properly for the compiler.

Also i think that Jacco's idea for speeding up the scoring (i think he's described it somewhere in an article) is the Right Thing.

The MRLTA hints once again at bubbling up void cells, something Havran tried. In my experience that wasn't a win-win, but with larger grained culling it sounds like a smart move.

My memory is a bit fuzzy, but i'd be happy to discuss it some more.

Oh and if you have a neat idea about how to visualize that damn thing, or how to solidly analyse/audit it... clue me.

Anyway, excuse the lack of coherence of this post... i'll just pretend i was tired.

PS: got a timeout on Havran's note, will retry later.

PS²: just remembered that, if you profile your compiler you'll notice that obviously most time is spent near the root and roughly picking up a split in there wouldn't hurt performance (if done with some care) while tremendously speeding up the whole thing.

(L) [2006/02/04] [Phantom] [Started again] Wayback!

I would like to comment on that 'Siggraph gossip':

A while back, I wrote an article together with a guy from Intel, and we discussed OpenRT briefly. Basically, the 'intel guy' told me that they spoke to Wald earlier, but that he was a bit hard to work with (attitude). Tbp and I both tried to implement the ideas in his thesis, and we both ran into that same attitude: The whole thesis seems detailed and describes 'amazing achievements', yet it manages to skip crucial details and clear performance figures. We managed to get past that, and with relatively little effert (6 months spare time) we both beat Ingo Wald's performance, almost by a factor two. So you don't need autistic Russians for that. [SMILEY Smile] OpenRT is just not very well optimized.

So I am not suprised that Ingo Wald spoke to you about these Intel guys with a certain attitude. Apparently, things are not really working out between Wald and other scientists, which is a shame. After all, Wald did some amazing work: No scientist would ever go into hardcore platform-specific optimizations, but Wald did, and showed that it matters big time. He should be proud that others are taking this even further.

At this point, I dare say that I understand the Intel paper on EP search pretty much fully, and it's not like Havran's method at all. It's much simpler, and it's far easier to see that it helps. Also, overhead is minimal, so even in the worst case scenario, you're loosing very little performance. They claim 40% improvement on the 7M dragon, I consider that spectacular. And they claim 200-250% on typical indoor scenes, and it's easy to see why. This algorithm makes a renderer dependant of local complexity, not global data size. This is very significant (I should know, I'm a long time 3D engine coder) and no-one did this before.

Granted, the paper lacks in several areas (they are not very good tutors, and they didn't get all their facts right) but they did a great job - But then they ran into Wald. Wald never answered my e-mails by the way, I tried to contact him several times, sharing my experiences and asking questions.

So in short: I believe there's a personality problem in Germany. I'm not going to dismiss the Intel paper because Wald says it's unoriginal and irrelevant. On the contrary, I would say. Walds interest is an indication that it is very significant in fact.

We should really invite the intel guys to this forum btw. [SMILEY Smile]

- Jacco.

(L) [2006/02/04] [tbp] [Started again] Wayback!

Jacco,

you're right on both accounts.

I clearly remember cursing Wald for all those hmm eluded evil details in his thesis.

And MRLTA shouldn't tax performance too much even in the worst case (and like i've said, it lends to a to-the-metal approach).

I've sent an email to this guy, [LINK http://graphics.stanford.edu/~yoel/notes/] , trying to lure him in there... to no avail.

Now you're the guy with Intel contacts, so.... [SMILEY Wink]

PS: you haven't answered my post about clipping

PPS: woot, seems that they fixed the gomp bug related to c++...

PPPS: but i've yet again stumbled on a codegen bug, not really a showstopper, but for sure some case of creative codegen.

(L) [2006/02/04] [Phantom] [Started again] Wayback!

Clipping:

I am assuming you need to test two axii, so if the split plane is x = n, you test your intersection points against the y and z range (i.e., two slabs).

I almost got it working, just a few false negatives. Not sure what I'm missing yet.

(L) [2006/02/04] [Phantom] [Started again] Wayback!

Looks like I got something working. I'm in the middle of social stuff, so no time to report properly. But it seems to be working. [SMILEY Smile] With double axis tests, I might add.

(L) [2006/02/06] [Phantom] [Started again] Wayback!

OK, some initial numbers.

First of all: My current implementation is VERY basic. I was already rendering 16x16 tiles, so I am now casting beams that contain these 16x16 rays. So I am not yet using large tiles, let alone hierarchically subdivided tiles. My MLRTA implementation uses no SIMD code, and I am not yet doing the alternative culling test for leafs. With this setup, I am observing the following:

- Kitchen scene: The tracer does not find any entry point straigt in a leaf, which is something I expected at least for some tiles. Odd. Nevertheless, I see a 5% speed improvement (overall) despite the simplicity of the current implementation. I made some fixes to my kd-tree compiler (bad empty space cut of fixed), which also results in 10% improvement (on top of the 5% for MLTRA).

- Legocar: MLRTA correctly detects 'empty' tiles and skips rendering completely for those. However, even in scenes where these tiles contribute to 80% of the image (very small car in a huge window) I see only a 100% speed increase, which could mean that the poor MLRTA implementation is holding back performance significantly (which is good news, actually).

I will post my current implementation later today, as it will help others to implement it more quickly. It's quite hard to get right initially, IMO.

- Jacco.

(L) [2006/02/06] [tbp] [Started again] Wayback!

Ah! That's interesting.

But if i remember you didn't have a fast packet vs bbox rejectal at the start of your traversal?

Because when visualizing a little model lost in space, it helps a lot.

When the coverage reach 100%, it's useless of course.

So, that would be why you gained much from mlrta in that case. I guess.

Keep it going [SMILEY Smile]

(L) [2006/02/06] [craouette] [Started again] Wayback!

had anybody try this compiler:

[LINK http://www.codeplay.com/]

?

(L) [2006/02/06] [Phantom] [Started again] Wayback!

Yeah I did, in an early stage of development of my tracer. It sucked. I contacted them about the sucking, and they where very anxious to help, but frankly they where lagging too far behind to make it worth the trouble.

(L) [2006/02/06] [Phantom] [Started again] Wayback!

Hm, looks like MLRTA is cutting off only the first one or two traversal steps, and that's in the kitchen scene, while looking at a table. I must be doing something wrong.

(L) [2006/02/06] [tbp] [Started again] Wayback!

Doesn't sound too helpful indeed.

But then that kitchen is a bit of a pain.

Do you get better results by changing the pov?

PS: i don't have that kitchen scene, i thought you have a link to it on your site but can't find it.

(L) [2006/02/06] [Phantom] [Started again] Wayback!

OK, reimplemented MLRTA so it better matches the algo in the paper. Found the big huge catch:

If you find a leaf with triangles, you may start at that leaf if:

1. The frustum does not intersect any other leaf with triangles (not even behind that node); or

2. The entire frustum is occluded by the triangles in that leaf.

Obviously, case 2 is much more common (think walls), but way harder to test. However, if you get a positive on two, no traversal AT ALL is needed.

That's the true meaning of that little sentence in step 3 of the algorithm: You may use a leaf as a starting point if that leaf is contained in some watertight volume that you somehow defined (no word in the paper on how to do that), or if the geometry in that leaf completely obstructs the frustum.

I can do that. Back to the keyboard. [SMILEY Smile]

(L) [2006/02/06] [UlfJack] [Started again] Wayback!

Hi there.

I saw tbps post on the OpenRT mailing list and after a mail from me and then from him and from me again, I ended up here. Saarbrücken (Wald, Schmittler and others) is a sad story.

Anyway. I've got a realtime raytracer myself and it's open source and sitting at:

[LINK http://www.yvert.de/]

tbp already tried it out and got it to compile on his x86_64 machine. If you try it and can't get it to work, please let me know. My raytracer has it's own yet another file format and can read only that, but there's a java previewer and converter as well.

Right now, we (me and a friend) are working on getting fully programmable shading with Cg in, which should work by the end of the week (ok, make that end of next week).

Cheers, -- Ulf

(L) [2006/02/07] [tbp] [Started again] Wayback!

I was thinking that better than struggling with that annoying kitchen you could try your cloister scene.

From what i grok from the output, it might be a good (well, better) candidate for mlrta, easier to get right and still a kind of interior env.

(L) [2006/02/07] [Phantom] [Started again] Wayback!

I did some extensive testing this morning to see why MLRTA is only saving one or two nodes at best for each 16x16 tile (moving to 8x8 all over the place doesn't help btw, still only 1 or 2). I visualized the intersecting points and traversed nodes for a single frustum, and it all seems to be OK. Yet, it doesn't nearly give me the results our Russian friends are claiming (1/3rd saving on the 7M dragon)... I am already splitting of empty space at the very beginning (in fact, every empty space that is larger than 10% of a full node gets clipped of before I do anything else), so that should match their requirements. No idea what's going wrong.

During my train trip I added a test for full occlusion, I still have to test it. That could help, but of course it makes the tests even more expensive. Besides, locating a leaf that obstructs the entire frustum does not mean you're done: You still need to use the bifurcation node that led to that leaf as candidate, and you still need to do a full trace on that leaf (so prevent ordering problems). So this test will help specific geometry, but it will hurt more generic geometry.

O well hopefully we will get some hints soon.

back