Arauna BVH source code
(L) [2007/08/16] [Phantom] [Arauna BVH source code] I have uploaded the source code for my first crude BVH implementation (based on Wald et al.'s paper) here:
[LINK http://www.zshare.net/download/3161989689f724/]
This version has shading (the usual: bilerped textures, normal mapping, a point light source) but does not yet support multiple lights, floor fog, reflections or refractions. There are also some artifacts that I am aware of; these are the typical corner cases and should only occur in the shadow tracer (there is no fallback code for when a 'major axis' cannot be determined, which happens for very divergent shadow ray packets). The code is not optimized either; most of the (new) code is still plain C without proper SSE. I believe this is actually a good thing at this stage, and that's why I am releasing the source code. It shows how to set up planes for a packet of rays, how to set up the corner rays, and how to trace a BVH using this data.
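The post mentions building planes for a packet from its corner rays and using them during BVH traversal. A minimal sketch of that idea, assuming all rays in the packet share one origin (as primary rays do) and that the four corner directions are listed in order around the packet. This is a generic plane-based frustum cull, not Arauna's code; Frustum, frustum_from_corners and frustum_misses_box are hypothetical names:

```c
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;

static vec3 v_add(vec3 a, vec3 b) { vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
static vec3 v_sub(vec3 a, vec3 b) { vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static float v_dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3 v_cross(vec3 a, vec3 b)
{
    vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Four side planes of a packet frustum, all passing through the shared ray origin. */
typedef struct { vec3 origin; vec3 n[4]; } Frustum;

/* corner[] holds the directions of the four corner rays, in order around the packet. */
void frustum_from_corners(Frustum *f, vec3 origin, const vec3 corner[4])
{
    vec3 center = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; i++) center = v_add(center, corner[i]);
    f->origin = origin;
    for (int i = 0; i < 4; i++) {
        vec3 n = v_cross(corner[i], corner[(i + 1) & 3]);
        /* Flip the normal so it points away from the packet; winding order no longer matters. */
        if (v_dot(n, center) > 0.0f) { n.x = -n.x; n.y = -n.y; n.z = -n.z; }
        f->n[i] = n;
    }
}

/* Conservative test: true if the box lies completely outside at least one plane,
   so no ray in the packet can hit it and the whole subtree can be skipped. */
bool frustum_misses_box(const Frustum *f, vec3 bmin, vec3 bmax)
{
    for (int i = 0; i < 4; i++) {
        vec3 n = f->n[i];
        /* Box corner with the smallest signed distance to this plane ("most inside" corner). */
        vec3 v = { n.x > 0.0f ? bmin.x : bmax.x,
                   n.y > 0.0f ? bmin.y : bmax.y,
                   n.z > 0.0f ? bmin.z : bmax.z };
        if (v_dot(n, v_sub(v, f->origin)) > 0.0f) return true;
    }
    return false;
}
```

The test is conservative: a box near a frustum edge can straddle two planes without being rejected, so nodes that pass still need per-ray tests. Strongly diverging packets produce wide frustums that tend to reject very little.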
There's still a lot to do: refitting the BVH is not in yet, and I would like to try out Reshetov's latest ideas... But time is scarce, as usual.
About the configurations: this code is for MSVC2005, with support for the latest Intel compiler. You are supposed to build and run the debug configuration using the MS compiler and debugger; the release config is for Intel 10.x. Draft is for Intel 9.1.x (don't ask), but it compiles very quickly, which is good.
- Jacco.
EDIT: Note that I lost quite a bit of performance by putting back the shading pipeline. Shading is now definitely the bottleneck. I still get a solid 30% or so performance increase over my best kd-tree attempt (even though that's still slower than sngan), but of course I'm now rendering a far more 'dynamic' data structure, so in my opinion, the gains are far more than just that 30%...
_________________
--------------------------------------------------------------
Whatever
(L) [2007/08/16] [davepermen] [Arauna BVH source code] Downloading... will surely be a good read. [SMILEY Smile]
(L) [2007/08/16] [Shadow007] [Arauna BVH source code] Thanks for sharing once again! :)
As you know, I'm more of a lurker than an implementer, but I'm definitely interested!
(L) [2007/08/16] [dr_eck] [Arauna BVH source code] Jacco,
Looking over your common math code, I see no use of intrinsics. This spurred me to take another look at tbp's Radius code; he uses intrinsics. I thought Xela also used them in his papers. Can someone help me understand when to use intrinsics? How important are they?
_________________
Opinions? Those are *facts* son.
(L) [2007/08/16] [Phantom] [Arauna BVH source code] I don't use them for vectors, only in situations where I have four independent streams of data. I do not consider x, y, z, w independent streams, so I use regular code there. Look in core.cpp for extensive use of intrinsics (especially in the shading pipeline).
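To illustrate the distinction: with four independent vectors stored component-wise (structure of arrays), one SSE instruction processes the same component of all four vectors. A minimal sketch using standard SSE intrinsics; Vec3x4 and dot4 are hypothetical names:

```c
#include <xmmintrin.h>

/* Four independent 3D vectors in structure-of-arrays form:
   x holds x0 x1 x2 x3, y holds y0 y1 y2 y3, z holds z0 z1 z2 z3. */
typedef struct { __m128 x, y, z; } Vec3x4;

/* Four dot products at once: every lane does useful work,
   and no horizontal adds or shuffles are needed. */
__m128 dot4(Vec3x4 a, Vec3x4 b)
{
    __m128 xx = _mm_mul_ps(a.x, b.x);
    __m128 yy = _mm_mul_ps(a.y, b.y);
    __m128 zz = _mm_mul_ps(a.z, b.z);
    return _mm_add_ps(_mm_add_ps(xx, yy), zz);
}
```

Storing a single x, y, z, w vector in one register instead leaves the w lane idle and needs shuffles to sum the products, which is why plain scalar code can be just as good for individual vectors.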
(L) [2007/10/03] [Shadow007] [Arauna BVH source code] Looking at the code again, I came upon the shading part.
It seems to me that the code is as follows:
(L) [2007/10/03] [Phantom] [Arauna BVH source code] Possibly. I have been looking into this very recently, but there are some architectural problems:
If four rays hit the same primitive, colors are stored as quads of components, i.e. RRRR, GGGG, BBBB. For the 'mono ray shading case', I store colors directly per ray, i.e. ARGB. So this extra test would need to be extended to fetching the texel as well. Given the number of pixels that get shaded using this code (especially at lower resolutions), this might be worthwhile though.
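Converting between the two layouts described above (one ARGB color per ray versus RRRR, GGGG, BBBB quads) amounts to a 4x4 transpose. A sketch using the standard _MM_TRANSPOSE4_PS macro, assuming 128-bit texels of four floats in A, R, G, B order; Color4 and texels_to_soa are hypothetical names:

```c
#include <xmmintrin.h>

/* Four colors in structure-of-arrays form: AAAA, RRRR, GGGG, BBBB. */
typedef struct { __m128 a, r, g, b; } Color4;

/* Each pointer addresses one 128-bit texel stored as four floats (A, R, G, B).
   The transpose turns four per-ray colors into four component quads. */
Color4 texels_to_soa(const float *t0, const float *t1, const float *t2, const float *t3)
{
    __m128 c0 = _mm_loadu_ps(t0), c1 = _mm_loadu_ps(t1);
    __m128 c2 = _mm_loadu_ps(t2), c3 = _mm_loadu_ps(t3);
    _MM_TRANSPOSE4_PS(c0, c1, c2, c3);   /* c0 = AAAA, c1 = RRRR, c2 = GGGG, c3 = BBBB */
    Color4 out = { c0, c1, c2, c3 };
    return out;
}
```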
(L) [2007/10/03] [Phantom] [Arauna BVH source code] I have tried to convert the internal texture storage of Arauna from 128-bit to 32-bit. The aim was to reduce bandwidth, hopefully resulting in better performance (fewer cache misses) and better scalability on Octopussy. Well, I haven't tested the scalability yet, but performance didn't improve... I'm close to break-even, though. This probably has to do with the rather expensive conversion from four 32-bit ARGB values to four quads: AAAA, RRRR, GGGG and BBBB. This is the best conversion code I could come up with:
(L) [2007/10/03] [Phantom] [Arauna BVH source code] I managed to hide the cost of the extra instructions almost completely. The 32bit code is now just as fast as the original code (though not any faster, as I hoped). I'll try it on Octopussy tomorrow...
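For illustration, here is one straightforward SSE2 way to turn four packed 32-bit ARGB texels into AAAA, RRRR, GGGG and BBBB float quads in the [0, 1] range. It is a sketch, not the conversion code from the thread; it assumes the 0xAARRGGBB bit layout, and Color4f and unpack_argb32 are hypothetical names:

```c
#include <emmintrin.h>
#include <stdint.h>

typedef struct { __m128 a, r, g, b; } Color4f;

/* Four packed 0xAARRGGBB texels -> AAAA, RRRR, GGGG, BBBB as floats in [0, 1]. */
Color4f unpack_argb32(const uint32_t texel[4])
{
    const __m128i px   = _mm_loadu_si128((const __m128i *)texel);
    const __m128i mask = _mm_set1_epi32(0xFF);
    const __m128  inv  = _mm_set1_ps(1.0f / 255.0f);
    Color4f c;
    /* Logical shift by 24 leaves only the alpha byte, so no mask is needed. */
    c.a = _mm_mul_ps(_mm_cvtepi32_ps(_mm_srli_epi32(px, 24)), inv);
    c.r = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(px, 16), mask)), inv);
    c.g = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(px, 8), mask)), inv);
    c.b = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(px, mask)), inv);
    return c;
}
```

Each channel costs a shift, a mask, an int-to-float conversion and a multiply, which suggests why a 32-bit path has to work hard just to match pre-converted 128-bit texels.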
(L) [2007/10/03] [Phantom] [Arauna BVH source code] You mean that four rays frequently hit the same texel? That should only be the case when the texture is (severely) oversampled, right? I suppose this would make trilinear sampling pretty cheap (as this will definitely happen in the next MIPmap level), but I'm not sure about bilinear?
But still... I don't get it: how can the gain from using a direct _mm_set_ps1 compensate for a potential branch misprediction?
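The shortcut under discussion can be sketched as follows for a single-channel float texture: if all four rays fetch the same texel, one load plus _mm_set_ps1 replaces four scalar loads. The per-ray texel indices and the fetch4 name are assumptions for illustration:

```c
#include <xmmintrin.h>

/* Fetch one channel for four rays. idx[] holds hypothetical per-ray texel indices. */
__m128 fetch4(const float *tex, const int idx[4])
{
    if (idx[0] == idx[1] && idx[0] == idx[2] && idx[0] == idx[3])
        return _mm_set_ps1(tex[idx[0]]);   /* one load, broadcast to all four lanes */
    /* General case: four scalar loads; lane i receives tex[idx[i]]. */
    return _mm_setr_ps(tex[idx[0]], tex[idx[1]], tex[idx[2]], tex[idx[3]]);
}
```

Whether this pays off depends on how predictable the branch is: if the four indices are almost always equal (heavy oversampling, light maps) or almost never equal, the branch predicts well; a mix of both is where mispredictions eat the savings.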
(L) [2007/10/03] [Phantom] [Arauna BVH source code] I see what you mean. I'm not sure if this is desirable (the 5x size thing); in fact, I just got a request from a visual artist who wants to switch to 1024x1024 textures. [SMILEY Smile] But this would definitely speed up light maps, where severe oversampling is very common. I was considering this, but the expected cost kept me from doing it...
(L) [2007/10/04] [Shadow007] [Arauna BVH source code] I ran some tests on my idea today:
First, in GetColorAtIP, I put all the color information in the "same prims" layout (i.e. RRRR, GGGG, BBBB). This allowed me to streamline the ApplyLights function, where only one path is taken (with two variants) instead of two paths.
It also allowed me to check for "same material" in that function instead of "same primitive".
(BTW, coloruv is now unnecessary in my version.)
I got some performance increase (not properly measured yet), and at least better code readability.
Next would be to modify the GetColorAtIP function to a path like the one I suggested above:
(L) [2007/10/04] [Phantom] [Arauna BVH source code] Can you post that code?
(L) [2007/10/04] [Shadow007] [Arauna BVH source code] will do [SMILEY Smile] (tomorrow)
(L) [2007/10/04] [Phantom] [Arauna BVH source code] No rush now that I think of it, since I just ported my entire shading pipeline to use ARGB/32 as input... So your code is probably vastly different. I'll try to make a new source release soon.
(L) [2007/10/04] [Phantom] [Arauna BVH source code] Here's a package with the latest stuff for those who'd like to have it:
[LINK http://www.bik5.com/araunasrc_bvh_oct04_2007.rar]
This is the BVH version (the kd version is pretty much deep-frozen now). It's still not feature-complete (compared to the kd version), but at least it supports 'many lights' again. Texturing now uses 32-bit textures internally, plus 32-bit normal maps (2x16-bit vector components). I tried to decouple textures and normal maps (someone at RT07 claimed that it didn't matter), but that set me back 7%, so I reverted it... Performance is slightly above the kd version. I have to admit I was severely disappointed this afternoon when I found out just how small the gains are...
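The post does not say how the two 16-bit components are used. One common scheme, assumed here, stores x and y as signed 16-bit values and rebuilds z from the unit-length constraint, which only works for tangent-space normals with z >= 0. The packing order, the 32767 scale factor and all names are hypothetical:

```c
#include <stdint.h>
#include <xmmintrin.h>

typedef struct { float x, y, z; } Normal;

/* Assumed packing: low 16 bits hold x, high 16 bits hold y, both signed and scaled by 32767. */
uint32_t pack_normal(float x, float y)
{
    int ix = (int)(x * 32767.0f), iy = (int)(y * 32767.0f);
    return (((uint32_t)iy & 0xFFFFu) << 16) | ((uint32_t)ix & 0xFFFFu);
}

/* Sign-extend a 16-bit field without relying on implementation-defined casts. */
static int sext16(uint32_t v) { return v >= 0x8000u ? (int)v - 0x10000 : (int)v; }

Normal unpack_normal(uint32_t packed)
{
    Normal n;
    n.x = sext16(packed & 0xFFFFu) / 32767.0f;
    n.y = sext16(packed >> 16) / 32767.0f;
    /* z is not stored: rebuild it from x*x + y*y + z*z = 1, assuming z >= 0. */
    float zz = 1.0f - n.x * n.x - n.y * n.y;
    if (zz < 0.0f) zz = 0.0f;   /* guard against quantization error */
    n.z = _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(zz)));
    return n;
}
```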
Things to do: reflections, refractions and animation (i.e., put back the features of the kd-tree version), a second plane for beams visiting the leaves, Shadow007's idea of doing more in SIMD mode, adaptive beam sizes. After that, plans are vague.
(L) [2007/10/05] [Shadow007] [Arauna BVH source code] Glad to have been able to help [SMILEY Smile]
Just a note:
It seems the "rgba" field of the Color class should be renamed "bgra"... it caused me a little problem [SMILEY Smile]
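One plausible source of that naming mismatch is byte order: on a little-endian CPU such as x86, a color held as the 32-bit value 0xAARRGGBB is stored in memory as the bytes B, G, R, A. A small sketch that makes this visible (byte_at is a hypothetical helper):

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at memory offset i (0..3) of a 32-bit color value. */
uint8_t byte_at(uint32_t color, int i)
{
    uint8_t bytes[4];
    memcpy(bytes, &color, sizeof bytes);   /* view the value as it sits in memory */
    return bytes[i];
}
```

Code that indexes the bytes of such a color directly therefore sees blue at offset 0, which is exactly the kind of detail that leads to red and blue being swapped.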
Edit to add:
I'd be quite happy to get an 8% increase each time I have a new idea! [SMILEY Smile]
Did you add a "mat/mat4" field to the IData class? I found it convenient.
I also thought of adding a coherency flag for each line of 4 rays, set to 2 if all four hit the same primitive and 1 if they share the same material.
Computing it once at the beginning of shading would make all those later r == r+1 && r+1 == r+2 && r+2 == r+3 tests unnecessary (they would be replaced by if (ID[r]->coherence)).
Finally, could you please send your modified version so that I can stay in sync ?
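The coherency flag proposed above could look like this: each group of four rays is classified once before shading, and later code tests a single flag instead of repeating the comparisons. The Hit struct, the enum values and both function names are hypothetical; Arauna's IData layout is different:

```c
/* Hypothetical per-ray hit record. */
typedef struct { int prim; int mat; } Hit;

enum { INCOHERENT = 0, SAME_MATERIAL = 1, SAME_PRIM = 2 };

/* Classify one group of four rays. Same primitive implies same material, so test it first. */
int coherence4(const Hit h[4])
{
    if (h[1].prim == h[0].prim && h[2].prim == h[0].prim && h[3].prim == h[0].prim)
        return SAME_PRIM;
    if (h[1].mat == h[0].mat && h[2].mat == h[0].mat && h[3].mat == h[0].mat)
        return SAME_MATERIAL;
    return INCOHERENT;
}

/* Compute all flags once, before shading; count is assumed to be a multiple of 4. */
void classify_groups(const Hit *hits, int count, unsigned char *flags)
{
    for (int g = 0; g < count / 4; g++)
        flags[g] = (unsigned char)coherence4(hits + 4 * g);
}
```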
(L) [2007/10/08] [Phantom] [Arauna BVH source code] I'm considering rewriting the shading code in 8.8 fixed point... I think that would save quite a few conversions. It would limit the 'H' in HDR to 256, but I think that's quite acceptable. If I keep colors in -A-R-G-B format (16 bits per component, of which 8 bits are used for the normal 0..255 range), I can do most operations using a single op where I used to need three.
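A sketch of what such fixed-point shading could look like with SSE2, assuming 16-bit components where texel values use the normal 0..255 range and light intensities are 8.8 fixed point (256 means 1.0). One register holds eight components, i.e. two -A-R-G-B pixels; the function names are hypothetical:

```c
#include <emmintrin.h>
#include <stdint.h>

/* result = (texel * light) >> 8 for all eight 16-bit lanes at once.
   texel << 8 still fits in 16 bits because texel <= 255, and _mm_mulhi_epu16
   keeps bits 16..31 of each 32-bit product. */
__m128i modulate_88(__m128i texel, __m128i light)
{
    return _mm_mulhi_epu16(_mm_slli_epi16(texel, 8), light);
}

/* Sum light contributions; saturates at 0xFFFF instead of wrapping around. */
__m128i accumulate_88(__m128i acc, __m128i contrib)
{
    return _mm_adds_epu16(acc, contrib);
}
```

The saturating add also fixes the ceiling of the over-bright range at 0xFFFF, roughly 256 times the normal 0..255 range.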
Anyway...
Another random thought: The BVH traversal thingy is testing tons of rays against primitives. If the number of rays / prim is that high, the cost of supporting multiple primitive types (conditional jump for e.g. spheres, quads) would be amortized over all those rays... Hmm.
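The amortization argument can be sketched like this: the switch on primitive type runs once per primitive, and the per-type loop then handles every ray in the packet. Spheres and planes serve purely as example types; the structs and intersect_packet are hypothetical, and ray directions are assumed to be normalized:

```c
#include <stddef.h>
#include <xmmintrin.h>

typedef struct { float x, y, z; } V3;
/* Hypothetical primitive record: a sphere uses p = center, k = radius;
   a plane uses p = unit normal, k = offset (points x with dot(p, x) == k). */
typedef enum { PRIM_SPHERE, PRIM_PLANE } PrimType;
typedef struct { PrimType type; V3 p; float k; } Prim;
typedef struct { V3 o, d; float t; } Ray;   /* t = nearest hit so far */

static float dot3(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static float sqrt1(float v) { return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(v))); }

/* One type branch per primitive; its cost is spread over all n rays of the packet. */
void intersect_packet(const Prim *prim, Ray *rays, size_t n)
{
    switch (prim->type) {
    case PRIM_SPHERE:
        for (size_t i = 0; i < n; i++) {
            V3 oc = { rays[i].o.x - prim->p.x, rays[i].o.y - prim->p.y, rays[i].o.z - prim->p.z };
            float b = dot3(oc, rays[i].d);
            float disc = b * b - (dot3(oc, oc) - prim->k * prim->k);
            if (disc < 0.0f) continue;
            float t = -b - sqrt1(disc);   /* nearest root only; rays starting inside are ignored */
            if (t > 0.0f && t < rays[i].t) rays[i].t = t;
        }
        break;
    case PRIM_PLANE:
        for (size_t i = 0; i < n; i++) {
            float denom = dot3(prim->p, rays[i].d);
            if (denom == 0.0f) continue;   /* ray parallel to the plane */
            float t = (prim->k - dot3(prim->p, rays[i].o)) / denom;
            if (t > 0.0f && t < rays[i].t) rays[i].t = t;
        }
        break;
    }
}
```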
(L) [2007/10/09] [Shadow007] [Arauna BVH source code] Just a small bug fix: in your "published" version (araunasrc_bvh_oct05-2007), you should swap GetAmbientRed4 and GetAmbientBlue4 on lines 569 and 571 of core.cpp.
(L) [2007/10/10] [Shadow007] [Arauna BVH source code] I implemented my "material coherency" idea and was disappointed by the results: it only helps at extremely low resolutions...
It could work better, though, in highly incoherent scenes with lots of small distinct prims using the same texture/material: trees, falling triangles...
It would also benefit from some kind of texture atlas...
I can send the files if you want [SMILEY Smile]
(L) [2007/10/10] [Phantom] [Arauna BVH source code] I found a very severe flaw in Arauna: in the Surface128 constructor that takes a filename as an argument, the line that allocates the pixel buffer does not align it to 128 bits. I don't know how I ever got away with this, but obviously, this will cause crashes. The correct line is
(L) [2007/10/11] [Phantom] [Arauna BVH source code] Thanks. [SMILEY Smile]
(L) [2007/10/11] [lycium] [Arauna BVH source code] np :) heard about that one from tbp, as i was using aligned_alloc or something from msvc's proprietary libs!
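For reference, a 16-byte (128-bit) aligned buffer can be allocated with _mm_malloc and released with _mm_free, which GCC, Clang, the Intel compiler and MSVC all provide alongside the SSE intrinsics; MSVC also offers _aligned_malloc/_aligned_free. This is a sketch, not the corrected line from Arauna, and Pixel128, alloc_pixels and free_pixels are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_malloc / _mm_free */

/* Hypothetical 128-bit texel: four floats. */
typedef struct { float a, r, g, b; } Pixel128;

/* 16-byte alignment makes aligned SSE loads/stores such as _mm_load_ps legal on every texel.
   Release the buffer with free_pixels, never with free(). */
Pixel128 *alloc_pixels(size_t width, size_t height)
{
    return (Pixel128 *)_mm_malloc(width * height * sizeof(Pixel128), 16);
}

void free_pixels(Pixel128 *p) { _mm_free(p); }
```

Plain malloc or new is not guaranteed to return 16-byte aligned memory on every platform (32-bit Windows typically gives 8-byte alignment), and an aligned SSE load from such memory faults, which is consistent with the crashes described above.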
(L) [2007/10/11] [Shadow007] [Arauna BVH source code] After some more thought, it seems to me the "material coherence" could also be useful in a case I did not identify at first: secondary rays (reflections/refractions). I also took some new measurements this morning. It can gain a few percent (1-2) in some cases, and lose a few percent in others (close to a primitive, for example).
So I thought I could send my update [SMILEY Smile]
As I can't access zshare anymore, here is the modified code:
(L) [2007/10/12] [Phantom] [Arauna BVH source code] That's quite possible, indeed.
(L) [2007/10/12] [lycium] [Arauna BVH source code] hmm, a non-working program becoming 8-25% faster is a pretty abstract concept for me... priorities?
(L) [2007/10/12] [Phantom] [Arauna BVH source code] [SMILEY Smile] The program works, but I see what you mean, so let me clarify:
* Integer texture fetching works
* Integer bilinear filtering works
* Integer ambient, diffuse, specular works
* All the corner cases work: different materials, all combos of normal mapped, textured, no texture, etc.
When I wrote that, one thing was missing: lights with a brightness beyond 1 (for any component). Bright lights would be displayed with banding, where the integer range wraps back to 0. That has been solved now with two SSE instructions per four pixels. It's now completely working.
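One way to avoid that wrap-around when converting 16-bit components (0..255 normal range, higher values over-bright) to 8-bit display values is to clamp before packing. The sketch below computes min(x, 255) with a saturating add/subtract pair and the constant 0xFF00, since SSE2 has no unsigned 16-bit min; it is not necessarily the two instructions used in Arauna, and the function names are hypothetical:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Wrong: keeping only the low byte makes 256 wrap to 0, 300 to 44, and so on (banding). */
__m128i to_8bit_wrapping(__m128i lo, __m128i hi)
{
    const __m128i lowbyte = _mm_set1_epi16(0x00FF);
    return _mm_packus_epi16(_mm_and_si128(lo, lowbyte), _mm_and_si128(hi, lowbyte));
}

/* min(x, 255) for unsigned 16-bit x: (x +sat 0xFF00) -sat 0xFF00. */
__m128i clamp255_epu16(__m128i c)
{
    const __m128i bias = _mm_set1_epi16(-256);   /* bit pattern 0xFF00 */
    return _mm_subs_epu16(_mm_adds_epu16(c, bias), bias);
}

/* Sixteen 16-bit components (four ARGB pixels) -> sixteen clamped bytes. */
__m128i to_8bit_clamped(__m128i lo, __m128i hi)
{
    return _mm_packus_epi16(clamp255_epu16(lo), clamp255_epu16(hi));
}
```

The explicit clamp matters even though _mm_packus_epi16 saturates: it interprets its input as signed, so values of 32768 and above would come out as 0.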
(L) [2007/10/12] [lycium] [Arauna BVH source code] what i meant is that the demo crashes on most people's machines, and telling them that it's now running 8-25% faster is a little difficult to imagine. actually i was expecting the program, since it's compiled with intel's compiler, to work on my new quadcore cpu, but it still doesn't :/
to clarify my meaning wrt priorities, the more awesome you make arauna the more unfortunate it becomes that it rarely works on a given pc. it doesn't matter how cool it is, if it doesn't run. etc...
(L) [2007/10/12] [Phantom] [Arauna BVH source code] Relax dude. As pointed out, a serious flaw was found only days ago. I will release a new demo soon enough; until that time, you're welcome to compile the open source package yourself. Of course you don't have to agree with my priorities, but neither do I have to agree with yours. [SMILEY Smile]
(L) [2007/10/12] [lycium] [Arauna BVH source code] dude, i am relaxed. really i'm not losing any hair/sleep over it, but i just don't see the point of speeding up software that... ahh, i should rather find something positive to say :|
gratz on the texturing/shading speedup, i'm sure it takes you back to the days of using 4.4 lookup tables ;)
(L) [2007/10/18] [Shadow007] [Arauna BVH source code] Another error spotted:
in void BVHierarchy::Build( SubPrim** a_Prim, unsigned int a_PCount, float intersectionCost ),
(L) [2007/10/18] [Shadow007] [Arauna BVH source code] Another bug: