General Optimizing? back
(L) [2006/08/21] [Michael77] [General Optimizing?] Wayback!Hi,
I wonder how you perform optimizations on the code level. I just installed VTune and while it seems very helpfull (in fact, it helped me to find some serious bottlenecks), it is also a bit overwhelming and I am looking for a good way to start tweaking the code. It seems like the most serious problem in my code seems to be the branch prediction, which fails in around 70%.
How do you start improving these parts? I have found few helpfull articles on this topic, maybe someone could give me a hint (maybe a good VTune introduction with some example how the numbers shown help me improve my code).
Michael
(L) [2006/08/22] [olliej] [General Optimizing?] Wayback!The first thing i would do is ensure that the algorithms themselves are reasonably efficient -- no amount of code optimisation will help you if you have a O(N^3) algorithm when an  O(N log N) one exists.  Assuming you've done that it's time to start reading up on how branch prediction works -- a large portion of which is based on whether a jump is forwards or backwards... but i can't remember which (yes i suck [SMILEY Wink])
(L) [2006/08/22] [Michael77] [General Optimizing?] Wayback!I think, my basic algorithms work quite well (doing the bunny benchmark with 1.7Million Rays/Second and buildtime of ~200ms with BIH and SAH approximation on a PentiumM 1.7. Even complex scenes ( interior of a Golf GTI with 3 Million triangles) can trace about 300.000 completely random Rays/s without SSE). Since getting from a simple BIH to a SAH approximating SAH got me close to nothing although I get a much better tree (less number of nodes, larger empty space cutoff near the top, lower average and maximum depth). So there is not too much left for me on the algorithmic part.
(L) [2006/08/22] [Phantom] [General Optimizing?] Wayback!200ms for a BIH/SAH combo? And the SAH didn't help? How did you implement it?
_________________
--------------------------------------------------------------
Whatever
(L) [2006/08/22] [Michael77] [General Optimizing?] Wayback!It did improve my results on complex scenes by about 20%, but not on bunny (or let´s say, the improvement was below 5%). It is just a simple approximation, there is quite some room for improvement. Basicly, I am just doing the first pass like described in the adaptive error bound heuristics paper. I test 24 possible split candidates ( 8 on each axis, all done in one scan pass using SSE - I have no idea, whether it is already thrashing my registers or not, but it works) and whether a clipnode would produce lower costs or not. Then I simply choose the one with the lowest cost. Additionally, when there are 8 or less triangles left, I switch to checking the real splitpositions (that is the center of each triangle becomes a split candidate). I haven´t had time yet to implement the rest of the paper, but I will try it soon since I think it will help quite a bit, especially on the upper levels where 8 split candidates is clearly not enough. However, I am still trying to figure out, how to choose the correct range for a refinement.
So there must be some part that is really slowing me down. Also, I still think it is strange that the 1.7GHz Pentium performs better than a 3.2GHz Xeon, so something seems to really smash the pipeline.
(L) [2006/08/22] [Phantom] [General Optimizing?] Wayback!You are probably already aware of this, but the paper does not suggest to pick the best candidate from eight (or 24, considering 3 axii); it suggests to reconstruct the cost function based on eight samples, and to pick the minimum of that cost function. Even if you refine using 8 extra samples, you still don't take the minimum of these 8 positions, but again the minimum of the (now improved) approximated cost function.
_________________
--------------------------------------------------------------
Whatever
(L) [2006/08/22] [Michael77] [General Optimizing?] Wayback!@Phantom: Yes, I am aware of that. I just hadn´t the time to actually implement that (and am still not shure on how to approximate the function either - got to read a littlebit about this first). I also wonder wheter a linear approximation is sufficient or a spline approximation is more likely to produce the results we want.
@tbp : Thanks. I guess, I need to improve my assembly skills then.
back