SIMD operator overloading

(L) [2007/09/30] [Phantom] [SIMD operator overloading] Wayback!

In Xela's paper I noticed that he had some operator overloads for SIMD instructions. I figured this could make my code quite a bit prettier (and shorter), so I 'translated' some parts. To my surprise, this cut overall performance to about 50% of the original. I may have made errors in the process (although the image output was still the same), and perhaps it's relevant to note that I used ICC 9.1x, but still. I left all the parentheses in, so I expected equal code output, but apparently I got something different... So, a typical dot product:
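
The listing that followed is not preserved here. Purely as an illustration, the kind of comparison described above might look like the sketch below: an SoA dot product written once with raw SSE intrinsics and once through overloaded operators. The `vec4` wrapper and the names `dot_raw` and `dot_wrapped` are hypothetical; they are not taken from Xela's paper or from the original post.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_mul_ps, _mm_add_ps

// Hypothetical thin wrapper around one SSE register (assumption, for illustration).
struct vec4 {
    __m128 v;
    vec4() {}
    explicit vec4(__m128 x) : v(x) {}
};

// Each operator maps to a single SSE instruction.
inline vec4 operator*(const vec4& a, const vec4& b) { return vec4(_mm_mul_ps(a.v, b.v)); }
inline vec4 operator+(const vec4& a, const vec4& b) { return vec4(_mm_add_ps(a.v, b.v)); }

// Dot product of four vector pairs at once (SoA layout), raw intrinsics.
inline __m128 dot_raw(__m128 ax, __m128 ay, __m128 az,
                      __m128 bx, __m128 by, __m128 bz) {
    return _mm_add_ps(_mm_add_ps(_mm_mul_ps(ax, bx), _mm_mul_ps(ay, by)),
                      _mm_mul_ps(az, bz));
}

// The same expression through the overloads, parentheses kept identical.
inline vec4 dot_wrapped(const vec4& ax, const vec4& ay, const vec4& az,
                        const vec4& bx, const vec4& by, const vec4& bz) {
    return ((ax * bx) + (ay * by)) + (az * bz);
}
```

Both functions describe the same three multiplies and two adds, so any difference in speed would come from how the compiler handles the wrapper objects, not from the arithmetic.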
(L) [2007/09/30] [ingenious] [SIMD operator overloading] Wayback!

Huh, too bad... I am using much more than that in my code  [SMILEY Confused]


...and I don't remember any performance gain when I switched to ICC 10...
(L) [2007/09/30] [Phantom] [SIMD operator overloading] Wayback!

So perhaps this is your problem?
_________________
--------------------------------------------------------------

Whatever
(L) [2007/09/30] [Phantom] [SIMD operator overloading] Wayback!

So apparently this problem exists in both ICC9.1 and ICC10.x...

And no, I didn't try to forceinline. It's just one instruction so I expect it to inline!?
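
For reference, forcing the inline would look roughly like the sketch below. The macro name `SIMD_FORCEINLINE` and the `f4` wrapper are hypothetical; `__forceinline` is the MSVC keyword (also accepted by ICC on Windows) and `__attribute__((always_inline))` is the GCC equivalent. Whether this changes the generated code for a given compiler is something only a benchmark can tell.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_mul_ps, _mm_add_ps

// Hypothetical portability macro (assumption, not from the thread).
#if defined(_MSC_VER)
#  define SIMD_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#  define SIMD_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SIMD_FORCEINLINE inline
#endif

// Hypothetical plain-struct wrapper around one SSE register.
struct f4 { __m128 v; };

// One instruction each; the macro asks the compiler not to emit a call.
SIMD_FORCEINLINE f4 operator*(f4 a, f4 b) { f4 r; r.v = _mm_mul_ps(a.v, b.v); return r; }
SIMD_FORCEINLINE f4 operator+(f4 a, f4 b) { f4 r; r.v = _mm_add_ps(a.v, b.v); return r; }
```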
(L) [2007/09/30] [Michael77] [SIMD operator overloading] Wayback!

Funny, this is exactly what I am currently fighting with, although I am using the msvc8 compiler. Calling an overloaded function gives me about an 80% performance drop, even though the function is inlined. For example, I have code like this:
(L) [2007/10/02] [tbp] [SIMD operator overloading] Wayback!

A few months ago i once again tried to make a hybrid SoA wrapper because i thought there was some chance compilers had matured enough that i could really use it, mostly to make shading simpler. So i refactored a complete traversal (+ intersection and basic shading) and benchmarked my progress.


Brief summary: after a long struggle and much tinkering, i got gcc4.3 to produce slightly better code with the wrapper; i couldn't get around a bunch of issues with icc v9/10's output and the wrapped version was definitely slower. I'll qualify msvc8's case as terminally fubar.


When using such a wrapper you're in fact implicitly expecting and relying on many, many optimizations to happen; out of the myriad equivalent variations on the wrapper theme, you've got to find one that reconciles theory and practice. If it exists. That also means if you have 3 compilers to play with, you'll end up with 3 versions.


From my fuzzy memory, the things that matter most are: array or distinct members, initialization lists, return value type (is it implicitly going through a conversion-ctor?), passing by value. Some choices conflict with each other or possibly make using that wrapper too cumbersome.
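
To make those choices concrete, here is a minimal sketch of two functionally equivalent variants of an SoA 3-vector wrapper. The type names `soa3_members` and `soa3_array` and the `add` helpers are hypothetical, and nothing here says which variant a given compiler prefers; the point is only that both spell out the same arithmetic while giving the optimizer different things to see through.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps

// Variant A: distinct members, initialization list, arguments passed by value.
struct soa3_members {
    __m128 x, y, z;
    soa3_members(__m128 x_, __m128 y_, __m128 z_) : x(x_), y(y_), z(z_) {}
};

inline soa3_members add(soa3_members a, soa3_members b) {
    return soa3_members(_mm_add_ps(a.x, b.x),
                        _mm_add_ps(a.y, b.y),
                        _mm_add_ps(a.z, b.z));
}

// Variant B: array member, assignment in the constructor body,
// arguments passed by const reference.
struct soa3_array {
    __m128 c[3];
    soa3_array(__m128 x, __m128 y, __m128 z) { c[0] = x; c[1] = y; c[2] = z; }
};

inline soa3_array add(const soa3_array& a, const soa3_array& b) {
    return soa3_array(_mm_add_ps(a.c[0], b.c[0]),
                      _mm_add_ps(a.c[1], b.c[1]),
                      _mm_add_ps(a.c[2], b.c[2]));
}
```

Benchmarking each variant per compiler, as described above, is the only reliable way to pick one.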


Note 1: silly compilers will reduce your option set, i.e. msvc
(L) [2007/11/10] [tbp] [SIMD operator overloading] Wayback!

Excerpt from [LINK http://gcc.gnu.org/gcc-4.3/changes.html]
