SSE Patterns
[2007/10/06] [Michael77] [SSE Patterns]
Hi,
I thought it would be nice to have some fast patterns for operations that SSE does not provide as native instructions, such as abs, negate, pow and so on. So maybe we could start a repository for them here.
So let's get started:
absolute value:
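The snippet originally posted here is not preserved. As a stand-in, here is a minimal sketch of the usual sign-bit trick for abs and negate, assuming four packed floats and plain SSE intrinsics. The function names `sse_abs_ps` and `sse_neg_ps` are hypothetical and not taken from the thread.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Absolute value: clear the sign bit (bit 31) of each of the four floats.
   -0.0f has only the sign bit set, and _mm_andnot_ps(a, b) computes (~a) & b. */
static inline __m128 sse_abs_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_andnot_ps(sign_mask, x);
}

/* Negation: flip the sign bit of each of the four floats. */
static inline __m128 sse_neg_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_xor_ps(sign_mask, x);
}
```

Both are a single bitwise instruction plus a constant load, so no comparison or branch is needed.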
[2007/10/06] [lycium] [SSE Patterns]
excellent idea, and a great start :)
looking forward to seeing other people's bit-tricks!
[2007/10/06] [steph] [SSE Patterns]
Hi,
Sorry for my poor English (I am French ;) ...)
A while ago I started writing a raytracer entirely with SSE2 math, but I no longer have time to continue the project.
I am posting my SSE2 math functions here. I hope they are useful to someone.
[2007/10/06] [Michael77] [SSE Patterns]
:D Great stuff :) I think I need to convert it to intrinsics to fully understand it, but it looks really great.
By the way, here is another crude approximation to exp(x) for -1 < x < 1:
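The approximation posted here is not preserved, so the following is not necessarily the one Michael77 meant. It is a minimal sketch of one well-known crude approach, assuming the limit form exp(x) ≈ (1 + x/256)^256 evaluated with eight squarings. The name `sse_exp_crude_ps` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Crude exp(x) for four packed floats, intended for small |x| (about -1 < x < 1).
   Uses exp(x) ~= (1 + x/256)^256; the power is formed by squaring 8 times. */
static inline __m128 sse_exp_crude_ps(__m128 x)
{
    /* y = 1 + x/256 */
    __m128 y = _mm_add_ps(_mm_set1_ps(1.0f),
                          _mm_mul_ps(x, _mm_set1_ps(1.0f / 256.0f)));
    /* y^(2^8) = y^256 */
    for (int i = 0; i < 8; ++i)
        y = _mm_mul_ps(y, y);
    return y;
}
```

The relative error grows with x squared and is on the order of 0.2% at |x| = 1, which is why this only qualifies as a crude approximation. Using more squarings with a correspondingly smaller scale factor reduces the truncation error but amplifies float rounding error.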
[2007/10/06] [steph] [SSE Patterns]
The asm code comes from the Approximate Math library. I have only changed the calling convention (for compatibility with the MSVC compiler as opposed to ICC). Don't convert it to intrinsics, or you will lose performance. Source: [LINK http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ia32/pentium4/optimization/19036.htm]
_mm_rcpnr0_xx (reciprocal with Newton-Raphson refinement) is the same as _mm_rcpnr_xx, except that it also handles the 0.0f case.
The same applies to _mm_rsqrtnr0_xx versus _mm_rsqrtnr_xx.
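To illustrate why the 0.0f case needs special handling, here is a minimal sketch with intrinsics. It is not steph's asm code, and how the `_mm_rcpnr0_xx` routines treat zero is not shown in the thread. The names `rcp_nr_ps` and `rcp_nr0_ps` are hypothetical, and keeping the raw estimate for zero inputs is an assumption.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Reciprocal with one Newton-Raphson step: x1 = x0 * (2 - a * x0).
   For a == 0 the estimate x0 is infinity, so a * x0 is NaN and the
   result becomes NaN. */
static inline __m128 rcp_nr_ps(__m128 a)
{
    __m128 x0 = _mm_rcp_ps(a);  /* roughly 12-bit estimate of 1/a */
    return _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(a, x0)));
}

/* Same refinement, but lanes where a == 0 keep the unrefined estimate
   (infinity) instead of the NaN produced by the refinement step.
   This choice is an assumption made for the sketch. */
static inline __m128 rcp_nr0_ps(__m128 a)
{
    __m128 x0 = _mm_rcp_ps(a);
    __m128 x1 = _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(a, x0)));
    __m128 is_zero = _mm_cmpeq_ps(a, _mm_setzero_ps());  /* all-ones where a == 0 */
    /* Branchless select: x0 where a == 0, x1 elsewhere. */
    return _mm_or_ps(_mm_and_ps(is_zero, x0), _mm_andnot_ps(is_zero, x1));
}
```

The reciprocal square root has the same problem: the `_mm_rsqrt_ps` estimate for 0.0f is infinity, and the refinement step multiplies it by zero, so a similar mask-and-select can be applied there.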
See you later!