SSE Patterns

(L) [2007/10/06] [Michael77] [SSE Patterns] Wayback!

Hi,


I just thought it would be nice to have some fast patterns for operations that SSE has no dedicated instruction for, like abs/negate/pow and so on. So maybe we could start some sort of repository for these here.


So let's get started:


absolute value:
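
(The snippet from this post did not survive in the archive. What follows is only a minimal sketch of the usual sign-bit trick with intrinsics, not necessarily the code that was posted. The names sse_abs_ps and sse_neg_ps are made up for illustration.)

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Absolute value of four packed floats: clear the sign bit of each lane.
   -0.0f has only the sign bit set (0x80000000), so it serves as the mask.
   Assumption: the compiler keeps the -0.0f constant as written. */
static inline __m128 sse_abs_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_andnot_ps(sign_mask, x);   /* (~sign_mask) & x */
}

/* Negate four packed floats: flip the sign bit of each lane. */
static inline __m128 sse_neg_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_xor_ps(sign_mask, x);
}
```

Both are a single bitwise instruction plus a constant load, so there is no branch and no comparison involved.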
(L) [2007/10/06] [lycium] [SSE Patterns] Wayback!

excellent idea, and a great start :)


looking forward to seeing other people's bit-tricks!
(L) [2007/10/06] [steph] [SSE Patterns] Wayback!

Hi,


Sorry for my poor English (I am French  [SMILEY Wink] ... )


At one point I began writing a raytracer entirely with SSE2 math, but I no longer have time to continue that project.

I am posting my work on SSE2 math functions here. I hope it can be useful to someone.
(L) [2007/10/06] [Michael77] [SSE Patterns] Wayback!

[SMILEY Very Happy] Great stuff [SMILEY Smile] I think I need to convert it to intrinsics to fully understand it but it looks really great.



By the way: another crude approximation to exp(x) with -1<x<1:
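
(The snippet from this post is also missing from the archive. As a stand-in, here is one possible crude approximation, assuming the limit form exp(x) ~ (1 + x/256)^256, which needs one multiply-add and eight squarings. It may well differ from what was originally posted, and the name sse_exp_crude_ps is made up.)

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Crude exp(x) for four packed floats, intended for roughly -1 < x < 1.
   Uses exp(x) ~ (1 + x/2^8)^(2^8): start from 1 + x/256, then square
   the value eight times. */
static inline __m128 sse_exp_crude_ps(__m128 x)
{
    __m128 r = _mm_add_ps(_mm_set1_ps(1.0f),
                          _mm_mul_ps(x, _mm_set1_ps(1.0f / 256.0f)));
    int i;
    for (i = 0; i < 8; ++i)
        r = _mm_mul_ps(r, r);             /* r = r^2, eight times -> r^256 */
    return r;
}
```

On this range the relative error is on the order of x*x/512, so about 0.2% at the ends of the interval. More squarings reduce that until float rounding starts to dominate.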
(L) [2007/10/06] [steph] [SSE Patterns] Wayback!

The asm code is from the Approximate Math library. I have just changed the calling convention (for compatibility with the MSVC compiler vs. ICC). Don't convert it to intrinsics, or you will lose performance. --> [LINK http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ia32/pentium4/optimization/19036.htm]


_mm_rcpnr0_xx (reciprocal with a Newton-Raphson step) is the same as _mm_rcpnr_xx, but it handles the 0.0f case.

The same goes for _mm_rsqrtnr0_xx vs. _mm_rsqrtnr_xx.
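
(To illustrate why the 0.0f case needs special handling, here is a rough intrinsics sketch. It is not the library's asm. The names rcp_nr_ps, rcp_nr0_ps, rsqrt_nr_ps and rsqrt_nr0_ps are hypothetical, and returning the raw hardware estimate (infinity) for a zero input is an assumption, since the post does not say what the library returns there.)

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* 1/a with one Newton-Raphson step: x1 = x0 * (2 - a*x0).
   For a == 0 the estimate x0 is infinity, a*x0 is NaN, so the result is NaN. */
static inline __m128 rcp_nr_ps(__m128 a)
{
    __m128 x0 = _mm_rcp_ps(a);
    return _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(a, x0)));
}

/* Per-lane select: take 'a' where the mask is all ones, 'b' elsewhere. */
static inline __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Same as rcp_nr_ps, but lanes where a == 0 keep the unrefined estimate
   (assumed behavior, see the note above) instead of turning into NaN. */
static inline __m128 rcp_nr0_ps(__m128 a)
{
    __m128 is_zero = _mm_cmpeq_ps(a, _mm_setzero_ps());
    return select_ps(is_zero, _mm_rcp_ps(a), rcp_nr_ps(a));
}

/* 1/sqrt(a) with one Newton-Raphson step: x1 = 0.5 * x0 * (3 - a*x0*x0).
   Same problem at a == 0: 0 * infinity gives NaN. */
static inline __m128 rsqrt_nr_ps(__m128 a)
{
    __m128 x0 = _mm_rsqrt_ps(a);
    __m128 t  = _mm_sub_ps(_mm_set1_ps(3.0f), _mm_mul_ps(a, _mm_mul_ps(x0, x0)));
    return _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), x0), t);
}

/* Zero-safe variant, using the same masking as rcp_nr0_ps. */
static inline __m128 rsqrt_nr0_ps(__m128 a)
{
    __m128 is_zero = _mm_cmpeq_ps(a, _mm_setzero_ps());
    return select_ps(is_zero, _mm_rsqrt_ps(a), rsqrt_nr_ps(a));
}
```

The zero-safe variants cost one compare and three bitwise ops extra, which is presumably why both flavors exist: use the plain one when the input is known to be non-zero.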


See you later
