(L) [2007/09/19] [ingenious] [SSE: __m128i <-> __m128 conversion penalty?]
I believe I read somewhere that converting __m128i to/from __m128 comes with a small penalty. I use such conversions quite often in my code; for example, getting the integer mask of a __m128 comparison involves:
1) Comparison of the two __m128 values with an intrinsic
2) Converting the result to __m128i
3) Converting back to __m128
4) Extracting the integer mask
At first glance this may seem wasteful, but it falls out of the design of my classes. So is there really a run-time penalty for this conversion, and what is the best way to perform it, or to simulate the _mm_castps_si128 and _mm_castsi128_ps intrinsics, on the MSVC8 compiler?
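For concreteness, a minimal sketch of the four steps might look like this (the greater-than predicate and the function name are illustrative, and it assumes a compiler that already provides the cast intrinsics):

    #include <xmmintrin.h>
    #include <emmintrin.h>

    /* Illustrative sketch of the four steps listed above. */
    static int compare_mask(__m128 a, __m128 b)
    {
        __m128  cmp  = _mm_cmpgt_ps(a, b);      /* 1) compare the two __m128 values      */
        __m128i bits = _mm_castps_si128(cmp);   /* 2) reinterpret the result as __m128i  */
        __m128  back = _mm_castsi128_ps(bits);  /* 3) reinterpret back as __m128         */
        return _mm_movemask_ps(back);           /* 4) extract the 4-bit sign mask        */
    }

Note that _mm_movemask_ps already accepts a __m128, so in plain intrinsic code steps 2 and 3 cancel out; they arise here only because of the class design mentioned above.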
(L) [2007/09/20] [madmethods] [SSE: __m128i <-> __m128 conversion penalty?]
Those intrinsics are effectively NOPs. They exist to change the type of a 128-bit value in the eyes of the compiler without changing any of the bits. You want distinct 128-bit types so the compiler can do type checking for you, but then you need these casts for the occasional case where you just want to reinterpret a value as another type. In the latest compilers these intrinsics are built in, and I'm pretty sure the compiler simply drops them on the floor (no performance cost). If you're using a compiler where they're not built in, you can accomplish the same thing by aliasing through memory (a store and load, different pointer types to the same memory, a union, or whatever). That's what's shown in the code snippet above, but it can cause performance problems by interfering with compiler optimization.
-G
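A minimal sketch of the memory-aliasing fallback madmethods describes, using a union (the type and function names are illustrative, not from any particular library):

    #include <xmmintrin.h>
    #include <emmintrin.h>

    /* Union aliasing: both members share the same 128 bits. */
    typedef union {
        __m128  f;
        __m128i i;
    } m128_bits;

    /* Stand-ins for _mm_castps_si128 / _mm_castsi128_ps on compilers
       that lack them.  The round trip through memory is exactly what
       can hurt optimization, as noted above. */
    static __m128i my_castps_si128(__m128 v)
    {
        m128_bits u;
        u.f = v;
        return u.i;
    }

    static __m128 my_castsi128_ps(__m128i v)
    {
        m128_bits u;
        u.i = v;
        return u.f;
    }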