(L) [2006/09/09] [Phantom] [lowest value in packet] Wayback!
Question for an SIMD guru:
I have a packet with four values. How do I determine the *index* of the lowest value in the most efficient manner? Right now I have an ugly loop:
(L) [2006/09/09] [madmethods] [lowest value in packet] Wayback!
Hmm...how about this:
Do a horizontal min (I think Carsten posted one) so that the min ends up in all 4 elements of an __m128. Compare that for equality against the original packet, do a movemask on the result, and then a bit scan forward on the mask to give you the index of the first element that tested equal to the min.
Not claiming anything regarding speed; it's just what popped into my head. It does have the advantage of having no branches at all. It probably has a lot of cross-CPU variance, though (latency and throughput vary for things like shuffle, movemask, and bsf).
-G
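For reference, the sequence described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name `min_index_sse` is made up for illustration, the horizontal min is a generic two-step shuffle/min and not necessarily the one Carsten posted, `__builtin_ctz` is the GCC/Clang spelling of a bit scan forward, and the packet is assumed to contain no NaNs (otherwise the mask can be zero and the scan result is undefined).

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_min_ps, _mm_shuffle_ps, ... */

/* Hypothetical helper: returns the index (0..3) of the lowest lane in v.
   Assumes no NaNs in v and a GCC/Clang-compatible compiler. */
static int min_index_sse(__m128 v)
{
    /* Horizontal min, step 1: swap neighbours, lanes become
       [min(0,1), min(0,1), min(2,3), min(2,3)]. */
    __m128 m = _mm_min_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));

    /* Step 2: swap the two halves, so every lane now holds the overall min. */
    m = _mm_min_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 0, 3, 2)));

    /* Lanes equal to the min become all-ones; movemask packs their
       sign bits into the low 4 bits of an int. */
    int mask = _mm_movemask_ps(_mm_cmpeq_ps(v, m));

    /* Bit scan forward: the lowest set bit is the first lane equal to the min. */
    return __builtin_ctz((unsigned)mask);
}
```

On ties the lowest index wins, since the scan starts from bit 0. With MSVC the scan would be `_BitScanForward` instead.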
(L) [2006/09/09] [tbp] [lowest value in packet] Wayback!
A "write the vector out to memory, then load & compare scalars" sequence could also be made branchless, and with store-to-load forwarding it wouldn't be that bad even under tight scheduling conditions. Plus it doesn't need many registers.
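One way the store-and-compare idea might look, as a sketch only: the name `min_index_store` and the pairwise tournament layout are assumptions, not something from the thread. Comparison results (0 or 1) are used as integers, so there is no `if` in the source; whether the compiler actually emits setcc/cmov rather than a jump is not guaranteed.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_storeu_ps */

/* Hypothetical helper: returns the index (0..3) of the lowest lane in v. */
static int min_index_store(__m128 v)
{
    float f[4];
    _mm_storeu_ps(f, v);            /* write the vector out to memory */

    /* Winner of each pair; a comparison evaluates to 0 or 1. */
    int lo = f[1] < f[0];           /* 0 or 1 */
    int hi = 2 + (f[3] < f[2]);     /* 2 or 3 */

    /* Select between the two winners with a mask instead of a branch:
       mask is all-ones when the upper pair's winner is strictly smaller. */
    int mask = -(f[hi] < f[lo]);
    return lo ^ ((lo ^ hi) & mask);
}
```

Strict comparisons keep the lower index on ties, so the result matches a first-match bit scan.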