lowest value in packet

(L) [2006/09/09] [Phantom] [lowest value in packet]

Question for an SIMD guru:


I have a packet with four values. How do I determine the *index* of the lowest value in the most efficient manner? Right now I have an ugly loop:
(L) [2006/09/09] [Lotuspec] [lowest value in packet]

Removed
(L) [2006/09/09] [madmethods] [lowest value in packet]

Hmm...how about this:


Do a horizontal min (I think Carsten posted one), get the min into all 4 elements of an __m128, do a compare for equal, a movemask, and then a bit scan forward to give you the index of the first element that tested equal to the min.


Not claiming anything regarding speed; it's just what popped into my head.  Does have the advantage of having no branches at all.  Probably has a lot of cross-CPU variance (varying latency/throughput for things like shuffle, movemask, bsf).
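The steps above could be sketched roughly like this. It is only a minimal sketch, assuming SSE1 intrinsics and gcc/clang's `__builtin_ctz` standing in for bsf; the helper name `min_index_ps` is made up for illustration, and the horizontal min here is a plain two-step shuffle/min, not necessarily the one Carsten posted. It also assumes no NaNs in the packet, since a NaN would leave the mask empty.

```c
#include <xmmintrin.h>  /* SSE1: _mm_min_ps, _mm_shuffle_ps, _mm_cmpeq_ps, _mm_movemask_ps */

/* Hypothetical helper: index (0..3) of the smallest lane of v.
 * Assumes no lane is NaN (otherwise the mask can be 0 and ctz is undefined). */
static int min_index_ps(__m128 v)
{
    /* Horizontal min: swap neighbours within each pair, then swap the two halves.
     * After both steps every lane holds the packet minimum. */
    __m128 m = _mm_min_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
    m = _mm_min_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 0, 3, 2)));

    /* Compare for equal, then collapse the lane masks to one bit per lane. */
    int mask = _mm_movemask_ps(_mm_cmpeq_ps(v, m));

    /* Bit scan forward: lowest set bit is the first lane equal to the min. */
    return __builtin_ctz((unsigned)mask);
}
```

With ties, the lowest lane index wins, because the bit scan starts from bit 0.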


-G
(L) [2006/09/09] [tbp] [lowest value in packet]

A "write down vector, load & compare scalars" sequence could also be made branchless, and with store-load forwarding wouldn't be that bad even in tight scheduling conditions. Plus it doesn't need many registers.
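For illustration, one way such a sequence might look in C. This is a sketch under assumptions, not the code from this thread: the name `min_index_scalar` is hypothetical, and whether the compiler really emits branch-free code (setcc/cmov style) depends on the compiler and flags.

```c
#include <xmmintrin.h>  /* SSE1: _mm_storeu_ps */

/* Hypothetical helper: index (0..3) of the smallest lane of v,
 * found by storing the vector and comparing scalars. */
static int min_index_scalar(__m128 v)
{
    float f[4];
    _mm_storeu_ps(f, v);              /* write the vector down */

    /* Two independent pairwise comparisons, so no long dependency chain. */
    int lo = f[1] < f[0];             /* 0 or 1: winner of lanes 0,1 */
    int hi = 2 + (f[3] < f[2]);       /* 2 or 3: winner of lanes 2,3 */

    /* Final comparison, selected with a mask instead of a branch. */
    int pick = f[hi] < f[lo];         /* 1 if the upper pair wins */
    return lo ^ ((lo ^ hi) & -pick);  /* pick ? hi : lo */
}
```

The strict `<` comparisons mean ties resolve to the lower index.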
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]
(L) [2006/09/09] [tbp] [lowest value in packet]

To prove the point, here's what I get from gcc in 32-bit mode. Branchless and without a nasty dependency chain.
(L) [2006/09/10] [Phantom] [lowest value in packet]

Tbp: Is that just doing
(L) [2006/09/10] [tbp] [lowest value in packet]

Nope. That code is more like
