16 times equal back

(L) [2006/09/02] [Phantom] [16 times equal] Wayback!

Quick question: What is the fastest way to determine if a 4x4 packet returned 16 times the same primitive? It's a continuous array of 16 unsigned ints. Can this be optimized using SIMD or should I use a simple integer loop?
_________________
--------------------------------------------------------------

Whatever
(L) [2006/09/02] [Phantom] [16 times equal] Wayback!

Like this?
(L) [2006/09/02] [Phantom] [16 times equal] Wayback!

I just realized I'm making things too difficult. It's for shading, and obviously I want to process it using SIMD. However there's no absolute need to do all 16 rays simultaneously; 4 rays at a time is fine. Then the test becomes quite a bit simpler.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/09/02] [funky] [16 times equal] Wayback!

The worst case ist the same ... 15 comparisons, this happens when all primitives are equal.

The best case is also the same ... 1 comparison, but then you habe to check after every comparison (branch misprediction).

My thoughts where ... compare only 0-1, 2-3, ..., 14-15 ... then check. The likelihood that this test fails should be very low.

When this test answers true, you have to check 0-2, 4-6, 8-10, 12-14. And then as discribed above 0-4, 8-12.
(L) [2006/09/02] [Lynx] [16 times equal] Wayback!

Hm...let's see, if you only want to know if all are the same, there must be some smart XOR logic...

a^a == 0

If you XOR them pairwise, and OR them all together and compare against 0, you already know if they're pairwise identical...but no idea if that actually pays off...just a weird idea.
(L) [2006/09/02] [tbp] [16 times equal] Wayback!

You can't chain xor to acheive that. Even if "xor a,b -> c" and "xor a,a -> 0", "xor a,0 -> a". EDIT: i'm a bozo.

A comparison is a substraction.


I doubt your id can be stored within 8 bits, but that doesn't really matter.

Say reg0 is { a,b,c,d }, broadcast any component to get reg1 { a,a,a,a }, compare reg0 & reg1 to get a mask, rince & repeat.

That's rewriting your test into
(L) [2006/09/02] [Phantom] [16 times equal] Wayback!

Thanks. [SMILEY Smile] I knew you would come up with a nifty SIMD approach.
_________________
--------------------------------------------------------------

Whatever
(L) [2006/09/02] [tbp] [16 times equal] Wayback!

You forgot to post a billing address for that consultation, you damn SIMD baiting <censored>.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]
(L) [2006/09/02] [Lynx] [16 times equal] Wayback!

tbp, I think you misunderstood what i mean, i never meant to chain the XORs.

The result of the XOR is only zero when the pair is equal, so different comparisons ORed is only zero when all pairs match, so you would have again

the 15 comparisons as Phatnom posted, or the 16 as you have:
(L) [2006/09/02] [tbp] [16 times equal] Wayback!

Yes, that would work, but you've initially talked about pairwise xor.


Anyway, the xor way takes: 1 broadcast, 4 xor, 4 or, 1 comp vs { 0,0,0,0 }, 1 top bit extraction. That's 1 more op [SMILEY Wink]
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]
(L) [2006/09/02] [Lynx] [16 times equal] Wayback!

That's why i originally said "you already know if they're pairwise identical", not "they're all identical" [SMILEY Smile]

Anyway, i loose [SMILEY Very Happy]
(L) [2006/09/02] [tbp] [16 times equal] Wayback!

You only lose if the metric is a simplistic 'how many ops does it take'.

In a given context maybe a pairwise check might make more sense, like:

back