SSE 4-Ray vs. Sphere?

[2013/08/08] [tby ironwallaby] [SSE 4-Ray vs. Sphere?]

I'm finally getting with the program and working on learning SSE, but I'm having trouble trying to build a 4 ray vs. 1 sphere intersection function. While it's trivial in single-ray land, when working with SIMD instructions I'm getting pretty confused about how one goes about handling branching.
Does anybody have any pointers to where I can read more about the topic, or (better yet) have any example code that I could dig into?
Thanks!
[2013/08/08] [tby graphicsMan] [SSE 4-Ray vs. Sphere?]

This appears to show how to do this:
[LINK http://felix.abecassis.me/2012/08/sse-vectorizing-conditional-code/]
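For readers who just want the gist of vectorizing a conditional: compute both outcomes for all four lanes, build a per-lane mask with a compare, then blend with AND/ANDNOT/OR. The snippet below is a minimal sketch of that pattern using the standard `<xmmintrin.h>` intrinsics. It is not taken from the linked article, and the helper names `select_ps` and `sqrt_or` are made up for illustration.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per-lane select: takes a where the mask lane is all ones, b where it is
 * all zeros. (Hypothetical helper name.) */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
  return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Branch-free version of: out[i] = x[i] >= 0 ? sqrtf(x[i]) : fallback.
 * The sqrt runs on every lane; lanes with negative input produce NaN,
 * which the mask then discards. */
static __m128 sqrt_or(__m128 x, float fallback) {
  const __m128 mask = _mm_cmpge_ps(x, _mm_setzero_ps());
  return select_ps(mask, _mm_sqrt_ps(x), _mm_set1_ps(fallback));
}
```

The point is that there is no `if` at all: the "branch" becomes a mask, and each lane picks its own result.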
[2013/08/09] [tby ironwallaby] [SSE 4-Ray vs. Sphere?]

Thanks a ton, that was exactly the nudge I needed.
For example, here's a 4-ray vs. sphere intersection:
Code:
static v4sf intersect(const ray4_t *ray) {
  /* Compute discriminant. */
  const v4sf b = -dot(&(ray->origin), &(ray->direction)),
             d = b * b - sq(&(ray->origin)) + V4SF_ONE,
             t = b - __builtin_ia32_sqrtps(d);
  /* Which rays hit the sphere? (Note that this works since any rays that
   * didn't hit the sphere will be NaN, because we had to sqrt a negative
   * number. Comparing that to zero will be false as a result.) */
  const v4si mask = (v4si) __builtin_ia32_cmpgeps(t, V4SF_ZERO);
  /* Return the rays that hit the sphere, or infinity otherwise. */
  return __builtin_ia32_orps(
    __builtin_ia32_andps((v4sf) mask, t),
    __builtin_ia32_andnps((v4sf) mask, V4SF_INFINITY)
  );
}
(Yeah, yeah, the sphere is always a unit sphere centered on the origin and we only check against the outside, but one thing at a time. :) I also don't know whether it's faster to bail early when all four discriminants are negative. I'll have to test.)
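Since the post above relies on helpers that aren't shown (`dot`, `sq`, `ray4_t`, the `V4SF_*` constants), here is a self-contained sketch of the same test written with the portable `<xmmintrin.h>` intrinsics, including the "bail early" check via `_mm_movemask_ps`. The struct-of-arrays layout `ray4_soa_t` and the names `dot4` and `intersect4` are assumptions for illustration, not the poster's actual types. Like the original, it assumes unit-length directions and a unit sphere at the origin. Whether the early out actually helps is not measured here.

```c
#include <math.h>       /* INFINITY, isinf */
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical SoA packet: lane i of every register belongs to ray i. */
typedef struct {
  __m128 ox, oy, oz;  /* ray origins */
  __m128 dx, dy, dz;  /* ray directions, assumed unit length */
} ray4_soa_t;

/* Four 3D dot products at once, one per lane. */
static __m128 dot4(__m128 ax, __m128 ay, __m128 az,
                   __m128 bx, __m128 by, __m128 bz) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(ax, bx), _mm_mul_ps(ay, by)),
                    _mm_mul_ps(az, bz));
}

/* Per-ray distance to the unit sphere at the origin, or +inf on a miss.
 * Solves t^2 - 2*b*t + (o.o - 1) = 0 with b = -(o.d), nearer root only. */
static __m128 intersect4(const ray4_soa_t *r) {
  const __m128 zero = _mm_setzero_ps();
  const __m128 inf = _mm_set1_ps(INFINITY);
  const __m128 b = _mm_sub_ps(
      zero, dot4(r->ox, r->oy, r->oz, r->dx, r->dy, r->dz));
  const __m128 oo = dot4(r->ox, r->oy, r->oz, r->ox, r->oy, r->oz);
  const __m128 disc =
      _mm_add_ps(_mm_sub_ps(_mm_mul_ps(b, b), oo), _mm_set1_ps(1.0f));

  /* Early out: movemask packs the four lane sign bits into an int, so a
   * result of 0 means no lane has a non-negative discriminant. */
  if (_mm_movemask_ps(_mm_cmpge_ps(disc, zero)) == 0)
    return inf;

  /* Lanes with disc < 0 give NaN here; NaN >= 0 is false, so the mask
   * below rejects them together with hits behind the origin (t < 0). */
  const __m128 t = _mm_sub_ps(b, _mm_sqrt_ps(disc));
  const __m128 mask = _mm_cmpge_ps(t, zero);
  return _mm_or_ps(_mm_and_ps(mask, t), _mm_andnot_ps(mask, inf));
}
```

Note that `_mm_andnot_ps(mask, inf)` computes `(~mask) & inf`, which matches the operand order of `__builtin_ia32_andnps` in the post.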
