Sven Woop thesis triangles

[2012/04/22] [Post by Vilem Otte] [Sven Woop thesis triangles]

Hi,

Could someone who has already read this thesis and implemented its triangle test quickly look through my code - [LINK http://www.pasteall.org/31120/cpp] - and point me in the right direction? [SMILEY :roll:] I'm running out of ideas about what I'm actually doing wrong (and the result is definitely wrong). Thank you.

EDIT: So far my code seems fine, but I suspect that I can't pass triangles in an arbitrary order (are there any rules for this?). - No, they can be in any order.

EDIT2: Could this also be caused by imprecision from using SSE2? - No. The transformation phase is OK, so the bug must be in the collision testing.
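
For readers without access to the paste, the two phases mentioned above (transformation, then the hit test) can be sketched in scalar C++ roughly as follows. This is only a minimal illustration of the unit-triangle idea, not the thesis code and not the code from the paste: `Vec3`, `UnitTriangle`, `make_unit_triangle` and `intersect` are hypothetical names, and using vertex C as the origin of the triangle's local space is an assumption.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical precomputed triangle: the three rows of the inverse of the
// matrix with columns (A - C, B - C, N), plus the local origin C.
struct UnitTriangle { Vec3 r0, r1, r2, origin; };

// Transformation phase: build the world -> unit-triangle-space transform.
// Assumes a non-degenerate triangle (non-zero determinant).
UnitTriangle make_unit_triangle(Vec3 a, Vec3 b, Vec3 c) {
    Vec3 e1 = sub(a, c), e2 = sub(b, c);
    Vec3 n = cross(e1, e2);
    float inv_det = 1.0f / dot(e1, cross(e2, n));
    return {scale(cross(e2, n), inv_det),
            scale(cross(n, e1), inv_det),
            scale(cross(e1, e2), inv_det),
            c};
}

// Hit test: in unit-triangle space the triangle lies in the plane z = 0
// with vertices (1,0,0), (0,1,0) and (0,0,0).
bool intersect(const UnitTriangle& tri, Vec3 org, Vec3 dir,
               float& t, float& u, float& v) {
    Vec3 o = sub(org, tri.origin);
    float oz = dot(tri.r2, o), dz = dot(tri.r2, dir);
    if (dz == 0.0f) return false;          // ray parallel to the triangle plane
    t = -oz / dz;
    if (t < 0.0f) return false;            // plane is behind the ray origin
    u = dot(tri.r0, o) + t * dot(tri.r0, dir);
    v = dot(tri.r1, o) + t * dot(tri.r1, dir);
    return u >= 0.0f && v >= 0.0f && u + v <= 1.0f;
}
```

Keeping the two phases in separate functions makes it possible to check the transform on its own before looking at the hit test.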
[2012/04/22] [Post by svenwoop] [Sven Woop thesis triangles]

Just looked over your code, and it looks all right.

Some things I would verify are:
1) Is your matrix inversion function working? Multiply the original matrix by the inverted one to see if you get the identity.
2) Verify that m.m1 is giving you the first row of the inverted matrix.
3) It sounds like you are debugging with multiple triangles. It is best to test with one ray and a single triangle for which you already know the intersection result.
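
Check 1) could look roughly like the sketch below. `Mat4` here is a hypothetical plain row-major stand-in (not the SSE `mat4` from this thread), and the tolerance `eps` is an assumed value that would need tuning for real data.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar 4x4 matrix, row-major.
struct Mat4 { float m[4][4]; };

// Plain triple-loop matrix product r = a * b.
inline Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// True if every element is within eps of the identity matrix.
// Written as !(diff <= eps) so that NaN entries also fail the check.
inline bool is_identity(const Mat4& a, float eps = 1e-5f) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float expected = (i == j) ? 1.0f : 0.0f;
            if (!(std::fabs(a.m[i][j] - expected) <= eps)) return false;
        }
    return true;
}
```

Feeding the output of the inversion routine into `is_identity(mul(original, inverted))` in both debug and release builds would show directly whether the inverse differs between the two.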

What imprecision exactly are you talking about?
[2012/04/22] [Post by Vilem Otte] [Sven Woop thesis triangles]

Wow, I actually didn't expect that the author of the thesis would reply to me. Thank you.

So, you're right, my code works (at least in debug mode - I ran it through gdb). I've also discovered that when I switch to "release" mode and build the exe file, my matrix inversion code somehow stops working (which seems strange to me), so I have to find out what exactly the compiler optimizations are doing wrong (note that my whole matrix library is written in SSE2 intrinsics).

EDIT: Okay, I don't actually know why (I don't work with Visual Studio that often, although debugging in it is far more comfortable than in GDB), but setting floating-point math to Fast instead of Precise breaks my matrix inversion code (and the whole code is in intrinsics!!!). My only fear is that using Precise will hurt performance a lot. In the end I'll compile this with GCC anyway, so I'll see whether the problem is present there too.

EDIT2: GCC doesn't have this problem (even with the mfpmath=fast flag), so it seems to be an MSVC-only issue. Maybe I'll also try Clang to see whether it has the same problem (it's getting better and more popular these days). [SMILEY :)] Anyway, thank you very much for the help. You did good work on that ray-tri test (it works really fast!), and by the way, welcome to the forums (I see you joined just yesterday).
[2012/04/22] [Post by Geri] [Sven Woop thesis triangles]

-Compilers like to mess up object-oriented code when you write something speed-critical. You should printf your variables line by line. If it works when you dump all of them, start removing the printfs step by step ^^
-Maybe, as you hint, the compiler automatically generates SSE code from your vector4s and messes it up somehow?
-Also, try substituting a standard C/C++ matrix implementation for your SSE code.
-Don't forget to add volatile and alignment to your inline assembly blocks/variables.

Also, try this:

if(v >= 0.0f && u + v <= 1.0f)
to
if((v >= 0.0f) && ((u + v) <= 1.0f))

if(u >= 0.0f && u <= 1.0f)
to
if((u >= 0.0f) && (u <= 1.0f))
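
Going by C++ operator precedence, both spellings of each condition above should parse the same way (+ binds tighter than the relational operators, which bind tighter than &&), so any difference in behavior between them would have to come from somewhere else. A small sketch to confirm this on sample values; the function names are hypothetical:

```cpp
#include <cassert>

// Conditions as originally written.
inline bool plain_uv(float u, float v) { return v >= 0.0f && u + v <= 1.0f; }
inline bool plain_u(float u) { return u >= 0.0f && u <= 1.0f; }

// Fully parenthesized variants.
inline bool paren_uv(float u, float v) { return (v >= 0.0f) && ((u + v) <= 1.0f); }
inline bool paren_u(float u) { return (u >= 0.0f) && (u <= 1.0f); }

// Compare both spellings over a small grid of values in [-2, 2].
inline bool forms_agree() {
    for (int i = -8; i <= 8; ++i)
        for (int j = -8; j <= 8; ++j) {
            float u = i * 0.25f, v = j * 0.25f;
            if (plain_uv(u, v) != paren_uv(u, v)) return false;
            if (plain_u(u) != paren_u(u)) return false;
        }
    return true;
}
```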
[2012/04/23] [Post by Vilem Otte] [Sven Woop thesis triangles]

My code for matrix inversion looks like this:
Code:
        friend inline mat4 inverse(const mat4& m)
        {
            __m128 f1 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0xAA),                                    
                                              _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xFF), _mm_shuffle_ps(m.m4, m.m3, 0xFF), 0x80)),                        
                                   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xAA), _mm_shuffle_ps(m.m4, m.m3, 0xAA), 0x80),                                            
                                              _mm_shuffle_ps(m.m3, m.m2, 0xFF)));            
            
            __m128 f2 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x55),                                    
                                              _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xFF), _mm_shuffle_ps(m.m4, m.m3, 0xFF), 0x80)),                        
                                   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x55), _mm_shuffle_ps(m.m4, m.m3, 0x55), 0x80),                                            
                                              _mm_shuffle_ps(m.m3, m.m2, 0xFF)));            
            
            __m128 f3 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x55),                                    
                                              _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xAA), _mm_shuffle_ps(m.m4, m.m3, 0xAA), 0x80)),                        
                                   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x55), _mm_shuffle_ps(m.m4, m.m3, 0x55), 0x80),                                    
                                              _mm_shuffle_ps(m.m3, m.m2, 0xAA)));            
            
            __m128 f4 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x00),                            
                                              _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xFF), _mm_shuffle_ps(m.m4, m.m3, 0xFF), 0x80)),                
                                   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x00), _mm_shuffle_ps(m.m4, m.m3, 0x00), 0x80),            
                                              _mm_shuffle_ps(m.m3, m.m2, 0xFF)));            
            
            __m128 f5 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x00),        
                                              _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xAA), _mm_shuffle_ps(m.m4, m.m3, 0xAA), 0x80)),                    
                                   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x00), _mm_shuffle_ps(m.m4, m.m3, 0x00), 0x80),        
                                              _mm_shuffle_ps(m.m3, m.m2, 0xAA)));            
            
            __m128 f6 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x00),        
                                              _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x55), _mm_shuffle_ps(m.m4, m.m3, 0x55), 0x80)),                
                                   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x00), _mm_shuffle_ps(m.m4, m.m3, 0x00), 0x80),    
                                              _mm_shuffle_ps(m.m3, m.m2, 0x55)));
            __m128 v1 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0x00), _mm_shuffle_ps(m.m2, m.m1, 0x00), 0xA8);            
            __m128 v2 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0x55), _mm_shuffle_ps(m.m2, m.m1, 0x55), 0xA8);            
            __m128 v3 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0xAA), _mm_shuffle_ps(m.m2, m.m1, 0xAA), 0xA8);            
            __m128 v4 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0xFF), _mm_shuffle_ps(m.m2, m.m1, 0xFF), 0xA8);            
            // Sign masks: XOR with -0.0f flips the sign bit of that lane (alternating cofactor signs).
            __m128 s1 = _mm_set_ps(-0.0f,  0.0f, -0.0f,  0.0f);
            __m128 s2 = _mm_set_ps( 0.0f, -0.0f,  0.0f, -0.0f);
            __m128 i1 = _mm_xor_ps(s1, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v2, f1),                    
                                                             _mm_mul_ps(v3, f2)),                            
                                                  _mm_mul_ps(v4, f3)));
            __m128 i2 = _mm_xor_ps(s2, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v1, f1),        
                                                             _mm_mul_ps(v3, f4)),                                            
                                                  _mm_mul_ps(v4, f5)));            
            __m128 i3 = _mm_xor_ps(s1, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v1, f2),                    
                                                             _mm_mul_ps(v2, f4)),                                
                                                  _mm_mul_ps(v4, f6)));            
            __m128 i4 = _mm_xor_ps(s2, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v1, f3),                
                                                             _mm_mul_ps(v2, f5)),                        
                                                  _mm_mul_ps(v3, f6)));
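            // Determinant: multiply the first row by the first lane of each cofactor row, then sum horizontally.
            __m128 d = _mm_mul_ps(m.m1, _mm_movelh_ps(_mm_unpacklo_ps(i1, i2), _mm_unpacklo_ps(i3, i4)));
            d = _mm_add_ps(d, _mm_shuffle_ps(d, d, 0x4E));
            d = _mm_add_ps(d, _mm_shuffle_ps(d, d, 0x11));
            // Reciprocal of the determinant, present in all four lanes.
            d = _mm_div_ps(_mm_set1_ps(1.0f), d);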
            return mat4(float4(_mm_mul_ps(i1, d)),    
                        float4(_mm_mul_ps(i2, d)),                
                        float4(_mm_mul_ps(i3, d)),                
                        float4(_mm_mul_ps(i4, d)));
        }
And VS actually drops some instructions when using Fast floating-point math. [SMILEY :shock:] The strange thing is that GCC in MinGW doesn't do this, so...
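
One thing that might be worth checking (an assumption, not something confirmed in this thread): fast floating-point modes relax IEEE rules, and some of them permit ignoring the sign of zero (GCC's -ffast-math, for example, implies -fno-signed-zeros). If a compiler folded the -0.0f literals in s1 and s2 to +0.0f, the XOR would become a no-op and could be dropped entirely. A sketch of building the same masks from integer bit patterns instead, assuming a 32-bit int; `sign_mask_s1`, `sign_mask_s2` and `flip_signs` are hypothetical helper names:

```cpp
#include <cassert>
#include <emmintrin.h> // SSE2 intrinsics

// Same bit pattern as _mm_set_ps(-0.0f, 0.0f, -0.0f, 0.0f):
// sign bit set in lanes 1 and 3.
inline __m128 sign_mask_s1() {
    const int sign = static_cast<int>(0x80000000u); // assumes 32-bit int
    return _mm_castsi128_ps(_mm_set_epi32(sign, 0, sign, 0));
}

// Same bit pattern as _mm_set_ps(0.0f, -0.0f, 0.0f, -0.0f):
// sign bit set in lanes 0 and 2.
inline __m128 sign_mask_s2() {
    const int sign = static_cast<int>(0x80000000u); // assumes 32-bit int
    return _mm_castsi128_ps(_mm_set_epi32(0, sign, 0, sign));
}

// Helper for inspection: XOR four floats with a mask and store the result.
inline void flip_signs(__m128 mask, const float in[4], float out[4]) {
    _mm_storeu_ps(out, _mm_xor_ps(mask, _mm_loadu_ps(in)));
}
```

Because the masks never exist as floating-point literals, the floating-point model has nothing to simplify, whichever compiler is used.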
[2012/04/23] [Post by Geri] [Sven Woop thesis triangles]

"drops out some instructions"
[LINK http://msdn.microsoft.com/en-us/library/12a04hfd%28v=vs.80%29.aspx]
