Is it still worth it to do CPU SIMD ray tracing?


[2013/06/21] [post by shiqiu1105] [Is it still worth it to do CPU SIMD ray tracing?]

Recently I have been reading about doing ray tracing with SSE intrinsics.

And I felt that it's really troublesome to pack data into ray packets and use SSE.
Now that we have GPU, is it still worth doing it??

Timo Aila has his fastest ray tracer written in CUDA, right?

Do they also pack their ray data in Structure-of-Arrays fashion?
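
For reference, this is roughly what I mean by packing rays into a Structure-of-Arrays packet (a minimal sketch with SSE intrinsics; the names are illustrative, not from any particular renderer):

#include <xmmintrin.h>  // SSE intrinsics

// A packet of 4 rays stored Structure-of-Arrays style: one SSE register
// holds the same component of all 4 rays, so arithmetic on the packet
// maps directly onto 4-wide SIMD instructions.
struct RayPacket4 {
    __m128 ox, oy, oz;  // origins:   x0..x3, y0..y3, z0..z3
    __m128 dx, dy, dz;  // directions
    __m128 tmax;        // per-ray max distance
};

// Example packet operation: advance all 4 ray origins by t along their
// directions (origin += t * direction), one multiply and add per component.
inline void advance(RayPacket4& p, __m128 t) {
    p.ox = _mm_add_ps(p.ox, _mm_mul_ps(t, p.dx));
    p.oy = _mm_add_ps(p.oy, _mm_mul_ps(t, p.dy));
    p.oz = _mm_add_ps(p.oz, _mm_mul_ps(t, p.dz));
}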
[2013/06/21] [post by dbz] [Is it still worth it to do CPU SIMD ray tracing?]

Yes, it still makes sense. Have a look at Intel Embree for example. It uses SSE to speed up BVH traversal although it does not do packet tracing. I don't have recent CPU hardware to test on, but I suspect it could outperform a single GPU for medium to complex scenes on an Ivy Bridge or Haswell CPU. Besides speed, there are many things that just can't be done well on the GPU, like bidirectional path tracing.

Imho Embree is one of the most overlooked pieces of software in the rendering field. Intel marketed it very poorly; they only provided some very simple scenes like the Cornell Box, on which Embree can't beat the GPU because scenes like that are put into constant/cache memory by the GPU driver. And Embree has a proprietary scene format, so other scenes can't be tested easily. Embree also comes with a very fast BVH builder, btw.
[2013/06/21] [post by shiqiu1105] [Is it still worth it to do CPU SIMD ray tracing?]

>> dbz wrote: Yes, it still makes sense. Have a look at Intel Embree for example. It uses SSE to speed up BVH traversal although it does not do packet tracing. [...]
But I have also been wondering: isn't performance hurt considerably when SIMD code hits branches?
[2013/06/22] [post by dbz] [Is it still worth it to do CPU SIMD ray tracing?]

>> shiqiu1105 wrote: But I have also been wondering: isn't performance hurt considerably when SIMD code hits branches?
Actually, it's GPU hardware that is poorly suited for global illumination, because of varying path lengths, divergence inside a warp or wavefront, and incoherent memory access. That's why people came up with hacks like 'persistent threads' to mask the inefficiency of GPU hardware. Even worse, those hacks are usually generation dependent. For example, 'persistent threads' actually slowed things down on more recent hardware like Fermi. It's just that the GPU makes up for its inefficiencies with its massive parallelism, its enormous power consumption compared to the CPU, and the ability to put up to 8 GPUs on one mainboard.
[2013/06/22] [post by mpeterson] [Is it still worth it to do CPU SIMD ray tracing?]

As we will have 16-wide vector units on the CPU today/soon, you have to! Without them you throw away too much compute power. Since the GPU hype, people have totally lost track of the latest developments in incoherent ray transport on wide vector units. Embree isn't the best example here.

mp
[2013/06/22] [post by dbz] [Is it still worth it to do CPU SIMD ray tracing?]

Could you elaborate on that? Afaik the 8/16/whatever-wide SIMD units on the CPU actually suffer from the same problems as the GPU. Intel specifically mentioned that too in one of their tech reports/papers on incoherent ray tracing on Xeon Phi, where they called incoherent ray tracing on wide vector units 'challenging'. A version of Embree (2.0) supporting Xeon Phi was supposed to have been released already; the reason it hasn't been is probably the one mentioned above.

Embree actually does pretty well at tracing incoherent rays using standard SSE. In fact it is the fastest CPU path tracer I have ever seen. And unlike the GPU, it can easily be extended to do bidirectional path tracing or photon mapping. I don't see why it should not be a good example. I haven't seen anybody do better yet.
[2013/06/22] [post by Serendipity] [Is it still worth it to do CPU SIMD ray tracing?]

Yes, CPU SIMD raytracing (and efficient multithreading and clustering) is still worth it, especially in a professional environment. Now I am probably quite a bit biased about this but there are a couple of reasons why CPU raytracing will continue to live on:

#1: No driver issues or dependencies. If it works on your computer, it will work on another computer as well. If it doesn't, you are responsible for fixing the bug and don't have to wait for AMD/Nvidia to fix their drivers (and hope they don't break anything else in the process). And you can be sure that it will just be a bit faster on the next-generation CPU without any code changes.

#2: Memory: at the moment, 6GB is the maximum you can buy for a GPU. On the other hand, you won't find any decent computer with less than 16GB today; workstations are usually equipped with 24GB or more, and if you need it you can even find workstations with 256GB in them, so you can just trace large scenes without problems.

#3: Speed: Despite the terrific marketing around GPU computing, the performance advantage of GPU raytracing over optimized CPU raytracing is just not that huge once scenes get more complex. Sure, baby scenes are fast, but once you throw scenes with 10-30 million triangles or more onto a GPU, the performance suffers. Just an example: some time ago I compared a dual-Nehalem CPU system (2 CPUs, 3.2GHz, 8 cores total/16 HT) running a CPU-optimized, full-featured CPU-based raytracer against iRay running on the same machine with a single Quadro 6000 (the best you could buy at that time if you needed memory). The scene was not really complex, about 8 million triangles, and was set up to give roughly the same results. The CPU-based system was about 10-20% faster in that specific scenario.

Of course, this is still an apples-to-oranges comparison, since these were two different raytracers, so in order to do a really valid comparison you would have to implement optimized versions of a single raytracer for both architectures. But still, it somewhat proved that you can have a fast CPU raytracer. So yes, a GPU raytracer can be faster when comparing a single GPU to a single CPU, but it is probably about a factor of 2-3 in real scenes and that's it, which means as soon as you move to a dual-CPU system the advantage is gone. The issue here is that most CPU-based raytracers are just not optimized for modern CPUs, so it is pretty easy to write a raytracer that is 100 times faster than those.


For me #1 is the real deal breaker for GPU raytracing. It is no problem if you are doing a personal project or just doing some scientific work. But if you are planning on selling your raytracer at a professional level and are required to do enterprise support for it, you will soon notice that it is a nightmare, because your customer will install a driver that won't work, maybe just because another application requires that driver. So in my opinion GPU computing is broken by concept as long as the driver has any influence on how your program operates. On the CPU you just don't have this issue. If the compiler has a bug, you as a developer need to either work around that bug or change the compiler. But the customer will just get something that you know will work, and if it doesn't, it is probably your own fault.
[2013/06/24] [post by Dade] [Is it still worth it to do CPU SIMD ray tracing?]

>> dbz wrote: Actually, it's GPU hardware that is poorly suited for global illumination, because of varying path lengths
This happens only if you write a monolithic kernel, and that is something you should not do: not only because of the varying path length problem, but also because you will probably be unable to compile the kernel on many OpenCL drivers. I have seen the AMD drivers eat 24GB of CPU RAM to compile some (not so complex) kernels. NVIDIA drivers are not much better either (with the addition of missing OpenCL 1.2 support and a lack of performance).

P.S. My personal experience confirms what Serendipity has written; however, I'm a bit more favourable to GPUs because you can buy something a LOT faster than a single Quadro 6000 for the price of a dual-Nehalem CPU system (i.e. with GPU rendering, you buy as many GPUs as you can and the cheapest CPU you can find; it is useless anyway).
[2013/07/28] [post by tstanev] [Is it still worth it to do CPU SIMD ray tracing?]

>> shiqiu1105 wrote: Recently I have been reading about doing ray tracing with SSE intrinsics. And I felt that it's really troublesome to pack data into ray packets and use SSE. Now that we have GPU, is it still worth doing it? [...]
It is still worth using SSE if the alternative is tracing on the CPU without SSE.

Even if it's just tracing single incoherent rays, an SSE-based 4-wide BVH traversal will outperform binary BVHs and kd-trees for most scenes. And SSE is less troublesome when used in this way than with packets, which are more painful to implement.
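
To sketch the idea (just an illustration, not Embree's actual code, assuming a 4-wide node that stores its children's bounds SoA-style): instead of testing 4 rays against one box, you test one ray against a node's 4 child boxes at once.

#include <xmmintrin.h>

// One ray tested against the four child AABBs of a 4-wide BVH node.
// The node stores its children's bounds SoA-style (four min-x values in
// one register, etc.), so a single slab test covers all four children.
struct QBVHNode {
    __m128 minx, miny, minz;
    __m128 maxx, maxy, maxz;
};

// Returns a 4-bit mask of children whose boxes the ray hits.
// ox/oy/oz and idx/idy/idz are the ray origin and reciprocal direction,
// splatted into all four lanes; tmin/tmax bound the ray segment.
inline int intersect4(const QBVHNode& n,
                      __m128 ox, __m128 oy, __m128 oz,
                      __m128 idx, __m128 idy, __m128 idz,
                      __m128 tmin, __m128 tmax) {
    // Slab test per axis, four children at a time.
    __m128 t0x = _mm_mul_ps(_mm_sub_ps(n.minx, ox), idx);
    __m128 t1x = _mm_mul_ps(_mm_sub_ps(n.maxx, ox), idx);
    __m128 t0y = _mm_mul_ps(_mm_sub_ps(n.miny, oy), idy);
    __m128 t1y = _mm_mul_ps(_mm_sub_ps(n.maxy, oy), idy);
    __m128 t0z = _mm_mul_ps(_mm_sub_ps(n.minz, oz), idz);
    __m128 t1z = _mm_mul_ps(_mm_sub_ps(n.maxz, oz), idz);

    // Entry distance = max of per-axis near hits; exit = min of far hits.
    __m128 tnear = _mm_max_ps(_mm_max_ps(_mm_min_ps(t0x, t1x),
                                         _mm_min_ps(t0y, t1y)),
                              _mm_max_ps(_mm_min_ps(t0z, t1z), tmin));
    __m128 tfar  = _mm_min_ps(_mm_min_ps(_mm_max_ps(t0x, t1x),
                                         _mm_max_ps(t0y, t1y)),
                              _mm_min_ps(_mm_max_ps(t0z, t1z), tmax));

    return _mm_movemask_ps(_mm_cmple_ps(tnear, tfar));
}

The traversal then pushes the hit children (sorted by tnear, if you like) and keeps going, so there is no need to gather or sort rays into packets at all.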

SSE also comes in handy for all kinds of vector and color operations in the shaders, if you are using three-component colors.
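
Something along these lines (a hypothetical minimal color class, not taken from any particular renderer):

#include <xmmintrin.h>

// Three-component color kept in one SSE register (fourth lane unused).
// Adding or modulating whole colors then costs a single instruction each,
// which is where much of the "free" shading speedup comes from.
struct Color {
    __m128 v;  // (r, g, b, unused)
    Color(float r, float g, float b) : v(_mm_set_ps(0.0f, b, g, r)) {}
    explicit Color(__m128 m) : v(m) {}
};

inline Color operator+(Color a, Color b) { return Color(_mm_add_ps(a.v, b.v)); }
inline Color operator*(Color a, Color b) { return Color(_mm_mul_ps(a.v, b.v)); }  // modulate
inline Color operator*(float s, Color a) { return Color(_mm_mul_ps(_mm_set1_ps(s), a.v)); }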

If you cost it out per render hour, then a 50% reduction in render time (possible with SSE) will cut your costs in half. So in the end it depends on how much time you spend rendering -- for production renderers that have to do many frames of video on commodity hardware, you can imagine it makes a difference that over time amortizes the cost of implementing the extra code path.
[2013/07/28] [post by Geri] [Is it still worth it to do CPU SIMD ray tracing?]

I don't think it's worth doing hand-written assembly optimisations for special instruction sets. For me, the formula is simpler: what the compiler produces at maximum optimisation is the maximum speed the architecture can reach. A general-purpose CPU should be able to execute general-purpose algorithms at the best possible speed, and no programmers optimise for instruction sets by hand any more, since it's against modern programming conventions to do so.

For me, the SIMD optimisations gave a 5% total performance increase, and they were bugged in different ways with every compiler version. It's a waste of time.
