Path tracing benchmark

(L) [2017/01/25] [Post by mpeterson] [Path tracing benchmark]

After we built our first coherent traversal kernel for AVX-512 [LINK http://ompf2.com/viewtopic.php?f=3&t=2103],
it took some time to get a competitive incoherent kernel ready for prime time. Here it is! For the benchmark we used a full pipeline that is
easy to set up on any architecture:

1) camera ray generation, traversal, intersection, shading
2) for any hit: 64 secondary diffuse rays, traversal, intersection and shading
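
The second stage above can be sketched as follows. This is only an illustration, not the benchmark code: the post does not say how the 64 diffuse directions are drawn, so cosine-weighted hemisphere sampling is an assumption here, and the names `orthonormal_basis`, `cosine_sample_hemisphere` and `secondary_directions` are hypothetical.

```python
import math
import random

SECONDARY_RAYS_PER_HIT = 64  # from the post: 64 secondary diffuse rays per hit


def orthonormal_basis(n):
    """Return unit vectors t, b such that (t, b, n) is an orthonormal frame."""
    nx, ny, nz = n
    # Use the helper axis that is less aligned with n, so the cross product
    # below cannot degenerate to the zero vector.
    a = (1.0, 0.0, 0.0) if abs(nx) < 0.9 else (0.0, 1.0, 0.0)
    # t = normalize(cross(a, n))
    tx = a[1] * nz - a[2] * ny
    ty = a[2] * nx - a[0] * nz
    tz = a[0] * ny - a[1] * nx
    inv = 1.0 / math.sqrt(tx * tx + ty * ty + tz * tz)
    t = (tx * inv, ty * inv, tz * inv)
    # b = cross(n, t); already unit length because n and t are orthonormal
    b = (ny * t[2] - nz * t[1], nz * t[0] - nx * t[2], nx * t[1] - ny * t[0])
    return t, b


def cosine_sample_hemisphere(n, rng):
    """Draw one cosine-weighted direction around the unit normal n (assumed scheme)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    x, y = r * math.cos(phi), r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))  # height above the surface
    t, b = orthonormal_basis(n)
    return tuple(x * t[i] + y * b[i] + z * n[i] for i in range(3))


def secondary_directions(normal, rng, count=SECONDARY_RAYS_PER_HIT):
    """Directions for all secondary diffuse rays spawned at one hit point."""
    return [cosine_sample_hemisphere(normal, rng) for _ in range(count)]


if __name__ == "__main__":
    dirs = secondary_directions((0.0, 0.0, 1.0), random.Random(1))
    print(len(dirs), "secondary directions generated")
```

Rays generated this way diverge quickly from one another, which is why the secondary stage exercises the incoherent kernel rather than the coherent one.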

Architectures:

1) Nvidia GTX 1080: kernels based on "Understanding the Efficiency of Ray Traversal on GPUs"
2) Embree 2.13 AVX-512 kernels on an Intel Xeon Phi 7250 (1.4 GHz, 68 cores):

Kernels:
- avx512knl::BVH8Triangle4Intersector16HybridMoellerNoFilter for camera rays and
- avx512knl::BVH8Triangle4Intersector1Moeller for secondary diffuse rays

BVH compiler settings:
- avx512knl::BVH8BuilderFastSpatialSAH

3) Our new AVX-512 kernels for coherent and incoherent ray transport on an Intel Xeon Phi 7250 (1.4 GHz, 68 cores)

[IMG #1 Image]

Our new kernels clearly outperform any other implementation on all important platforms currently used for path tracing.
The KNL CPU has another advantage: it can connect directly to high-performance networks such as InfiniBand,
so it scales extremely well compared to GPUs in a cluster-like configuration.

Finally, we also ported our fast BVH compilers [LINK http://rapt.technology/data/pssbvh.pdf] to AVX-512.
The compiler timings for all scenes above are:

Fairy: 13.6ms
Rungholt: 105.1ms
San Miguel: 359.3ms
Sponza: 7.1ms

The Embree compilers, in any configuration, are far behind these timings on our KNL test system, so we decided not to
publish any numbers for them here. The SBVH compiler used in "Understanding the Efficiency of Ray Traversal on GPUs" is far from
optimized.
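
Build times for BVH compilers are usually read together with the quality of the tree they produce, and a common quality estimate is the surface area heuristic (SAH) cost. The sketch below shows the standard formula on a toy tree. The `Node` layout, the function names and the unit cost constants are assumptions made for illustration; they are not taken from our compilers, from Embree, or from the GPU paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical BVH node: an axis-aligned box plus children or primitives."""
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    prim_count: int = 0  # only meaningful for leaves (no children)


def surface_area(lo: Vec3, hi: Vec3) -> float:
    """Surface area of an axis-aligned bounding box."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(root: Node, c_trav: float = 1.0, c_isect: float = 1.0) -> float:
    """Expected cost of a random ray: every node is weighted by the ratio of
    its surface area to the root's. Inner nodes pay c_trav, leaves pay
    c_isect per primitive. The constants are assumed, not measured."""
    root_area = surface_area(root.lo, root.hi)
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        hit_probability = surface_area(node.lo, node.hi) / root_area
        if node.children:
            total += c_trav * hit_probability
            stack.extend(node.children)
        else:
            total += c_isect * node.prim_count * hit_probability
    return total


if __name__ == "__main__":
    left = Node((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), prim_count=4)
    right = Node((1.0, 0.0, 0.0), (2.0, 1.0, 1.0), prim_count=4)
    root = Node((0.0, 0.0, 0.0), (2.0, 1.0, 1.0), children=[left, right])
    print("SAH cost:", sah_cost(root))
```

Lower values indicate a tree that should be cheaper to trace. Costs are only directly comparable between trees with the same branching factor and leaf layout, so comparing a BVH8 with a binary BVH would need adjusted constants.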

mp

[IMG #1]:Not scraped: https://web.archive.org/web/20210616151507im_/https://picload.org/image/roaacgac/incoherentresults.png

(L) [2017/01/26] [Post by manysmallcores] [Path tracing benchmark]

Cool stuff!

Couple of remarks/questions:
- What about open-sourcing the benchmark? Otherwise nobody else can reproduce these numbers.
- Why not use OptiX Prime? It should provide additional optimizations, right?
- Shouldn't a Titan X be quite a bit faster than a 1080GTX?
- Why are you switching from packets to single rays in the Embree case? I guess your benchmark uses large streams of rays so why don't you stick to the hybrid interface or even use Embree's ray stream interface? The current kernel selection seems strange.
- How many rays per stream (per HW thread) do you use? Are you using any kind of stream compaction?
- Can you run the benchmark with more complex scenes? (Fairy and Sponza are just toy models.) Something with 30-100M primitives would be more representative.
- In my experience the build time of a spatial split builder depends largely on how often you do spatial splits (vs. object splits), in particular deep down in the tree. Do you use heuristics similar to the other implementations? Can you compare the quality of the BVHs (both for NV and Embree)? Otherwise the build times are very hard to compare.
- I actually have access to a 7250 Phi machine and I've quickly tested Embree's BVH build performance (buildbench) for San Miguel. Without spatial splits I get half of your measured time, and with spatial splits a bit less than 2x. That does not seem way off...
- What about the build performance for SanMiguel with Optix Prime?
- What is the warp utilization for the 1080GTX during the benchmark?
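
To make the questions about stream compaction and warp utilization concrete, here is a toy model, not a measurement. It assumes each ray needs a known number of traversal steps and counts how many SIMD lane slots do useful work with and without repacking active rays. Both function names are hypothetical, every ray is assumed to need at least one step, and the cost of the compaction itself is ignored.

```python
import math


def utilization_fixed_batches(steps, width):
    """Rays stay in their original batch of `width` lanes; each batch runs
    until its slowest ray finishes, so finished lanes sit idle.
    Assumes every entry in `steps` is >= 1."""
    lane_slots = 0
    for i in range(0, len(steps), width):
        batch = steps[i:i + width]
        lane_slots += width * max(batch)
    return sum(steps) / lane_slots


def utilization_with_compaction(steps, width):
    """After every step the still-active rays are repacked into as few
    full-width batches as possible (idealized stream compaction)."""
    lane_slots = 0
    active = [s for s in steps if s > 0]
    while active:
        lane_slots += width * math.ceil(len(active) / width)
        # One step done: drop finished rays and keep the rest densely packed.
        active = [s - 1 for s in active if s > 1]
    return sum(steps) / lane_slots


if __name__ == "__main__":
    # Two batches of four lanes, each with one long-running ray.
    steps = [1, 1, 1, 4, 1, 1, 1, 4]
    print(utilization_fixed_batches(steps, 4))    # 14 useful slots out of 32
    print(utilization_with_compaction(steps, 4))  # 14 useful slots out of 20
```

With `width` set to 32 the same model approximates a GPU warp, and with 16 an AVX-512 packet of single-precision rays. Real kernels pay for the repacking, so the second number is an upper bound under these assumptions.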

Thanks.
