Re: giga rays on intel phi


[2013/09/10] [mpeterson] [Re: giga rays on intel phi]

yes, we plan to reach 5-6 grays. if needed we will use 4 or more mics/gpus per system, but just for comparison.
the final architecture will be based on tensilica procs (for whatever reason, i don't judge this). to be competitive i have to see
what current mass-market components can do and extrapolate this into the future. money doesn't matter so far.
i was told the car scene is commercial => not allowed to distribute any geometry/texture data. so i tried good old sponza:
for simple pt (10 bounces) i get around 162 mrays/s. that is nearly a 10x degradation considering that i can raycast sponza
with primary rays in < 1 ms at 1024x1024! open sky scenes are much better (of course). the car scene on a plane + sky hdr runs at around
300-500 mrays/s. so i think i need 10-12 accelerators.
titan just arrived. let's see what is possible here. we use the Timo Aila et al. traversal/intersection without the sorting crap to
have a fair comparison.
first results:
primary/coherent rays: titan is about 2x behind.
pt on cornell and sponza (max 10 diffuse bounces): both are more or less on par.
the titan is much faster than the gtx480 and cheaper than the phi. on the other hand, the phi has 16gb and i think
that explains the price difference. in terms of io, e.g. rdma, the phi is far ahead, allowing us to build clouds/clusters
that can mount filesystems etc. and work closely together on distributed pci-e networks. i think this is what we
need to reach the 5-6 grays. except for some "crappy" mpi, nothing like that is possible on gpus yet. and if we can trust intel
we see the phi as a standard cpu in 1-2 years. avoiding accelerators altogether is always a good thing.
mp
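
The accelerator estimate in the post above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, reading "grays" as giga rays per second and assuming perfect linear scaling across devices (an idealization the post does not claim); the function names are hypothetical. With these assumptions, the 10-12 figure corresponds to the upper end of the quoted range (500 mrays/s per device), and a 1 ms primary-ray pass at 1024x1024 works out to roughly 1049 mrays/s.

```python
import math

def accelerators_needed(target_grays, per_device_mrays):
    """Devices needed for the target rate, assuming perfect linear scaling."""
    return math.ceil(target_grays * 1000.0 / per_device_mrays)

def primary_mrays(width, height, frame_ms):
    """Primary-ray throughput in mrays/s, assuming one ray per pixel."""
    rays_per_ms = width * height / frame_ms
    return rays_per_ms / 1000.0  # rays/ms -> mrays/s

# Target range and per-device rates are the figures quoted in the post.
for target in (5.0, 6.0):
    for rate in (300.0, 500.0):
        n = accelerators_needed(target, rate)
        print(f"{target} grays at {rate} mrays/s per device: {n} devices")

# Upper bound on frame time of 1 ms gives a lower bound on throughput.
print(f"1024x1024 in 1 ms: {primary_mrays(1024, 1024, 1.0):.1f} mrays/s")
```

At the lower end of the range (300 mrays/s per device) the same calculation gives a noticeably higher device count, so the estimate is sensitive to which scene is taken as representative.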
[2013/09/10] [beason] [Re: giga rays on intel phi]

>> and if we can trust intel we see the phi as a standard cpu in 1-2 years
What does this mean? Will the Phi be a drop-in replacement for regular CPUs or something else? (Do you have a link?)
[2013/09/11] [mpeterson] [Re: giga rays on intel phi]

>> beason wrote:What does this mean? Will the Phi be a drop-in replacement for regular CPUs or something else? (Do you have a link?)
regular cpu replacement (look for the knights landing / 14nm broadwell roadmap, end of 2014 / q1 2015).
i know of two upcoming top10 supercomputers that will use these cpus without pci-e accelerators.
[2013/09/12] [Dade] [Re: giga rays on intel phi]

>> mpeterson wrote:regular cpu replacement (look for knights landing/14nm broadwell roadmap, end 2014,1q 2015). i know two upcoming top10 supercomputers that will use these cpus without pci-e accl.
Interesting, does anyone know if Xeon Phi has some kind of memory/cache coherency support in order to share a single pool of memory across multiple Xeon Phis?
[2013/09/13] [mpeterson] [Re: giga rays on intel phi]

ray distribution tests in outdoor environments:
AO rays at around 720 mrays/s (tracing 16 rays per octant together)
[IMG #1 Image]
diffuse bounces (depth 8) at around 490 mrays/s
[IMG #2 Image]
for scenes like that, ao even looks better to me. mp
[IMG #1]:Not scraped: https://web.archive.org/web/20150330165831im_/http://s9.postimg.org/hd8089b8r/r8_ao.jpg
[IMG #2]:Not scraped: https://web.archive.org/web/20150330165831im_/http://s9.postimg.org/tqkuf60x7/r8_diff.jpg
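
The post does not say how the 16 rays "per octant" are grouped. One plausible reading, sketched here as an assumption rather than as mpeterson's actual kernel, is to bin rays by the sign octant of their direction and then trace each bin in packets of 16 (the Phi's 512-bit vector units hold 16 single-precision lanes). The names `octant` and `packets_by_octant` are hypothetical.

```python
from collections import defaultdict

PACKET_SIZE = 16  # the "16 rays" from the post; also the Phi's float lane count

def octant(direction):
    """Index 0..7 built from the sign bits of (dx, dy, dz)."""
    dx, dy, dz = direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)

def packets_by_octant(directions, packet_size=PACKET_SIZE):
    """Group ray indices by direction octant, then cut each group into packets.

    Returns a list of (octant_id, [ray indices]); the last packet of an
    octant may be partially filled.
    """
    bins = defaultdict(list)
    for ray_id, d in enumerate(directions):
        bins[octant(d)].append(ray_id)
    packets = []
    for oct_id in sorted(bins):
        ids = bins[oct_id]
        for start in range(0, len(ids), packet_size):
            packets.append((oct_id, ids[start:start + packet_size]))
    return packets

# 20 rays pointing into the +x+y+z octant and 3 into the -x+y+z octant.
example = [(1.0, 1.0, 1.0)] * 20 + [(-1.0, 1.0, 1.0)] * 3
for oct_id, ids in packets_by_octant(example):
    print(f"octant {oct_id}: {len(ids)} rays")
```

Rays in the same octant visit BVH children in the same front-to-back order along each axis, which is one reason this kind of grouping tends to help incoherent AO and diffuse rays; whether it matches the measured setup is not stated in the thread.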
[2013/09/13] [graphicsMan] [Re: giga rays on intel phi]

Just curious... if you run 8 bounces in Sibenik Cathedral or something with even more occlusion/visibility complexity, what kind of ray throughput do you see?  Any luck getting numbers for Titan yet?
[2013/09/13] [mpeterson] [Re: giga rays on intel phi]

>> graphicsMan wrote:Just curious... if you run 8 bounces in Sibenik Cathedral or something with even more occlusion/visibility complexity, what kind of ray throughput do you see?  Any luck getting numbers for Titan yet?

yes, titans (several) are in place. as i said above, both are more or less equal at diffuse bounces (around 165 mrays/s).
for primary rays the mic is 2x ahead. this is because i have optimized kernels for that. on titan i use the optimized bvh2 kernels
for kepler + the woop triangle test for any type of ray. when it comes to opengl/frame-display/post-processing, the gpu-only
solution on a workstation is much better. it is still open what i will do...
[2013/09/14] [Dade] [Re: giga rays on intel phi]

>> mpeterson wrote:yes titan (several) are in place. as i said above, both are more or less equal in performing diffuse bounces (around 165mray/s). for primary rays mic is 2x ahead.
Strange, I would have expected exactly the opposite result [SMILEY :?:] I mean, MIC should be less sensitive to thread divergence. Maybe the cache plays an important role here [SMILEY :?:]
[2013/09/14] [hobold] [Re: giga rays on intel phi]

>> Dade wrote:Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe the cache plays an important role here.
Or maybe raw memory bandwidth, and particularly scatter/gather performance, could be the relevant bottleneck here. As far as I know, Titan has much more peak RAM bandwidth, and is significantly more aggressive in parallelizing divergent memory accesses than Phi.
[2013/09/14] [graphicsMan] [Re: giga rays on intel phi]

>> mpeterson wrote:yes titan (several) are in place. as i said above, both are more or less equal in performing diffuse bounces (around 165mray/s). for primary rays mic is 2x ahead.
>> Dade wrote:Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe the cache plays an important role here.
I think it depends on what you mean by "thread".  If you are talking SIMD lanes, then I'd have to say it's probably worse for thread divergence than a GPU.  GPUs are built for SIMT with the expectation that you'll have divergence.  MIC is really traditional SIMD, but with added scatter/gather functions.  There are also fewer *actual* threads to hide divergent load latencies.
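
The lane-divergence point can be illustrated with a toy model: if a packet of rays is processed in lockstep, the whole packet runs for as many steps as its slowest ray, and lanes that finished early sit idle. The sketch below is only a model under that assumption; it ignores memory latency, caches, and the thread counts mentioned above, and the per-ray step counts are made-up random data, not measurements from either device.

```python
import random

def simd_utilization(steps_per_ray, width):
    """Fraction of lane-steps doing useful work under lockstep execution.

    Rays are packed in order into packets of `width` lanes; each packet
    runs for max(steps) iterations, so shorter rays leave lanes idle.
    A partially filled last packet counts its empty lanes as idle too.
    """
    useful = 0
    issued = 0
    for start in range(0, len(steps_per_ray), width):
        packet = steps_per_ray[start:start + width]
        useful += sum(packet)
        issued += max(packet) * width
    return useful / issued if issued else 1.0

random.seed(0)
coherent = [20] * 1024                                   # all rays do equal work
incoherent = [random.randint(1, 40) for _ in range(1024)]  # hypothetical spread

for width in (1, 16, 32):
    print(f"width {width:2d}: coherent {simd_utilization(coherent, width):.2f}, "
          f"incoherent {simd_utilization(incoherent, width):.2f}")
```

In this model wider packets never improve utilization for incoherent rays, which matches the intuition that divergence cost grows with vector width unless the hardware or the kernel regroups work.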
[2013/09/14] [Zelcious] [Re: giga rays on intel phi]

What percentage of the inner loops use double precision? Can you give a rough estimate?
[2013/09/14] [mpeterson] [Re: giga rays on intel phi]

keep in mind that the titan has 2x the horsepower of phi. so phi
is not a bad platform for rt/pt.
[2013/09/16] [graphicsMan] [Re: giga rays on intel phi]

Yeah, raw FLOPS definitely has little to do with how well a processor will compute specific tasks.  I'll be interested to see how these two processors compete as you develop similar specialized kernels for the Titan platform.  I think a key thing about Phi is that it has more memory than GPUs have, and going forward memory will likely be a main factor for which platform to choose.
[2013/09/16] [Dade] [Re: giga rays on intel phi]

>> graphicsMan wrote:I think a key thing about Phi is that it has more memory than GPUs have, and going forward memory will likely be a main factor for which platform to choose.
There is a Quadro card with 12GB of RAM; what is the largest memory configuration available for Xeon Phi?
[2013/09/16] [voxelium] [Re: giga rays on intel phi]

>> Dade wrote:There is a Quadro card with 12GB of RAM; what is the largest memory configuration available for Xeon Phi?
16 GB: [LINK http://ark.intel.com/products/75799/Intel-Xeon-Phi-Coprocessor-7120P-16GB-1_238-GHz-61-core]
