Mem / core clock mystery
(L) [2011/12/23] [tby jbikker] [Mem / core clock mystery] I was measuring some occupancy issues this afternoon and stumbled upon this weird observation:
For a path tracing algorithm, which I assumed to be heavily mem bound (and all the signs point in this direction), I measured performance in three scenarios:
1. Everything normal: decent clock speeds for mem and core.
2. Core clock as far down as allowed. Mem clock normal.
3. Mem clock as low as possible. Core clock normal.
To my surprise, scenarios 1 and 3 produce virtually identical performance figures. Scenarios 1 and 2, however, are very different. The link between core clock speed and overall performance is actually almost linear.
Any ideas how this might be possible?
(L) [2011/12/23] [tby Dade] [Mem / core clock mystery] Are we talking about CPUs (i.e. low mem bandwidth, a lot of cache), old school GPUs (i.e. high mem bandwidth, no cache) or modern GPUs (i.e. some cache)?
Anyway, the core clock may have an influence on the cache speed (in case there is one) [SMILEY :idea:] [SMILEY :?:]
(L) [2011/12/23] [tby jbikker] [Mem / core clock mystery] Ow sorry, completely forgot to mention. It's a GPU, Fermi, so it has cache. That could indeed be the problem. But I don't see how a cache could hide the mem underclock almost completely, in the context of path tracing?
(L) [2011/12/24] [tby jbarcz1] [Mem / core clock mystery] >> jbikker wrote: Ow sorry, completely forgot to mention. It's a GPU, Fermi, so it has cache. That could indeed be the problem. But I don't see how a cache could hide the mem underclock almost completely, in the context of path tracing?
So, assuming your tools aren't lying about the memory clock, you must be getting terrific latency hiding...
I could see this happening if you have really good locality between parents and their children. Each memory access might yank in a few nodes at a time, and several nodes' worth of compute could be enough to keep that thread running while another one is fetching. Plus, the upper levels of the tree are basically free, since they're accessed all the time and so tend to stay in cache, which means a miss at the bottom can be offset by another thread starting at the top.
How big's the scene? And do you have a sense of what your occupancy levels are?
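To make the latency-hiding argument above a bit more concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions: the function name `iteration_time`, the clock values, the warp count, the cycles per traversal step, the DRAM latency and the miss rate are all hypothetical illustration values, not measurements from this thread or Fermi specifications. The idea is simply that if the combined compute of all resident warps takes longer than one warp's memory wait, the wait is hidden and only the core clock matters.

```python
def iteration_time(core_mhz, mem_mhz, warps=32, compute_cycles=40,
                   dram_cycles=400, miss_rate=0.3):
    """Toy estimate (in microseconds) for all resident warps to finish one traversal step.

    Every parameter value is a made-up illustration value, not a hardware figure.
    """
    # ALU work of all warps is serialized on one multiprocessor (core clock domain).
    busy = warps * compute_cycles / core_mhz
    # One warp's own step: its compute plus the expected DRAM wait (memory clock domain).
    single = compute_cycles / core_mhz + miss_rate * dram_cycles / mem_mhz
    # If the other warps' compute covers the wait, the core is the bottleneck;
    # otherwise the memory wait becomes visible in the total time.
    return max(busy, single)


# The three scenarios from the first post, with hypothetical clocks in MHz.
scenarios = {
    "1. normal": (1400, 1800),
    "2. low core clock": (700, 1800),
    "3. low mem clock": (1400, 900),
}

for name, (core_mhz, mem_mhz) in scenarios.items():
    print(name, round(iteration_time(core_mhz, mem_mhz), 3), "us per step")
```

With these assumed numbers the model shows the same pattern as the measurements: halving the memory clock changes nothing, while the time scales linearly with the core clock. Lowering `warps` (i.e. poor occupancy) or raising `miss_rate` far enough makes the memory clock matter again, which is why scene size and occupancy are the interesting questions here.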
(L) [2012/01/09] [tby toxie] [Mem / core clock mystery] Do you have a screenshot of the measured scene? Otherwise it's difficult to tell.