(L) [2016/07/11] [post by szellmann] [Map OpenGL depth buffer in CUDA kernel]
I'm cross-posting a question I already asked on the Nvidia developer forums [LINK https://devtalk.nvidia.com/default/topic/948593/cuda-programming-and-performance/map-opengl-depth-buffer-in-cuda-kernel/]. I thought that maybe someone here has had similar problems in the past and would share his or her experience.
I have a volume rendering CUDA kernel that marches rays through a volume density. I want the rays to stop early wherever the depth buffer was already populated by fragments written in an earlier render pass. I cannot assume anything about this depth buffer; in particular, I do not create it myself.
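To make the early-termination idea concrete, here is a minimal sketch of how a depth value could limit a ray. It assumes a standard perspective projection with the default glDepthRange(0, 1), neither of which is stated in the post. The helper names (`linearize_depth`, `depth_to_ray_t`, `march_steps`) are hypothetical. It is written as plain C++ so it compiles on the host; in a kernel the functions would be marked `__device__`.

```cpp
#include <algorithm>

// Hypothetical helper: convert a window-space depth d in [0,1] to a positive
// eye-space distance along the view axis. Assumes a standard perspective
// projection with near/far planes z_near/z_far and glDepthRange(0, 1).
inline float linearize_depth(float d, float z_near, float z_far)
{
    float z_ndc = 2.0f * d - 1.0f; // window depth -> NDC depth in [-1,1]
    return 2.0f * z_near * z_far / (z_far + z_near - z_ndc * (z_far - z_near));
}

// Hypothetical helper: ray parameter t at which a ray starting at the eye
// reaches eye-space depth z_eye. cos_theta is the dot product of the
// normalized ray direction and the normalized camera forward direction.
inline float depth_to_ray_t(float z_eye, float cos_theta)
{
    return z_eye / cos_theta;
}

// Stand-in for the marching loop: counts the samples taken between t_enter
// and the nearer of the volume exit (t_exit) and the depth limit (t_depth).
inline int march_steps(float t_enter, float t_exit, float t_depth, float dt)
{
    float t_max = std::min(t_exit, t_depth); // stop at existing geometry
    int steps = 0;
    for (float t = t_enter; t < t_max; t += dt)
        ++steps; // a real kernel would sample and integrate the density here
    return steps;
}
```

With this, a pixel whose depth buffer value is still at the far plane imposes no extra limit, while a pixel covered by earlier geometry shortens the marching interval for its ray.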
I therefore create an OpenGL PBO and read the depth buffer into it via glReadPixels() with GL_DEPTH_STENCIL (I optimize for depth24-stencil8 and accept that other cases are potentially slow). Then I register a CUDA graphics resource, obtain a device pointer from the mapped resource and pass it to my kernel.
This approach works in general, but the pixel transfer is achingly slow. GL_KHR_debug tells me that a device-to-host transfer is scheduled, which explains this. However, the very same routine, when transferring the color buffer instead, is reasonably fast. Another thing I tried is first copying the depth buffer to the color buffer and then performing the readback on the color buffer. This is fast and gives me the correct depth values, but leaves the color buffer invalid (not an option for me).
I made a tiny example where you can play around with those modes: [LINK https://gist.github.com/szellmann/4a2f44f254af31e795c5b368d6f38423]
I'm confident that I could somehow use a second PBO and transfer the depth buffer to it with GL_DEPTH_STENCIL_TO_RGBA_NV, and I'd also guess that this might not be too bad in terms of performance. But I'd rather avoid the second transfer, for reasons of both performance and code complexity. So has anyone already managed to efficiently implement depth buffer interop between GL (default depth buffer) and CUDA? I found no statement in the official docs that this kind of transfer is unsupported, but if it weren't supported, I'd at least know that I have to hack my way around this [SMILEY :)]