Specular Reflection Banding...
(L) [2007/10/23] [moogie] [Specular Reflection Banding...] I have been slowly plugging away at my real-time ray tracer but have hit a snag...
There seems to be banding on spheres when the specular highlight area nears the edge of a sphere.
[IMG #1 ]
I am not sure what is causing this, and am wondering whether anyone more experienced with ray tracing quirks could suggest a cause.
It is easier to see the banding in a dynamic scene... to this end I have made a test applet to demonstrate the problem (needs Java 1.5+):
[LINK http://javaunlimited.net/hosted/moogie/testApplet.html]
The applet renders at 640x480, with 97 of 256 objects visible, and runs at 38 fps on an Athlon XP2800 using Java 6.
I do recommend using Java 6, as it gives a 20% speed increase over Java 5 for this ray tracer!
I do not yet have an acceleration structure in place; however, I will eventually put in a BVH.
It is multithreaded, so those of you who have modern CPUs with two or more cores should see a significant speed boost.
To get FPS you will need to download the self executable jar:
[LINK http://javaunlimited.net/hosted/moogie/jrtrt_specular.jar]
I would also love to see what sort of frame rates other people achieve with what systems and java versions.
[IMG #1]:Not scraped:
https://web.archive.org/web/20071025011130im_/http://javaunlimited.net/hosted/moogie/specular.png
(L) [2007/10/23] [Ho Ho] [Specular Reflection Banding...] A dual-core P4 2.66GHz sitting on a dead-slow 533MHz FSB and 200MHz DDR2 gets around 32-35 FPS with Java 1.6.
Basically a P4 cannot get any worse than that unless you underclock it :)
_________________
In theory, there is no difference between theory and practice. But, in practice, there is.
Jan L.A. van de Snepscheut
(L) [2007/10/23] [lycium] [Specular Reflection Banding...] i get between 90 and 120 fps, but the cpu cores aren't getting fully utilised. intel q6600, 4gb ddr800.
about the reflection, post some code? i'm guessing it's standard phong.
(L) [2007/10/23] [moogie] [Specular Reflection Banding...] Hmm, that is odd that it is not utilising the cores effectively... on a Core Duo laptop it was achieving 90% utilisation.
I can modify the code so that you can specify the number of threads to spawn... currently it creates a thread for each core/CPU.
What platform are you running? Linux/Windows/Mac? Perhaps the code I use to detect the number of cores is not reporting it correctly...
here is my directional light source code... it is pretty horrible at the moment:
(L) [2007/10/23] [lycium] [Specular Reflection Banding...] hmm that specular stuff is definitely incorrect, you have to raise the dot product of the reflection vector and the viewing vector to some power.
about the utilisation, yeah it's somewhere between 70% and 80% of 4 cores, if that's normal then no probs :) i was assuming it would use 100%.
(L) [2007/10/23] [lycium] [Specular Reflection Banding...] oh, i was looking too early in the method - you're re-using tempdouble... java speedcoding tricks ;)
in that case, sorry, i can't imagine where the problem might be (i haven't checked the reflection vector computation but i'm pretty sure you'd get that right! ;)
(L) [2007/10/23] [madd] [Specular Reflection Banding...] Haven't looked at your code, but "banding" when using the simple Phong illumination model is not necessarily an error.
The reason is that the glossy term may be non-zero as N dot L becomes 0, so you'll get a discontinuity.
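To make that point concrete, here is a minimal Java sketch. It is not moogie's code: the class and method names are made up for illustration, and it assumes the renderer cuts the specular term to zero wherever the surface faces away from the light (N dot L <= 0). Under that assumption, the highlight can still be bright just on the lit side of the terminator and then drop to zero in a single step. With a low exponent the step is large; with a high exponent the lobe has usually faded before it reaches the terminator.

```java
// Hypothetical sketch, not the poster's code; all names are illustrative.
final class PhongTerminatorDemo {

    /** Dot product of two 3-vectors. */
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Reflect the direction-to-light l about the unit normal n: r = 2(n.l)n - l. */
    static double[] reflect(double[] l, double[] n) {
        double k = 2.0 * dot(n, l);
        return new double[] { k * n[0] - l[0], k * n[1] - l[1], k * n[2] - l[2] };
    }

    /**
     * Phong specular term (r.v)^exponent.
     * Assumption: the term is forced to zero when the surface faces away from the light.
     */
    static double specular(double[] n, double[] l, double[] v, double exponent) {
        if (dot(n, l) <= 0.0) {
            return 0.0; // unlit side: hard cut, this is where the visible edge comes from
        }
        double rDotV = Math.max(0.0, dot(reflect(l, n), v));
        return Math.pow(rDotV, exponent);
    }

    /** Specular value just on the lit side of the terminator, viewer tilted viewDeg from the normal. */
    static double stepAtTerminator(double viewDeg, double exponent) {
        double[] n = { 0, 0, 1 };
        double theta = Math.toRadians(89.999); // light almost grazing, n.l barely above 0
        double phi = Math.toRadians(viewDeg);
        double[] l = { Math.sin(theta), 0, Math.cos(theta) };
        double[] v = { -Math.sin(phi), 0, Math.cos(phi) };
        return specular(n, l, v, exponent); // becomes 0 as soon as n.l <= 0
    }

    public static void main(String[] args) {
        // Viewer 15 degrees off the mirror direction: about 0.87 for exponent 4,
        // about 0.03 for exponent 100, and exactly 0 on the other side of the terminator.
        System.out.println("exponent 4:   " + stepAtTerminator(75, 4));
        System.out.println("exponent 100: " + stepAtTerminator(75, 100));
    }
}
```

Scaling the specular term by N dot L, or fading it out smoothly near the terminator, are common ways to soften the step, at the cost of dimmer grazing highlights.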
(L) [2007/10/23] [moogie] [Specular Reflection Banding...] @madd: I think you are correct. When I increase the power from 4 to 100 I do not see any banding.
Here is the "corrected" version:
[LINK http://javaunlimited.net/hosted/moogie/jrtrt_v1.6_12_threads.jar]
This version will also use 12 threads. lycium, can you see if you get close to 100% utilisation? This version is not really comparable to the other one, as it uses the math library function Math.pow, which is slower. But if you are achieving a similar fps then I am quite happy with that!
(L) [2007/10/23] [lycium] [Specular Reflection Banding...] nope, it's actually lower now, pretty much pegged at 65%. more threads != more speed, in fact i think it's synchronisation overhead that's responsible for the inefficiency.
(L) [2007/10/23] [moogie] [Specular Reflection Banding...] Synchronisation is a killer indeed... I think I might see if I can get away without synchronisation without noticeable artifacts.
(L) [2007/10/23] [lycium] [Specular Reflection Banding...] just having a semaphore on your list of screen tiles to be rendered should be enough i think?
if you don't feel like messing around with that, you can always just eat up more cycles ;) throw some antialiasing at it, specular reflections, ...
(L) [2007/10/23] [moogie] [Specular Reflection Banding...] I have two semaphores:
1. world state update semaphore
2. render semaphore
Each thread waits until the world state has been updated, renders its tiles, and then waits until all threads have finished. The designated master thread then draws the rendered image and updates the world state. Repeat until stopped.
Well, if there are going to be spare cycles I could use them to rebuild a BVH in the background :)
(L) [2007/10/23] [moogie] [Specular Reflection Banding...] Well, that is a little depressing... I must admit I am quite inexperienced with multithreaded programming, especially with respect to obtaining high CPU usage.
I will have to research into techniques to reduce wasted cycles.
Thanks for testing it.
A question: are the results similar for the "original" version in the first post? Or, as lycium found, does it achieve better utilisation?
(L) [2007/10/24] [Zakalwe] [Specular Reflection Banding...] I won't be able to post results until tomorrow when I get back into work. Perhaps in the meantime if you post your boss/worker thread code we might see some problems. Perhaps it's something simple like doing too much work in the boss thread so the workers sit around doing nothing. Maybe setting up the jobs is too complicated? What I do for tiling is use a single integer to represent all the jobs.
Let's say the screen is 512x512 and the size of a tile is 32x32. There are therefore 256 jobs.
All you do is wrap an int called "jobs" in a single mutex.
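worker pseudo code:
lock(jobs)
jobid = --jobs    # pre-decrement, so ids run from 255 down to 0
unlock(jobs)
if jobid < 0: no tiles left, wait for the next frame
# extents of tile are minx,miny to maxx,maxy
# 32 is the size of a tile
# 512 is the height and width of the image plane, so there are 512/32 = 16 tiles per row
miny = (jobid / 16) * 32
minx = (jobid % 16) * 32
maxx = minx + 32
maxy = miny + 32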
Now all the boss has to do while creating jobs is to set jobs = 256. You'll need some condition variable code for jobs, but that's easy enough.
Tile size affects rendering speed too. There's a tradeoff between the fineness of the load balancing and the overhead of all the locking. Experiment a bit. I found 32x32 was best for my data sets overall (8 cores, 512x512 image, 1 ray per pixel).
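Since the ray tracer in this thread is written in Java, the single shared job counter could be sketched roughly as follows. This is only an illustration under the same assumptions as the pseudo code (512x512 image, 32x32 tiles): the class and method names are invented, and an AtomicInteger stands in for the mutex-protected int, which is a substitution rather than what Zakalwe describes using in C++.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of a shared tile-job counter; names and sizes are assumptions.
final class TileJobCounter {
    static final int IMAGE_SIZE = 512;                          // width and height of the image plane
    static final int TILE_SIZE = 32;                            // width and height of one tile
    static final int TILES_PER_ROW = IMAGE_SIZE / TILE_SIZE;    // 16
    static final int JOB_COUNT = TILES_PER_ROW * TILES_PER_ROW; // 256

    private final AtomicInteger jobs = new AtomicInteger(0);

    /** Boss: make a full frame of jobs available. */
    void reset() {
        jobs.set(JOB_COUNT);
    }

    /** Worker: claim the next job id (255 down to 0), or -1 when no tiles remain. */
    int nextJob() {
        int id = jobs.decrementAndGet();
        return id >= 0 ? id : -1;
    }

    /** Tile extents for a job id as {minx, miny, maxx, maxy}; the max values are exclusive. */
    static int[] extents(int jobId) {
        int minx = (jobId % TILES_PER_ROW) * TILE_SIZE;
        int miny = (jobId / TILES_PER_ROW) * TILE_SIZE;
        return new int[] { minx, miny, minx + TILE_SIZE, miny + TILE_SIZE };
    }

    /** Demo: workers claim tiles until none remain; returns how often each tile was claimed. */
    static int[] claimAll(int threadCount) {
        TileJobCounter counter = new TileJobCounter();
        counter.reset();
        AtomicIntegerArray claims = new AtomicIntegerArray(JOB_COUNT);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int id = counter.nextJob(); id >= 0; id = counter.nextJob()) {
                    claims.incrementAndGet(id); // a real worker would trace the tile here
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        int[] result = new int[JOB_COUNT];
        for (int i = 0; i < JOB_COUNT; i++) {
            result[i] = claims.get(i);
        }
        return result;
    }
}
```

Calling TileJobCounter.claimAll(4) should report every tile claimed exactly once, however the tiles end up distributed across the four workers. The sketch leaves out the condition variable that would put idle workers to sleep between frames.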
[IMG #1 ]
[IMG #1]:Not scraped:
https://web.archive.org/web/20071025011130im_/http://img99.imageshack.us/img99/9809/tilesizezm0.png
(L) [2007/10/24] [moogie] [Specular Reflection Banding...] At a high level the ray tracer logic flows in the following steps:
1. obtain the number of CPUs/cores
2. create a render thread for each core
3. nominate one thread to be the master thread
4. split the screen into tiles and give each thread an equal portion of the tiles
5. all render threads apart from the master thread wait by calling await() on the "state" CyclicBarrier
6. the master thread updates the world state and calls await() on the "state" CyclicBarrier, waking all threads
7. all threads render their tiles and then wait by calling await() on the "render" CyclicBarrier
8. once all threads have called await() on the "render" CyclicBarrier, all threads wake and the non-master threads go to step 5
9. the master thread displays the rendered image and then goes to step 6
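For readers following along, the nine steps above could be sketched in Java roughly like this. It is a reconstruction from the description only, not the poster's code: the class name, the tile counter standing in for real rendering, and the fixed frame count are all assumptions made so the example terminates.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the two-barrier frame loop; all names are illustrative.
final class BarrierFrameLoop {

    /** Runs the given number of frames and returns the total number of tiles "rendered". */
    static int run(int threadCount, int tilesPerThread, int frames) {
        CyclicBarrier state = new CyclicBarrier(threadCount);  // the "state" barrier
        CyclicBarrier render = new CyclicBarrier(threadCount); // the "render" barrier
        AtomicInteger tilesRendered = new AtomicInteger();
        int[] worldTick = { 0 }; // written by the master only

        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            boolean master = (t == 0); // step 3: nominate one thread as master
            threads[t] = new Thread(() -> {
                try {
                    for (int f = 0; f < frames; f++) {
                        if (master) {
                            worldTick[0]++; // step 6: update world state while the others wait
                        }
                        state.await();      // steps 5 and 6: all threads meet here
                        for (int i = 0; i < tilesPerThread; i++) {
                            tilesRendered.incrementAndGet(); // step 7: stand-in for tracing a tile
                        }
                        render.await();     // steps 7 and 8: wait for the slowest thread
                        // step 9: the master would display the image here,
                        // while the other threads are already parked on "state"
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return tilesRendered.get();
    }
}
```

In this arrangement every thread idles at the "render" barrier until the slowest one finishes its fixed share of tiles, and all non-master threads idle again while the master displays the image and updates the world. Both waits are plausible contributors to the utilisation figures reported in the thread, though the sketch cannot confirm that.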
This is the "master" thread code:
(L) [2007/10/24] [Zakalwe] [Specular Reflection Banding...] The original performs better for me, with speeds ranging from 110 to 150 fps. Htop still shows nowhere near all of the CPU power being used.
Tile size is not going to do anything for you until you get rid of the static allocation of the tiles to the worker threads. Using more threads than there are CPUs causes unwanted contention and is not a proper substitute for real load balancing.
Another problem may be the platform itself. It's possible that the overhead of Java and Java threads is killing you. I use C++ on Linux with Pthreads, as Linux has very fast pthreads support via NPTL, and I get much better processor utilization. Also, one of the largest breakthroughs in real-time ray tracing has been packet tracing. Unless things have changed since I last programmed in Java, you're not going to be able to access SIMD registers from it, cutting you off from that lovely free speedup packet tracing gives us.
(L) [2007/10/24] [moogie] [Specular Reflection Banding...] Thanks for testing it.
I have an understanding of C++, but definitely not to the level needed to capitalise on such things as SIMD registers. I am, however, quite proficient at Java. It might well be that I am reaching the limits of the language.
The goal of this ray tracer is to provide a 3D plugin for websites. With Java being pretty ubiquitous, it seems a valid choice.
I have no doubts about my level of skill in the area of ray tracing (i.e. lack of :P) and do not expect to be able to develop a real-time ray tracer that can compete with the other projects being developed by the talent on this forum. I do, however, aim to achieve 80% of the performance of the other real-time ray tracers being developed.
I agree, I need to remove the static allocation of tiles. To this end I am planning to implement a scheme similar to yours, with a stack of tiles that the worker threads pop from, which should in theory achieve load balancing.
There are many other algorithm optimisations I have thought up and given time will be implemented.
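A stack of tiles popped by worker threads, as planned in this post, might look something like the following in Java. This is a sketch under assumptions, not the eventual implementation: the Tile class, the method names, and the use of ConcurrentLinkedDeque (available from Java 7) are illustrative choices, and the screen dimensions are assumed to be multiples of the tile size.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of dynamic tile allocation; all names are illustrative.
final class TileStackDemo {

    /** A square screen tile with its top-left corner at (x, y). */
    static final class Tile {
        final int x, y, size;

        Tile(int x, int y, int size) {
            this.x = x;
            this.y = y;
            this.size = size;
        }
    }

    /** Push every tile of a width x height screen onto a thread-safe stack. */
    static ConcurrentLinkedDeque<Tile> buildStack(int width, int height, int tileSize) {
        ConcurrentLinkedDeque<Tile> stack = new ConcurrentLinkedDeque<>();
        for (int y = 0; y < height; y += tileSize) {
            for (int x = 0; x < width; x += tileSize) {
                stack.push(new Tile(x, y, tileSize));
            }
        }
        return stack;
    }

    /** Workers pop tiles until the stack is empty; returns the number of pixels covered. */
    static int renderAll(ConcurrentLinkedDeque<Tile> stack, int threadCount) {
        AtomicInteger pixels = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                Tile tile;
                // pollFirst() returns null once the stack is empty, which ends the loop
                while ((tile = stack.pollFirst()) != null) {
                    pixels.addAndGet(tile.size * tile.size); // stand-in for tracing the tile
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return pixels.get();
    }
}
```

For the 640x480 applet with 32x32 tiles, TileStackDemo.renderAll(TileStackDemo.buildStack(640, 480, 32), 4) works through 300 tiles. A thread that draws cheap tiles simply pops more of them, which is the load balancing effect the static split lacks.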