Parallella: A Supercomputer For Everyone

Board: Raytracing — Visuals, Tools, Demos & Sources

(L) [2012/10/01] [Post by Dade] [Parallella: A Supercomputer For Everyone] Wayback!

Not strictly related to OMPF, but someone may be interested. There is a new project looking for funding on Kickstarter: [LINK http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone]

If you are tired of having to rewrite all your code in order to run on GPUs, this may be a solution: no more problems with thread divergence, etc. Memory management will be the new nightmare (i.e. out-of-core rendering is pretty much required given the memory architecture).

P.S. if you have a 25-year-old Transputer card lying around somewhere, you can ignore this post; it is about the same stuff  [SMILEY ;)]
(L) [2012/10/02] [Post by dr_eck] [Parallella: A Supercomputer For Everyone] Wayback!

From an EETimes article:   >> The 16-core version would deliver 26 GFLOPs, and the 64-core version could provide 90 GFLOPs—a significant leap over alternative boards such as Raspberry Pi.
That would put the 64-core version roughly equivalent to an Ivy Bridge Core i7. Porting programs to the Intel chip is easier for me, but the cost per GFLOPS is a bit higher.
Also worthy of note:
 >> A quad AMD 7970 desktop computer reaching 16 TFLOPS of single-precision, 4 TFLOPS of double-precision computing performance. Total system cost was $3000; it was also built using only commercially available "gamer" grade hardware. (from Wikipedia's article on FLOPS)  44X the performance for 15X the cost.
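The 44X/15X figures above can be sanity-checked with a quick script. Note that the ~$200 Parallella price below is an assumption back-solved from the stated 15X cost ratio; it is not quoted anywhere in the thread:

```python
# Sanity check of the ratios quoted above. The $200 Parallella price
# is an ASSUMPTION implied by the stated 15X cost ratio, not a figure
# taken from the thread.

quad_7970_dp_flops = 4e12   # 4 TFLOPS double precision, $3000 system
quad_7970_cost = 3000.0
parallella_flops = 90e9     # 90 GFLOPS, 64-core Parallella
parallella_cost = 200.0     # assumed: 3000 / 15

perf_ratio = quad_7970_dp_flops / parallella_flops
cost_ratio = quad_7970_cost / parallella_cost
print(round(perf_ratio))    # 44  (so "44X" matches the double-precision number)
print(round(cost_ratio))    # 15
```

Interestingly, the "44X" only works out against the double-precision 4 TFLOPS figure; against single precision the gap would be closer to 180X.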
(L) [2012/10/02] [Post by graphicsMan] [Parallella: A Supercomputer For Everyone] Wayback!

The programming difficulty for anything non-trivial should probably be considered for both of the high-performance options [SMILEY :)]  Also, achieving close to max perf on an Intel chip is a lot easier than getting close to max perf on a GPU.  Unsure about the Parallella machine.
(L) [2012/10/03] [Post by Dade] [Parallella: A Supercomputer For Everyone] Wayback!

>> graphicsMan wrote:Also, achieving close to max perf on an Intel chip is a lot easier than getting close to max perf on a GPU.  Unsure about the parallella machine.
I think this is the point: on a MIMD machine you only have to worry about data management; for everything else, each core is like a CPU. So if your application has a working set that fits in local memory, you will be able to achieve max perf as on a CPU. GPUs instead are so sensitive to thread divergence, memory access patterns, and memory bandwidth that achieving max perf is almost never possible.

Parallella is supposed to scale about as well as GPUs but be about as easy to program as CPUs ... well, that is the theory; we will see if it works in practice.
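The divergence point above can be illustrated with a toy cost model (illustrative only; the cycle counts are made-up assumptions, not Epiphany or GPU specs). On a SIMT GPU, lanes in a warp execute in lockstep, so a divergent branch costs every lane the sum of both paths; a MIMD core executes only its own path:

```python
# Toy cost model (illustrative only; all cycle counts are invented)
# of why thread divergence hurts SIMT GPUs but not MIMD cores.

def simt_cycles(branch_taken, cost_if, cost_else):
    """A SIMT warp serializes a divergent branch: the warp pays for
    every path that at least one lane takes."""
    paths = set(branch_taken)
    total = 0
    if True in paths:
        total += cost_if
    if False in paths:
        total += cost_else
    return total

def mimd_cycles(branch_taken, cost_if, cost_else):
    """MIMD cores each run only their own path; the slowest core
    bounds the group, but nobody re-executes the other branch."""
    return max(cost_if if taken else cost_else for taken in branch_taken)

lanes = [True] * 8 + [False] * 8     # e.g. half the rays hit, half miss
print(simt_cycles(lanes, 100, 10))   # 110: the warp pays for both paths
print(mimd_cycles(lanes, 100, 10))   # 100: only the longest single path
```

The gap grows with the number of distinct paths taken inside a warp, which is exactly why incoherent secondary rays are painful on GPUs.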
(L) [2012/10/03] [Post by graphicsMan] [Parallella: A Supercomputer For Everyone] Wayback!

[SMILEY :)] Yes, with 1GB of RAM for a ton of MIMD cores, I somehow doubt it will scale easily for our use case [SMILEY ;)]
(L) [2012/10/04] [Post by Dade] [Parallella: A Supercomputer For Everyone] Wayback!

>> graphicsMan wrote: Yes, with 1GB of RAM for a ton of MIMD cores, I somehow doubt it will scale easily for our use case
Indeed, the current version has only 32KB of local RAM per core (so 512KB when you aggregate the RAM of all 16 cores), a 32-bit address space, and only 8GB/s of bandwidth to global memory: it will barely run SmallPT with a couple of spheres.

However, the die size is about 2.05mm^2; for comparison, an i7 has a die area of about 250mm^2 (depending on the model), and the AMD HD 7970 (built on the same 28nm process) has an estimated die size of 352mm^2. It is about 170 times smaller (!) There is room to add a lot of local RAM and cores.
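The arithmetic in this post can be verified with a quick script, using only the figures quoted above (the per-core bandwidth line is my own derived number, simply dividing the 8GB/s by 16 cores):

```python
# Quick check of the memory and die-size figures quoted above.

cores = 16
local_kb_per_core = 32
print(cores * local_kb_per_core)          # 512 KB of aggregate local RAM

dram_bandwidth_gb = 8                     # GB/s to global memory
print(dram_bandwidth_gb / cores)          # 0.5 GB/s per core if all cores stream

epiphany_mm2 = 2.05                       # Epiphany-16 die area
hd7970_mm2 = 352                          # AMD HD 7970, same 28nm process
print(round(hd7970_mm2 / epiphany_mm2))   # 172, i.e. roughly "170 times smaller"
```

So both the 512KB aggregate and the ~170X die-area ratio in the post check out.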
