
MandelCPU vs MandelGPU

Written by David Bucciarelli




MandelGPU is a small, simple demo written in OpenCL to test the performance of this new standard against MandelCPU, a plain CPU implementation. It has been written using the ATI OpenCL SDK beta4 on Linux, but it should work on any platform/implementation (e.g. NVIDIA). Some discussion about this little toy can be found on the Luxrender forum.

A video of MandelGPU is available here (sorry for the low quality): http://vimeo.com/7876686

The following tests were run at 1024x768 with a quite insane maximum number of iterations: 10000

History

  • V1.3 - Updated for ATI SDK 2.0

  • V1.2 - Jens's patch for MacOS, Szaq's patch for NVIDIA OpenCL and Windows, fixed performance estimation, added Windows binaries

  • V1.1 - Fixed window resize problem, added support for loading different kernels, added float4 kernel

  • V1.0 - First release

MandelCPU

This is just a simple single-threaded CPU implementation (no OpenCL involved). Result:

Rendering time: 9.630000 secs (Sample/sec 81665 Max. Iterations 10000)
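
For reference, the per-sample work in a renderer of this kind is the classic escape-time loop. What follows is only a minimal sketch in C, assuming the usual z = z*z + c iteration with an escape radius of 2; the function name mandel_iterations is hypothetical and the actual MandelCPU source may be organized differently.

```c
/* Hypothetical helper (not taken from the MandelCPU source): returns the
 * number of iterations performed before the point (cx, cy) escapes the
 * circle of radius 2, or max_iter if it never escapes. */
int mandel_iterations(float cx, float cy, int max_iter)
{
    float x = 0.0f, y = 0.0f;
    int iter = 0;

    while (iter < max_iter) {
        const float x2 = x * x;
        const float y2 = y * y;

        /* |z|^2 > 4 means the orbit has escaped */
        if (x2 + y2 > 4.0f)
            break;

        /* z = z*z + c, split into real and imaginary parts */
        y = 2.0f * x * y + cy;
        x = x2 - y2 + cx;
        ++iter;
    }
    return iter;
}
```

Points inside the set, such as (0, 0), run for the full max_iter iterations, which is why a limit of 10000 makes the test so heavy.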

MandelGPU on CPU device

This is the OpenCL implementation using only the CPU device. Result:

For test only: Expires on Sun Feb 28 00:00:00 2010
OpenCL Device 0: Type = TYPE_CPU
OpenCL Device 0: Name = Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
OpenCL Device 0: Compute units = 4
Reading file 'rendering_kernel.c' (size 996 bytes)
Rendering time: 9.800000 secs (Sample/sec 80248 Max. Iterations 10000)

It uses all 4 cores but has about the same performance as MandelCPU (which uses only one core). I guess CPU devices are useful only for development purposes (e.g. when you don't have a fast GPU available).

MandelGPU (on GPU)

This is the OpenCL implementation using only the GPU device. Result:

For test only: Expires on Sun Feb 28 00:00:00 2010
OpenCL Device 0: Type = TYPE_GPU
OpenCL Device 0: Name = ATI RV770
OpenCL Device 0: Compute units = 10
Reading file 'rendering_kernel.c' (size 996 bytes)
Rendering time: 0.340000 secs (Sample/sec 2313035 Max. Iterations 10000)

It is about 28 times faster than the single-threaded CPU implementation (9.63 secs vs. 0.34 secs).
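
The Sample/sec figures and the speedup can be rechecked from the resolution and the rendering times printed in the logs. This is a small sketch, assuming one sample per pixel (which matches the numbers reported for 1024x768); both function names are hypothetical.

```c
/* Hypothetical helpers for rechecking the reported figures.
 * Assumption: one sample per pixel, so samples = width * height. */

/* Throughput in samples per second for a frame rendered in `secs`. */
double samples_per_sec(int width, int height, double secs)
{
    return ((double)width * (double)height) / secs;
}

/* How many times faster `secs` is compared with `baseline_secs`. */
double speedup(double baseline_secs, double secs)
{
    return baseline_secs / secs;
}
```

For example, samples_per_sec(1024, 768, 0.34) gives the GPU throughput, and speedup(9.63, 0.34) compares the MandelCPU time with the GPU time above.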

MandelGPU (on GPU with float4)

This is the OpenCL implementation using only the GPU device and vector type float4. Result:

For test only: Expires on Sun Feb 28 00:00:00 2010
OpenCL Device 0: Type = TYPE_GPU
OpenCL Device 0: Name = ATI RV770
OpenCL Device 0: Compute units = 10
OpenCL Device 0: Max. work group size = 256
Reading file 'rendering_kernel_float4.cl' (size 3354 bytes)
Rendering time: 0.160000 secs (Sample/sec 4915200 Max. Iterations 10000)

It is about 60 times faster than the single-threaded CPU implementation (9.63 secs vs. 0.16 secs).
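
The idea behind a float4 kernel is that each work item iterates four samples together, one per vector lane. The sketch below emulates that in plain C with arrays of four floats. It is not the content of rendering_kernel_float4.cl: the function name, the per-lane masking and the early exit are all assumptions made for illustration.

```c
/* Hypothetical 4-lane escape-time loop (plain C emulation of the float4
 * idea, not the actual kernel). Each lane holds one sample; a lane stops
 * updating once its point has escaped the circle of radius 2. */
void mandel_iterations4(const float cx[4], const float cy[4],
                        int max_iter, int out[4])
{
    float x[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    float y[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    int active[4] = { 1, 1, 1, 1 };
    int lane, iter;

    for (lane = 0; lane < 4; ++lane)
        out[lane] = 0;

    for (iter = 0; iter < max_iter; ++iter) {
        int any_active = 0;

        for (lane = 0; lane < 4; ++lane) {
            float x2, y2;

            if (!active[lane])
                continue;

            x2 = x[lane] * x[lane];
            y2 = y[lane] * y[lane];
            if (x2 + y2 > 4.0f) {
                active[lane] = 0;   /* this lane has escaped */
                continue;
            }

            /* z = z*z + c for this lane */
            y[lane] = 2.0f * x[lane] * y[lane] + cy[lane];
            x[lane] = x2 - y2 + cx[lane];
            out[lane]++;
            any_active = 1;
        }

        /* Stop early once all four samples have escaped */
        if (!any_active)
            break;
    }
}
```

On a real GPU the four lanes would be computed with vector instructions rather than an inner loop, which is presumably where the extra speed of the float4 version comes from.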

How to compile

Just edit the Makefile and use an appropriate value for ATISTREAMSDKROOT.

Key bindings

  • 's' - save image.ppm

  • ESC or 'q' or 'Q' - exit

  • '+' - increase the max. iterations by 32

  • '-' - decrease the max. iterations by 32

  • Arrow keys - move left/right/up/down

  • PageUp and PageDown - zoom in/out

  • ' ' - refresh the window

  • You can also move by dragging with mouse button 0

  • You can also scale by dragging with mouse button 2

Download: mandelgpu-v1.3.tgz (includes sources, Linux 64bit binaries and Windows 32bit binaries)