Tech Talk: A brief history of GPGPU

In the past, video cards could only be used for 2D/3D desktop applications and games, but it soon became clear that these cards could also be used for scientific applications, processing large amounts of (floating point) data in parallel and at high speed.

The drivers of video cards were closed source and poorly documented (to the great frustration of Linus Torvalds), and therefore practically inaccessible to third parties. After 2001 the shaders became programmable, so simple operations on datasets were possible through shader languages and (macro) assembly languages. Unfortunately, only part of the functionality of a video card was accessible, so you could not yet implement complex processing and algorithms; it was vendor specific (Nvidia, AMD, Intel, etc.) and debugging was very difficult.

With the rise of software libraries such as DirectX and OpenGL, video cards became increasingly suitable for generic applications and also increasingly easier to program, although the emphasis of these libraries lies mainly on 3D visualization and ray tracing.

In 2007 Nvidia came up with an API (CUDA) that gave software developers and engineers full access to the hardware. OpenCL (Open Computing Language), developed in 2009 by the Khronos Group, gives applications access to CPUs, GPUs, DSPs and FPGAs independently of the platform. This independence is also very relevant for industrial applications: graphics cards have a life cycle of 2 to 3 years, and you want to avoid having to start a whole new development process every time you switch to a new video card. Nowadays there are even video cards that have passed the 1 TFLOPS mark, fast enough to inspect a product in line and in real time.
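To give an idea of what that "full access to the hardware" looks like in practice, here is a minimal, generic CUDA sketch (not taken from any particular project): every element of a float array gets its own GPU thread, and the host code allocates device memory and launches the kernel.

// Minimal CUDA sketch: scale a float array on the GPU, one thread per element.
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                    // 1M floats
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));   // allocate on the GPU
    // ... copy input data in with cudaMemcpy, omitted for brevity ...

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, 2.0f, n);  // launch ~1M threads in parallel
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}

The same idea, expressed in a vendor-neutral way, is what OpenCL offers: the same kernel-style code can be dispatched to a CPU, GPU, DSP or FPGA, which is exactly the hardware independence that matters for industrial products.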

senseIT is currently working on an upgrade of our 3D fringe camera. We developed this technology on a DSP chip, but we are running into the limitations of that platform. Two cameras, each streaming 5 Mpixel images at 60 Hz, deliver data that needs to be processed in a very short time, and some of our inspection cells even use three fringe cameras! Although decoding of the projector pixels still happens on a post-processing DSP, all floating point intensive calculations are now done on a graphics card with 2560 CUDA cores. Thanks to the parallelisation (and fast memory), in certain cases you can achieve a speed gain of 100x compared with a standard i7 CPU!
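As an illustration of the kind of per-pixel floating point work that maps well onto thousands of CUDA cores, the sketch below shows a textbook 4-step phase-shift calculation with one thread per pixel. The formula, image resolution and kernel are assumptions for the example; they are not the actual senseIT pipeline.

// Hypothetical per-pixel kernel: compute a phase map from four fringe images.
#include <cuda_runtime.h>
#include <math.h>

__global__ void phaseFromFringes(const float *i0, const float *i1,
                                 const float *i2, const float *i3,
                                 float *phase, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    int idx = y * width + x;
    // Standard 4-step phase-shifting formula; the real pipeline may differ.
    phase[idx] = atan2f(i3[idx] - i1[idx], i0[idx] - i2[idx]);
}

// Example launch for a ~5 Mpixel image (assumed 2592 x 2048 resolution):
//   dim3 block(16, 16);
//   dim3 grid((2592 + 15) / 16, (2048 + 15) / 16);
//   phaseFromFringes<<<grid, block>>>(d_i0, d_i1, d_i2, d_i3, d_phase, 2592, 2048);

Because every pixel is independent, all 5 million of them can be processed concurrently, which is where the large speedup over a sequential CPU implementation comes from.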

After the individual point clouds have been generated, they must be stitched, aligned and edited. Since a 3D object can easily be larger than 1 GB and you want results within 5 to 10 seconds, you cannot avoid using specialized acceleration platforms. Even so, optimization remains a discipline of its own, and it is situation specific whether SIMD is the right choice (mainly for integer operations) or whether it is worth considering a GPU.
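Purely as an illustration of why such point cloud workloads can suit a GPU, here is a hypothetical CUDA kernel (not our actual implementation) that applies a rigid transformation, one of the building blocks of stitching and alignment, to every point of a cloud in parallel:

// Illustrative sketch: apply a rigid transform (rotation R, translation t) per point.
#include <cuda_runtime.h>

struct Point3 { float x, y, z; };

__global__ void transformPoints(const Point3 *in, Point3 *out,
                                const float *R,   // 3x3 rotation, row-major
                                const float *t,   // translation vector
                                int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    Point3 p = in[i];
    out[i].x = R[0] * p.x + R[1] * p.y + R[2] * p.z + t[0];
    out[i].y = R[3] * p.x + R[4] * p.y + R[5] * p.z + t[1];
    out[i].z = R[6] * p.x + R[7] * p.y + R[8] * p.z + t[2];
}

Whether a step like this belongs on the GPU or in SIMD-optimized CPU code depends, as noted above, on the data sizes, the time budget and the rest of the pipeline.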