Unleash Your Inner Supercomputer

Your Guide to GPGPU with WebGL

The GPU provides nearly ubiquitous, powerful parallel compute capabilities. The knowledge to make use of these capabilities, however, is far from ubiquitous. Built around rendering pixels to the screen, the programming model and architecture are well outside most programmers' experience. In what follows we provide detailed explanations and examples of programming idioms common in developing computational models and programs suitable for harnessing the GPU for general-purpose computations (GPGPU).

Super speed

Quite simply, it's all about performance. We will learn a new way of thinking about computational problems, one that is more work but that gives us access to design elements (simulations and computations) that would otherwise be unreachable. We will take the graphics hardware driven by the billion-dollar gaming industry and use it in an entirely different way.

GPUs are so effective that an entire ecosystem of specialized languages and environments has grown up around general-purpose computing on graphics processing units (GPGPU), focusing on high-performance code. We, though, will focus on portability and instructional design, limiting our discussion to techniques available from within a web browser.

GPU performance far outpaces CPU performance, and the gap is getting larger!

The performance shown is for single-precision, 32-bit floating point numbers. The GPU handles these numbers extremely well. However, using higher-precision, 64-bit floating point numbers dramatically decreases performance. Indeed, these higher-precision numbers are not widely available through WebGL or OpenGL; their use is more common with specialized tools such as OpenCL or CUDA. Fortunately, 32-bit precision is a great fit for many applications, especially in instruction.
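To get a feel for what single precision means in practice, here is a small sketch using the standard JavaScript function `Math.fround`, which rounds a number to the nearest 32-bit float. It runs on the CPU and only illustrates the precision limits; it is not GPU code, and the helper name `toSingle` is our own.

```javascript
// Round a JavaScript number (a 64-bit float) to the nearest 32-bit float.
function toSingle(x) {
  return Math.fround(x);
}

// 0.1 has no exact binary representation, so its 32-bit and 64-bit
// approximations differ slightly.
const tenth = toSingle(0.1);
console.log(tenth === 0.1);                 // false
console.log(Math.abs(tenth - 0.1) < 1e-8);  // true

// Consecutive integers are exact only up to 2^24 in single precision.
console.log(toSingle(16777216) === 16777216); // true
console.log(toSingle(16777217) === 16777217); // false
```

Roughly seven significant decimal digits survive, which is plenty for many simulations but worth keeping in mind when comparing GPU results against CPU results.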

Wait, What? WebGL Computations?

WebGL is for graphics, so what do we even mean when we talk about GPGPU?

Modern computers use small programs, called shaders, to compute the color of each pixel on the screen. Moreover, computers have specialized hardware, graphics processing units (GPUs), to run large numbers of these shaders in parallel. Computer games depend on this both for a high frame rate and for many effects such as lighting and shadows. All this parallelism and performance, though, comes at a cost. The output of each shader is completely determined by its input. There is no communication between shaders while they execute. This makes sense in a graphics context, where the color of each pixel has no dependency on the neighboring pixels; the color depends only on the scene being rendered. However, from the viewpoint of using the GPU for computations, this is a significant constraint.

Fortunately, a large number of problems in the sciences, engineering, and mathematics can be addressed:

On a grid.
Where calculations at one point on the grid do not depend on the ongoing calculations at other points.
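As an illustration of a problem with both properties, consider one relaxation step on a square grid, where each new value is the average of its four neighbors from the previous step. The sketch below is plain JavaScript running on the CPU; the function name `relaxStep` and the flat, row-by-row storage are our own choices for illustration.

```javascript
// One relaxation step on an n x n grid stored row by row in a flat array.
// Every new value is computed only from the previous state, never from
// values being written during this step, so all points could run in parallel.
function relaxStep(prev, n) {
  const next = new Float32Array(prev.length);
  for (let y = 0; y < n; y++) {
    for (let x = 0; x < n; x++) {
      const i = y * n + x;
      if (x === 0 || y === 0 || x === n - 1 || y === n - 1) {
        next[i] = prev[i]; // hold boundary values fixed
      } else {
        next[i] = 0.25 * (prev[i - 1] + prev[i + 1] + prev[i - n] + prev[i + n]);
      }
    }
  }
  return next;
}
```

Because `next` is filled only from `prev`, the order of the two loops does not matter. That independence is exactly what lets a GPU evaluate every grid point at once.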

WebGL vertices build triangles, which are filled in by fragments that look just like a grid for numerically solving a differential equation.

Shaders were introduced into mainstream graphics in early 2001, and were almost immediately adopted for use beyond graphics. We will see that shaders are a great fit for numeric calculations on a mesh or grid. Many other computing models have also been mapped onto graphics hardware. Examples include differential equations, financial modeling, machine learning, image processing, and even databases.

How do we do it?

What are the main elements that map a problem onto a GPU? GPUs are designed for graphics, so it will take some effort to wrap our minds around using them in a wider context. As we will see, for certain problems it is well worth the effort.

The basic structure of WebGL GPGPU implementations.

WebGL GPGPU implementations share a common structure. This is the result of fitting the problem into a form that can be readily addressed using graphics hardware and WebGL. We will start with a general description, then walk through concrete examples to provide a strong introduction to using GPUs for computation.

The Canvas

As always with WebGL, we start by creating a canvas. A couple of things stand out about a GPGPU canvas. First, the canvas is sized to fit the problem. If we are solving a differential equation on a 128x128 grid, we create a 128x128 (width x height) canvas.

Second, we don't have to attach the canvas to the DOM. We simply create the canvas and use it to get a WebGL context. We only attach it to the DOM if we use it to visualize our results.
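A minimal sketch of this step might look like the following. The function name `createComputeContext` is hypothetical, and the page's `document` is passed in as a parameter purely to keep the function free of hidden globals.

```javascript
// Create an off-screen canvas sized to the problem grid and get a WebGL context.
// `doc` is the page's `document` object.
function createComputeContext(doc, width, height) {
  const canvas = doc.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  // Note that the canvas is never appended to the DOM.
  const gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
  if (!gl) {
    throw new Error('WebGL is not available in this browser');
  }
  return { canvas, gl };
}

// In a browser, for a 128x128 problem:
//   const { canvas, gl } = createComputeContext(document, 128, 128);
```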

The Geometry

For computing we want the simplest geometry possible. The only purpose of the vertices and geometry is to generate calls to the fragment shader at every point on the canvas. The geometry is almost always two triangles that cover the canvas as shown in the figure.
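One way to write down such a geometry is sketched below. It assumes the vertices are given directly in clip space, where x and y both run from -1 to 1, so no projection is needed. The function names are hypothetical; the `gl` calls are standard WebGL.

```javascript
// Two triangles covering the whole canvas in clip space.
// Each pair of numbers is one (x, y) vertex.
function fullCanvasQuad() {
  return new Float32Array([
    -1, -1,   1, -1,  -1,  1,   // lower-left triangle
    -1,  1,   1, -1,   1,  1,   // upper-right triangle
  ]);
}

// Upload the vertices to the GPU. `gl` is a WebGL rendering context.
function createQuadBuffer(gl) {
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, fullCanvasQuad(), gl.STATIC_DRAW);
  return buffer;
}
```

Six vertices drawn with `gl.TRIANGLES` are enough. A four-vertex triangle strip would work equally well.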

The Fragments

The simple geometry is rasterized. It is divided into fragments corresponding to the pixels on the original canvas. We use a simple geometry with no projection so that the mapping from fragments to pixels is direct.

Input Texture

Textures are data storage on the GPU. For our GPGPU work, it is easiest to think of them as 2D arrays, where we want one element of the array for each pixel of the canvas.
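The sketch below shows one way to create such a 2D array of floating point values. It assumes a WebGL 1 context on a device that offers the `OES_texture_float` extension, which is not guaranteed everywhere. The function name `createFloatTexture` is hypothetical.

```javascript
// Create a width x height texture holding four 32-bit floats (RGBA) per element.
// `gl` is a WebGL 1 context; `data` is a Float32Array of width*height*4 values, or null.
function createFloatTexture(gl, width, height, data) {
  // Assumption: the device supports floating point textures.
  if (!gl.getExtension('OES_texture_float')) {
    throw new Error('Floating point textures are not supported');
  }
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // No interpolation or wrapping: each lookup should return exactly one stored element.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.FLOAT, data);
  gl.bindTexture(gl.TEXTURE_2D, null);
  return texture;
}
```

Nearest-neighbor filtering matters here: with linear filtering the GPU would blend neighboring elements, which is useful for images but wrong for array lookups.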

We create a texture that is exactly the same size as the canvas. So with a 128x128 problem grid, we have a 128x128 canvas and a 128x128 texture. Remember that textures use normalized coordinates. We attach the (0, 0) point to one corner of the geometry, and the (1, 1) point to the opposite corner. This completely covers the geometry, and hence the canvas, with the texture.
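Since normalized coordinates run from 0 to 1 across the whole texture, the center of array element `index` along an axis of `size` elements sits at `(index + 0.5) / size`. The helpers below, with names of our own choosing, make that mapping explicit in plain JavaScript.

```javascript
// Normalized coordinate (0..1) of the center of grid cell `index`
// along an axis that is `size` cells long.
function cellCenter(index, size) {
  return (index + 0.5) / size;
}

// Inverse mapping: which grid cell contains normalized coordinate `t`?
function cellIndex(t, size) {
  return Math.min(size - 1, Math.floor(t * size));
}

// For a 128-element axis, the first element is centered at 0.5 / 128
// and the last at 127.5 / 128; neither sits exactly on 0 or 1.
```

The half-element offset is easy to forget, and forgetting it is a common source of off-by-one lookups when a shader reads from a neighboring element.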

Code

Now things get really interesting. When we do a rendering step, the fragment shader is invoked for every fragment in the geometry to produce a pixel value, gl_FragColor. This is the result of the computation at that point on the grid. The shader reads values from the input texture, and the gl_FragColor value is written to the output texture.
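To make this concrete, here is a sketch of a shader pair written as JavaScript strings, with a helper to compile them. The computation (squaring each stored value) is a placeholder chosen only for illustration, and the names `a_position`, `v_texCoord`, and `u_input` are our own. The fragment shader assumes `highp` precision is available, which is not guaranteed on every device.

```javascript
// Vertex shader: pass each corner through and derive a texture coordinate
// from it. Clip space runs from -1 to 1; texture coordinates run from 0 to 1.
const vertexSource = `
  attribute vec2 a_position;
  varying vec2 v_texCoord;
  void main() {
    v_texCoord = 0.5 * (a_position + 1.0);
    gl_Position = vec4(a_position, 0.0, 1.0);
  }
`;

// Fragment shader: runs once per grid point. Reads the value stored at this
// point in the input texture and writes its square as the result.
const fragmentSource = `
  precision highp float;
  uniform sampler2D u_input;
  varying vec2 v_texCoord;
  void main() {
    vec4 value = texture2D(u_input, v_texCoord);
    gl_FragColor = value * value;
  }
`;

// Compile one shader stage; `type` is gl.VERTEX_SHADER or gl.FRAGMENT_SHADER.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }
  return shader;
}
```

Notice that the fragment shader has no loop over the grid. The rasterizer supplies the loop by invoking `main` once for every fragment.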

Output Texture

Drawing the fragment shader result to the screen would not be very useful for computations. Instead we write it to another texture. We do this by attaching the texture to a framebuffer object, or FBO, which acts as a target for off-screen rendering.
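Setting up such a render target can be sketched as follows. The function name `createFramebuffer` is hypothetical, and `texture` stands for any existing WebGL texture of the right size. The completeness check matters in practice because not every device can render to every texture format.

```javascript
// Attach `texture` to a new framebuffer so that rendering writes into the texture.
function createFramebuffer(gl, texture) {
  const framebuffer = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
  const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER);
  // Rebind the default framebuffer (the canvas) before returning.
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  if (status !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error('This device cannot render to the given texture');
  }
  return framebuffer;
}
```

While the returned framebuffer is bound, draw calls write into the attached texture rather than to the canvas.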

Our output texture also has the same dimensions as the canvas and the input texture. Indeed, it is common practice to do time evolution by swapping the input and output textures and doing another render step.
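The swapping pattern, often called ping-pong rendering, can be sketched without any WebGL calls at all. In the sketch below, `renderStep(input, output)` is a hypothetical callback standing in for "bind `input` for reading, render into `output`"; the function name `evolve` is also our own.

```javascript
// Advance a simulation by `steps` render passes, swapping the roles of the
// two textures after each pass.
function evolve(textureA, textureB, steps, renderStep) {
  let input = textureA;
  let output = textureB;
  for (let i = 0; i < steps; i++) {
    renderStep(input, output);
    [input, output] = [output, input]; // the result becomes the next input
  }
  return input; // holds the most recent result
}
```

Two textures are enough for any number of steps, because each pass only needs the previous state to read from and somewhere separate to write.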

You might be asking, "Where does the first texture come from?" There are two possibilities. In the first, we create the texture directly from a fragment shader. Think about the diagram without the input texture, where the fragment shader computes gl_FragColor using only the (x, y) coordinates as input. The second option is to compute the values on the CPU, outside of WebGL, and specify them in a data array when the texture is created. This second option is similar to loading image data into a texture. We will show examples of both approaches.
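For the second option, the data array might be built along these lines. The sketch assumes four floating point channels (RGBA) per grid point, with the value of interest stored in the red channel; both the layout and the function name `initialData` are assumptions made for illustration.

```javascript
// Build initial data for a width x height texture on the CPU.
// `f(x, y)` gives the starting value at grid point (x, y); it is stored in the
// red channel, and the other three channels are left at zero.
function initialData(width, height, f) {
  const data = new Float32Array(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      data[4 * (y * width + x)] = f(x, y);
    }
  }
  return data;
}

// Example: a single hot spot in the middle of a 128x128 grid.
const hotSpot = initialData(128, 128, (x, y) => (x === 64 && y === 64 ? 1.0 : 0.0));
```

An array like `hotSpot` can then be passed as the final argument of `gl.texImage2D` when the texture is created.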

Speed Bumps

For the most permissive OpenGL ES implementations, this technique is surprisingly straightforward. However, we are using OpenGL far from its design purpose, so we expect some speed bumps. This is particularly true with mobile devices such as cell phones and tablets, which often have limited abilities to read and write textures. GPGPU is still possible, but we will introduce additional techniques for representing data in textures that these mobile platforms can work with.
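A first step in handling these differences is simply asking the device what it supports. The sketch below assumes a WebGL 1 context, where floating point textures are exposed through the `OES_texture_float` extension. The function names are hypothetical, and falling back to unsigned bytes is only one possible strategy.

```javascript
// Report whether this WebGL 1 context offers floating point textures.
// getExtension returns null when an extension is unavailable.
function supportsFloatTextures(gl) {
  return gl.getExtension('OES_texture_float') !== null;
}

// Pick a texture element type: floats where possible, otherwise plain bytes,
// in which case values must be encoded into the four 8-bit color channels.
function chooseTextureType(gl) {
  return supportsFloatTextures(gl) ? gl.FLOAT : gl.UNSIGNED_BYTE;
}
```

Even when float textures can be created, a device may refuse to render into them, so checking framebuffer completeness after attaching a texture is still worthwhile.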

We will work through a first example, matrix multiplication, for a highly capable desktop, then show the steps needed to adapt the code to less pliant mobile platforms. Starting with the desktop lets us illustrate the general approach with as few complications as possible.
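To see why matrix multiplication fits the model, it helps to write it so that each output element is computed on its own. The following plain JavaScript is a CPU reference sketch under our own naming, not the WebGL implementation itself: `productElement` plays the role of a fragment shader, and the double loop in `multiply` plays the role of the rasterizer.

```javascript
// One element of C = A * B for n x n matrices stored row by row.
// This is the work a single fragment would do for output position (row, col).
function productElement(a, b, n, row, col) {
  let sum = 0;
  for (let k = 0; k < n; k++) {
    sum += a[row * n + k] * b[k * n + col];
  }
  return sum;
}

// CPU reference: visit every output position and compute it independently.
function multiply(a, b, n) {
  const c = new Float32Array(n * n);
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      c[row * n + col] = productElement(a, b, n, row, col);
    }
  }
  return c;
}
```

Each call to `productElement` reads from the inputs and writes exactly one output, with no dependence on any other output element, which is the property that lets every element be computed in parallel.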