Your Guide to WebGPU Compute Shaders
The GPU provides nearly ubiquitous, powerful parallel compute capabilities. The knowledge to make use of those capabilities, however, is far from ubiquitous. Built around rendering pixels to the screen, the GPU's programming model and architecture are well outside most programmers' experience. In what follows we provide detailed explanations and examples of programming idioms common in developing computational models and programs that harness the GPU for general-purpose computation.
Super speed
Quite simply, it's all about performance. We will learn a new way of thinking about computational problems, a way of thinking that is more work, but that gives us access to design elements (simulations and computations) that would otherwise be unreachable. We will take the graphics hardware driven by the billion-dollar gaming industry and use it in an entirely different way.
GPUs are so effective that an entire ecosystem of specialized languages and environments has grown up around general-purpose computing on graphics processing units (GPGPU), focused on high-performance code. We, though, will focus on portability and instructional design, limiting our discussion to techniques available from within a web browser.
Both the CPU and GPU performance curves start around 1 giga floating-point operation per second (GFLOPS); however, they rapidly diverge. By 2014 GPUs had reached over 5,300 GFLOPS, while CPUs had reached only 700 GFLOPS.
Among current-generation processors, the Intel 285K reaches 224 GFLOPS, while an RTX 5090 reaches 104.8 tera floating-point operations per second (TFLOPS), a potential 460-fold performance improvement from leveraging the GPU. Not only do we get a performance boost, we also move those computations off of the CPU, leaving it free to respond to user actions.
These figures are for single-precision, 32-bit, floating-point performance, which the GPU handles extremely well. Using higher-precision numbers, such as 64-bit floating point, dramatically decreases performance. Indeed, these higher-precision types are not widely available through WebGPU or Vulkan; they are more common in specialized tools such as OpenCL or CUDA, and even there they carry a significant performance cost. This 32-bit precision is a great fit for many applications, especially in instruction.
Wait, What? Computations on the GPU?
The GPU is for graphics, so what do we even mean when we talk about computations on the GPU?
Modern computers use small programs, called shaders, to compute the color of each pixel on the screen. Moreover, computers have specialized hardware, graphics processing units (GPUs), to run large numbers of these shaders in parallel. Computer games depend on this both for a high frame rate and for many effects such as lighting and shadows. We can bring all this parallelism and performance to bear on our problems as long as we can make them look similar to computing pixels for the screen, that is, if we can arrange the computations on a grid. Fortunately, there are a large number of problems in the sciences, engineering, and mathematics that are addressable on a grid.
(Figure: a grid whose cells are filled in by fragments, as when numerically solving a differential equation.)
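To make the grid idea concrete, here is a hypothetical CPU-side sketch (plain JavaScript, not WebGPU) of one step of a 1-D heat equation on a grid. Each output cell depends only on its immediate neighbors, so every cell could be computed independently and in parallel, which is exactly the shape of work one compute shader invocation performs.

```javascript
// One explicit finite-difference step of the 1-D heat equation
// u_t = alpha * u_xx on a grid. Each output cell reads only its
// neighbors, so all cells are independent of one another.
function heatStep(u, alpha, dt, dx) {
  const next = u.slice();
  for (let i = 1; i < u.length - 1; i++) {
    next[i] = u[i] + (alpha * dt / (dx * dx)) * (u[i - 1] - 2 * u[i] + u[i + 1]);
  }
  return next; // boundary cells are held fixed
}

// A spike of heat in the middle of a cold rod diffuses outward.
const u0 = [0, 0, 0, 1, 0, 0, 0];
const u1 = heatStep(u0, 1.0, 0.1, 1.0);
```

On the GPU, the body of that loop becomes the shader, and the loop itself disappears: the hardware runs one invocation per cell.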
Shaders were introduced into mainstream graphics in early 2001 and were almost immediately adopted for use beyond graphics. Dedicated compute shaders were introduced in 2006 and, now in 2024, have made their way into the newest web graphics API, WebGPU. Compute shaders are a significant step forward, providing a much clearer path to general-purpose computation than repurposing graphics shaders. We will see that shaders are a great fit for numeric calculations on a mesh or grid, and many other computing models have also been mapped onto graphics hardware: differential equations, financial modeling, machine learning, image processing, and even databases.
How do we do it?
What are the main elements that map a problem onto a GPU? GPUs are designed for graphics, so it will take some effort to wrap our minds around using them in a wider context. As we will see, for certain problems it is well worth the effort.
In the most straightforward approach, the compute shader reads values from an input array and computes a single result. This parallels the original graphics approach where a fragment shader reads a texture and other input values and produces the fragment color. We will start with some general descriptions, then walk through concrete examples to provide a strong introduction to using GPUs for computation.
The Data
Buffers are the primary means of exchanging data with compute shaders. A buffer is a block of memory on the GPU, created from a GPUDevice with GPUDevice.createBuffer. Buffers can also be mapped into system memory so that they can be read or written from our CPU-side code.
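As a sketch of what this looks like in practice, the snippet below creates an input buffer the shader will read and a pair of buffers for getting results back out. It assumes a GPUDevice named device has already been obtained (via navigator.gpu.requestAdapter() and adapter.requestDevice()); the sizes and names are illustrative, and this only runs in a WebGPU-capable browser.

```javascript
// Assumes `device` is a GPUDevice obtained elsewhere.
const input = new Float32Array([1, 2, 3, 4]);

// A storage buffer the shader reads; filled from the CPU side.
const inputBuffer = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(inputBuffer, 0, input);

// A storage buffer the shader writes its results into.
const outputBuffer = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
});

// A mappable staging buffer: results are copied here, then read
// from JavaScript after mapAsync resolves.
const readBuffer = device.createBuffer({
  size: input.byteLength,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});
```

After the compute pass runs, we copy outputBuffer into readBuffer, then call `await readBuffer.mapAsync(GPUMapMode.READ)` and wrap `readBuffer.getMappedRange()` in a Float32Array to read the results.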
The Code
Compute shaders can loosely be thought of as fragment shaders on steroids. Where a fragment shader is invoked once per fragment, we invoke the compute shader once per result element. For example, consider matrix multiplication.
We invoke the compute shader once for each element of the result matrix.
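A WGSL sketch of that idea follows: one invocation computes one element of the product of two square matrices stored as flat arrays. The binding numbers, the fixed size N, and the workgroup size are illustrative choices, not requirements.

```wgsl
// Multiplies two N x N matrices stored as flat arrays.
// One invocation computes one element of the result.
const N : u32 = 4u;

@group(0) @binding(0) var<storage, read> a : array<f32>;
@group(0) @binding(1) var<storage, read> b : array<f32>;
@group(0) @binding(2) var<storage, read_write> result : array<f32>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id : vec3u) {
  let row = id.y;
  let col = id.x;
  // Dispatches are rounded up to whole workgroups, so guard
  // against invocations that fall outside the matrix.
  if (row >= N || col >= N) { return; }
  var sum = 0.0;
  for (var k = 0u; k < N; k = k + 1u) {
    sum = sum + a[row * N + k] * b[k * N + col];
  }
  result[row * N + col] = sum;
}
```

Notice there is no loop over rows and columns: the dispatch geometry supplies those, one invocation per result element.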
The Commands
The last step is to accumulate commands for the GPU. These commands tell the GPU which shader to execute and which resources to use. The commands are then submitted to the GPU for execution.
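A hedged sketch of that final step, assuming a compute pipeline and a bind group wiring the buffers to the shader have already been created (names here are illustrative, and this runs only in a WebGPU-capable browser):

```javascript
// Assumes `device`, a compute `pipeline`, a `bindGroup`, and a
// matrix size `N` already exist from the earlier setup steps.
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
// Each workgroup covers an 8x8 tile of invocations, so round up
// to cover every row and column of the result.
pass.dispatchWorkgroups(Math.ceil(N / 8), Math.ceil(N / 8));
pass.end();

// Nothing has run yet; submitting the encoded commands starts
// execution on the GPU.
device.queue.submit([encoder.finish()]);
```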