Implementing The Stages
In the last section we decomposed the general WebGPU compute process into three bite-sized concepts. Now, we will implement a couple of them. This simplified example illustrates the use of buffers and commands to set up and copy data from input to output arrays.
Along the way, we collect some general methods into the WebgpuCompute class, and use them in the CopyBuffer class to implement our example.
Is WebGPU Supported?
WebGPU support is still spinning up, so we need to be very careful here: it may not be supported, or may be only partially supported, on any given platform, especially mobile. navigator.gpu is the entry point for the WebGPU API, so the first step is to check that navigator.gpu is present.
if (!navigator.gpu) {
  throw Error("WebGPU is not supported.");
}
The specific application will catch this error and handle it as appropriate. Common approaches include popping up a user alert, or at least logging the error.
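One way an application might structure this is a small guard function. This is a sketch, not part of the example's API: checkWebgpuSupport is a hypothetical name, and the gpu parameter stands in for navigator.gpu so the pattern can be exercised in any JavaScript runtime.

```javascript
// Hypothetical guard: `gpu` stands in for navigator.gpu so the check can be
// exercised outside a browser. Throws exactly as the snippet above does.
function checkWebgpuSupport(gpu) {
  if (!gpu) {
    throw Error("WebGPU is not supported.");
  }
  return gpu;
}

// In a browser, the call would be: checkWebgpuSupport(navigator.gpu)
```

A caller then wraps the call in try/catch and alerts or logs as appropriate for the application.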
Get an Adapter
Now that we know WebGPU is supported, the next step is to get an adapter. A WebGPU adapter corresponds roughly to a physical GPU. Many systems have two physical GPUs: a power-efficient GPU built into the processor, and a second high-performance, high-power, discrete GPU. When we request an adapter we can specify a low-power or a high-performance preference in the options.
let adapter = await navigator.gpu.requestAdapter(options);
if (!adapter) {
  if (options) {
    throw Error("Request WebGPU adapter failed with options: " + JSON.stringify(options));
  } else {
    throw Error("Request WebGPU adapter failed.");
  }
}
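The two power preferences are expressed through the powerPreference member of the request descriptor. The descriptors below show both forms; adapterErrorMessage is a hypothetical helper that just reproduces the error-message logic from the snippet above so it can be exercised standalone.

```javascript
// The two powerPreference values WebGPU defines, as request descriptors.
const lowPower = { powerPreference: "low-power" };
const highPerformance = { powerPreference: "high-performance" };

// Hypothetical helper mirroring the error-message logic above.
function adapterErrorMessage(options) {
  return options
    ? "Request WebGPU adapter failed with options: " + JSON.stringify(options)
    : "Request WebGPU adapter failed.";
}
```

In a browser, the descriptor is passed directly: navigator.gpu.requestAdapter(highPerformance).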
Laptops, desktops, and mobile devices have a wide range of graphics adapters with a correspondingly wide range of capabilities. WebGPU encapsulates this variability through optional features and limits. All implementations of WebGPU provide a baseline of capabilities, beyond which optional features and higher resource limits can be requested. If no optional capabilities are requested, then you receive the base set of capabilities. This is a boon for compatibility: a project that works without requesting additional resources should work on most, if not all, WebGPU-capable systems.
Resource Limits
Resource limits are one of the optional capabilities that can be specified when acquiring a WebGPU device. There is a base level of resources that is expected to be available on all WebGPU systems, and, as noted above, a default request returns this base set even on more capable systems. If possible, we will avoid dependencies on optional limits or features.
We get the resource limits for a particular system from adapter.limits, an object where each property name is the name of a limit, and the property's value is that limit on the current system. We can then log the limits for a system with:
for (const prop in adapter.limits) {
  console.log(`${prop}: ${adapter.limits[prop]}`);
}
Or, we can use some slightly more elegant code to build a user-friendly table.
Limits for this system: a table with columns Limit, Available Limit, and Default Limit, populated at run time on the reader's system.
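The table-building code can be sketched as follows. The limits and defaults objects here are hand-made stand-ins: a real run would pass adapter.limits, and limitsTable is a hypothetical helper name.

```javascript
// Hedged sketch: format an adapter.limits-like object into table rows.
// `limits` and `defaults` are stand-ins for adapter.limits and the spec's
// base values; a real page would read adapter.limits directly.
function limitsTable(limits, defaults) {
  const rows = ["| Limit | Available Limit | Default Limit |", "|---|---|---|"];
  for (const prop in limits) {
    rows.push(`| ${prop} | ${limits[prop]} | ${defaults[prop] ?? "n/a"} |`);
  }
  return rows.join("\n");
}

const table = limitsTable(
  { maxBindGroups: 8 }, // pretend adapter.limits from a capable system
  { maxBindGroups: 4 }  // baseline value, for comparison
);
```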
Features
While limits vary even within a generation of graphics cards, features vary most strongly from generation to generation. Many optional features center around texture compression and data representation.
Getting the Device
Once we have a physical device, the adapter, we get the logical device. The WebGPU device is a connection to the adapter that manages and isolates the application's resources. All of our interactions with WebGPU will be through the device.
When we get the device, we take only the default limits and features. The only option we specify is the label, to identify the source of any diagnostic messages.
device = await adapter.requestDevice({label: "Our compute device"});
device.addEventListener('uncapturederror', (event) => {
  console.error(`Uncaught WebGPU error from ${device.label}: `, event.error);
});
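A real GPUDevice is an EventTarget, so the listener pattern above can be sketched with a plain EventTarget, which is built into modern JavaScript runtimes. The fakeDevice name and the dispatched event are illustrative stand-ins, not WebGPU API.

```javascript
// CPU-side sketch of the uncapturederror listener pattern, using a plain
// EventTarget as a stand-in for the GPUDevice.
const fakeDevice = new EventTarget();
fakeDevice.label = "Our compute device";

let lastMessage = null;
fakeDevice.addEventListener("uncapturederror", (event) => {
  lastMessage = `Uncaught WebGPU error from ${fakeDevice.label}`;
});

// Simulate the device reporting an otherwise-uncaptured error.
fakeDevice.dispatchEvent(new Event("uncapturederror"));
```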
Buffers
Fundamentally, buffers are blocks of memory on the GPU. Here, we allocate a buffer for a dataset, which in this example is a Float32Array.
This example sets all the available options for creating a buffer.
The label identifies the buffer in diagnostic messages. The size is the number of bytes to be allocated to the buffer. The usage is a set of bit flags that indicate the operations we plan to perform with this buffer; GPUBufferUsage.COPY_SRC means that we will copy data from the buffer.
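Because usage is a bit-flag field, flags combine with bitwise OR and are tested with bitwise AND. The numeric values below match the GPUBufferUsage constants in the WebGPU specification, but in a real page the flags come from the global GPUBufferUsage object; the Usage name here is just a local stand-in.

```javascript
// Stand-in for GPUBufferUsage; these values match the WebGPU specification.
const Usage = {
  MAP_READ:  0x0001,
  MAP_WRITE: 0x0002,
  COPY_SRC:  0x0004,
  COPY_DST:  0x0008,
};

// Combine flags with bitwise OR, test membership with bitwise AND.
const readback = Usage.COPY_DST | Usage.MAP_READ;
const canMapRead = (readback & Usage.MAP_READ) !== 0;
const canCopyFrom = (readback & Usage.COPY_SRC) !== 0;
```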
A mapped buffer has a corresponding CPU-side block of memory, which normally can be read or written as we specify MAP_READ or MAP_WRITE in our usage flags. mappedAtCreation is a special case, where any buffer can be mapped and written at its creation, streamlining the initialization process. So, we map and write to this buffer even though the MAP_WRITE flag is not set.
// Get a GPU buffer in a mapped state and an arrayBuffer for writing.
const gpuWriteBuffer = device.createBuffer({
  label: "my compute input buffer",
  size: dataset.byteLength,
  usage: GPUBufferUsage.COPY_SRC,
  mappedAtCreation: true,
});
Initializing the Buffer
getMappedRange() returns the contents of the buffer as an ArrayBuffer.
const arrayBuffer = gpuWriteBuffer.getMappedRange();
We wrap the ArrayBuffer with any of the JavaScript typed arrays, and assign values.
new Float32Array(arrayBuffer).set(dataset);
Finally, we unmap the buffer. Unmapping releases the CPU-side mapping, making the data available to the GPU and returning control of the buffer to it.
gpuWriteBuffer.unmap();
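The map-write-unmap flow above can be sketched entirely on the CPU, since the mapped range is an ordinary ArrayBuffer. Here a locally constructed ArrayBuffer stands in for the result of getMappedRange(); the Float32Array view and set() call are the same as in the text.

```javascript
// CPU-side sketch of filling a mapped buffer: getMappedRange() hands back an
// ArrayBuffer, which we view through a typed array to write our dataset.
const dataset = new Float32Array([1.5, 2.5, 3.5]);       // example data
const arrayBuffer = new ArrayBuffer(dataset.byteLength); // stands in for getMappedRange()
new Float32Array(arrayBuffer).set(dataset);              // same call as in the text
const written = new Float32Array(arrayBuffer);           // read back through a view
```

Note that byteLength is in bytes: three 32-bit floats occupy 12 bytes.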
We also need a buffer to read our data back from the GPU. This buffer is the same size as the input buffer, but, as we might expect, the usage is different. COPY_DST flags that this buffer is the destination for a copy operation. MAP_READ flags that it will be mapped to CPU-side memory for reading.
// Get a GPU buffer for reading in an unmapped state.
const gpuReadBuffer = device.createBuffer({
  label: "my compute read buffer",
  size: dataset.byteLength,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ
});
Commands
Now let's do something with these buffers. Specifically, we plan to copy data from gpuWriteBuffer to gpuReadBuffer, then verify that we read back the original data. We construct a series of commands, then dispatch the commands from the CPU to the GPU.
We start with the command encoder, which allows us to build a list of commands to submit to the GPU. The only option available while creating the command encoder is the label, which we set.
const copyEncoder = device.createCommandEncoder({label: "GPU compute command encoder"});
Many commands, or sequences of commands, can be attached to a command encoder. This example exercises a single command, copyBufferToBuffer, which, as the name implies, copies the contents of one buffer to another. We copy the contents of gpuWriteBuffer to gpuReadBuffer. For each buffer we start at the beginning of the buffer, and copy the same count of bytes we set in gpuWriteBuffer. This operation works because we set the usage for gpuWriteBuffer and gpuReadBuffer to COPY_SRC and COPY_DST respectively.
copyEncoder.copyBufferToBuffer(
  gpuWriteBuffer,     // source buffer
  0,                  // source offset
  gpuReadBuffer,      // destination buffer
  0,                  // destination offset
  dataset.byteLength, // byte count to be copied
);
Now that we have our commands, we call finish to get a command buffer. This command buffer is then submitted to the device queue.
const copyCommands = copyEncoder.finish({label: "GPU Compute Command Buffer"});
device.queue.submit([copyCommands]);
The queue copies the commands and any needed data to the GPU, where they are executed as the GPU is able. Most applications submit more numerous, and more complex, commands; in these cases the asynchronous nature of the queue shows much greater value.
Retrieving Data
Once our computations are done, we need to retrieve data from the GPU. mapAsync(GPUMapMode.READ) returns a promise, which resolves when the queued operations on the buffer are complete.
await gpuReadBuffer.mapAsync(GPUMapMode.READ);
const copyArrayBuffer = gpuReadBuffer.getMappedRange();
const readData = new Float32Array(copyArrayBuffer.slice());
// slice() copied the data, so we can unmap and return the buffer to the GPU.
gpuReadBuffer.unmap();
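The round-trip verification the example builds toward can be sketched entirely on the CPU: fill a "write" ArrayBuffer, copy its bytes to a "read" ArrayBuffer, and check the readback against the original dataset. The buffer names and the byte-level copy are stand-ins for the GPU buffers and copyBufferToBuffer.

```javascript
// CPU-side sketch of the whole example's data flow.
const dataset = new Float32Array([0.5, 1.0, 1.5, 2.0]);

const writeBuffer = new ArrayBuffer(dataset.byteLength);
new Float32Array(writeBuffer).set(dataset);                  // like mappedAtCreation + set

const readBuffer = new ArrayBuffer(dataset.byteLength);
new Uint8Array(readBuffer).set(new Uint8Array(writeBuffer)); // like copyBufferToBuffer

const readData = new Float32Array(readBuffer.slice());       // like the mapped readback
const matches = readData.every((v, i) => v === dataset[i]);  // verify the round trip
```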
