Matrix Multiplication
We have already initialized our matrix. The simplest, but still interesting, operation we can do is to square the matrix.
Where is our matrix. Evaluating this sum will showcase techniques for working with data in textures.
Before we dive into the code, let's take another look at structure we expect for our GPGPU application.
We will build out this stack from the bottom up, starting, of course, with the canvas. We are computing the elements where so both and are matrices. The canvas will also be .
var gpgpUtility;
var matrixColumns;
var matrixRows;
matrixColumns = 128;
matrixRows = 128;
gpgpUtility = new vizit.utility.GPGPUtility(matrixColumns, matrixRows);
The
canvas.width
. Similarly the
canvas.height
. This lets us know some important things about the
texture.
The
Within the fragment shader we will have access to the texture coordinates, and from this we know exactly which texel we are writing to. Specifically,
The geometry is once again the standard triangles that completely cover the canvas. The fragments generated by this canvas + geometry combination creates one fragment for each element of the matrix.
We take the texture, m
that is the rendering target in the initialization step and
use it as the input for the matrix multiplication stage. Using a texture as a target in one step
and then as input in the next step is a common practice.
/**
* Runs the program to do the actual work. On exit the framebuffer &
* texture are populated with the square of the input matrix, m. Use
* gl.readPixels to retrieve texture values.
*
* @param m {WebGLTexture} A texture containing the elements of m.
* @param mSquared {WebGLTexture} A texture to be incorporated into a fbo,
* the target for our operations.
*/
this.square = function(m, mSquared)
{
var m2FrameBuffer;
// Create and bind a framebuffer
m2FrameBuffer = gpgpUtility.attachFrameBuffer(mSquared);
gl.useProgram(program);
gpgpUtility.getStandardVertices();
gl.vertexAttribPointer(positionHandle, 3, gl.FLOAT, gl.FALSE, 20, 0);
gl.vertexAttribPointer(textureCoordHandle, 2, gl.FLOAT, gl.FALSE, 20, 12);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, m);
gl.uniform1i(textureHandle, 0);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
};
We setup the geometry so that there is exactly one fragment for every element of
.
In the shader, the assignment to
maps to an assignment to gl_FragColor
, which sets the texel at (s
,
t
) in the m2
texture.
The references to
on the right side of the
equation will be represented by reads of the m
texture. For
and
we use the same s
and
t
for the texel we are computing, but what about
?
runs from where is the size of our matrix, 128 in this example. In terms texture coordinates, it runs from in steps of .
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif
uniform sampler2D m;
varying vec2 vTextureCoord;
void main()
{
float i, j;
float value = 0.0;
i = vTextureCoord.s;
j = vTextureCoord.t;
for(float k=0.0; k<128.0; ++k)
{
value += texture2D(m, vec2(i, k/128.0)).r * texture2D(m, vec2(k/128.0, j)).r;
}
gl_FragColor.r = value;
}
The bounds on the loop are fixed. This is, interestingly, a requirement for GLSL. Specifically, the version of GLSL used in WebGL requires that "the maximum number of iterations can easily be determined at compile time." We then have different shaders for different sized matrices.
Another approach is to have a very large loop index, and break out of the loop when you are done. Something like this
uniform sampler2D m;
uniform float mSize;
varying vec2 vTextureCoord;
void main()
{
float i, j;
float value = 0.0;
i = vTextureCoord.s;
j = vTextureCoord.t;
for(float k=0.0; k<2048.0; ++k)
{
if (k >= mSize)
{
break;
}
value += texture2D(m, vec2(i, k/mSize)).r * texture2D(m, vec2(k/mSize, j)).r;
}
gl_FragColor.r = value;
While this gives you a single shader that will work for a range of cases some problems remain. Many GPUs, especially older ones, will unroll the loop, and execute both the true and false paths for our condition, then discarding the unneeded results. This produces both unexpectedly large memory requirements, and poor execution time.
Question: How would you modify this algorithm to multiply two different matrices?
The results from running this code on this system are shown in the results table. The GPU results are also compared with the 64 bit results computed by JavaScript. In general these will not match.
This brings up an important point about GPU computations. For WebGL you will not see 64 bit (double precision) floating point numbers. At best you will see 32 bit (single precision) or even 24 or 16 bit floating point numbers. It is important to choose problems and examples where this level of accuracy is sufficient, as is the case with many instructional simulations.
