Matrix Multiplication

We have already initialized our matrix. The simplest, but still interesting, operation we can do is to square the matrix.

$$R_{ij} = M^2_{ij} = \sum_{k=1}^{n} M_{ik} M_{kj}$$

where $M$ is our $n \times n$ matrix. Evaluating this sum will showcase techniques for working with data in textures.
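Before turning to WebGL, it can help to see the same sum written out on the CPU. The following is a minimal sketch in plain JavaScript; the function name and the flat, row-major array layout are assumptions for illustration, and a double precision result like this is what we will later compare the GPU output against.

  // Hypothetical CPU reference: square an n × n matrix stored as a flat,
  // row-major Float64Array. Used only to check the GPU results.
  function squareMatrixCPU(m, n)
  {
    var r = new Float64Array(n * n);
    for (var i = 0; i < n; ++i)
    {
      for (var j = 0; j < n; ++j)
      {
        var value = 0.0;
        for (var k = 0; k < n; ++k)
        {
          // R_ij = sum over k of M_ik * M_kj
          value += m[i * n + k] * m[k * n + j];
        }
        r[i * n + j] = value;
      }
    }
    return r;
  }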

Before we dive into the code, let's take another look at the structure we expect for our GPGPU application.

The basic structure of WebGL GPGPU implementations.

We will build out this stack from the bottom up, starting, of course, with the canvas. We are computing the $R_{ij}$ elements, where $R = M^2$, so both $R$ and $M$ are $n \times n$ matrices. The canvas will also be $n \times n$.


  var gpgpUtility;
  var matrixColumns;
  var matrixRows;

  // One canvas pixel per matrix element.
  matrixColumns = 128;
  matrixRows    = 128;
  gpgpUtility   = new vizit.utility.GPGPUtility(matrixColumns, matrixRows);
      

The x coordinate of the texture, s, varies from 0 to 1 while the canvas x coordinate ranges from 0 to canvas.width. Similarly, the y coordinate of the texture, t, varies from 0 to 1 while the canvas y coordinate ranges from 0 to canvas.height. This fixed mapping tells us exactly how texture coordinates correspond to texels.

The x spacing between texels, $\delta s$, and the y spacing, $\delta t$, are

$$\delta s = \frac{1}{\text{canvas.width}} \qquad \delta t = \frac{1}{\text{canvas.height}}$$

Within the fragment shader we will have access to the texture coordinates, and from this we know exactly which texel we are writing to. Specifically,

$$i = \left\lfloor \text{canvas.width} \times s \right\rfloor \qquad j = \left\lfloor \text{canvas.height} \times t \right\rfloor$$
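To make the mapping concrete, here is a small JavaScript sketch; the helper names are invented for illustration. WebGL places the center of texel i at (i + 0.5)/width, and the floor of width × s recovers the index.

  // Illustrative helpers for the texel <-> index mapping; not part of
  // GPGPUtility.
  function texelCenter(i, width)
  {
    // The center of texel i in texture coordinates.
    return (i + 0.5) / width;
  }

  function texelIndex(s, width)
  {
    // Recover the integer index from a texture coordinate.
    return Math.floor(width * s);
  }

  // texelIndex(texelCenter(17, 128), 128) === 17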

The geometry is once again the standard pair of triangles that completely covers the canvas. This canvas + geometry combination generates exactly one fragment for each element of the R matrix.

We take the texture, m, that was the rendering target in the initialization step and use it as the input for the matrix multiplication stage. Using a texture as a render target in one step and as an input in the next is a common pattern.


  /**
   * Runs the program to do the actual work. On exit the framebuffer &
   * texture are populated with the square of the input matrix, m. Use
   * gl.readPixels to retrieve texture values.
   *
   * @param m        {WebGLTexture} A texture containing the elements of m.
   * @param mSquared {WebGLTexture} A texture to be incorporated into a fbo,
   *                                the target for our operations.
   */
  this.square = function(m, mSquared)
  {
    var m2FrameBuffer;

    // Create and bind a framebuffer
    m2FrameBuffer = gpgpUtility.attachFrameBuffer(mSquared);

    gl.useProgram(program);

    // Set up the standard full canvas geometry.
    gpgpUtility.getStandardVertices();

    // Each vertex is 20 bytes: three position floats followed by two
    // texture coordinate floats.
    gl.vertexAttribPointer(positionHandle,     3, gl.FLOAT, false, 20, 0);
    gl.vertexAttribPointer(textureCoordHandle, 2, gl.FLOAT, false, 20, 12);

    // Make the input matrix, m, available to the shader on texture unit 0.
    gl.activeTexture(gl.TEXTURE0);
    gl.bindTexture(gl.TEXTURE_2D, m);
    gl.uniform1i(textureHandle, 0);

    // Draw the quad, generating one fragment per element of the result.
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
  };
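With this in place, invoking the squaring step and reading back the result might look like the following sketch. The MatrixSquarer wrapper name is an assumption for illustration; the read-back follows the doc comment above, though support for reading back floating point pixels varies across WebGL implementations.

  // Sketch of a call site. MatrixSquarer is a hypothetical wrapper exposing
  // the square() method above; m and mSquared are the textures created in
  // the initialization step, and gl is the shared WebGL context.
  var squarer = new MatrixSquarer(gpgpUtility);
  var buffer  = new Float32Array(matrixColumns * matrixRows * 4);

  squarer.square(m, mSquared);

  // square() leaves the framebuffer wrapping mSquared bound, so readPixels
  // should return the computed values. Float read-back is not guaranteed on
  // every implementation.
  gl.readPixels(0, 0, matrixColumns, matrixRows, gl.RGBA, gl.FLOAT, buffer);

  // Four floats per texel (RGBA); the shader writes the result to the red
  // channel, so the first entry is the first element of the result.
  var r00 = buffer[0];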
      
$$R_{ij} = M^2_{ij} = \sum_{k=1}^{n} M_{ik} M_{kj}$$

We set up the geometry so that there is exactly one fragment for every element $R_{ij}$. In the shader, the assignment to $R_{ij}$ maps to an assignment to gl_FragColor, which sets the texel at (s, t) in the mSquared texture.

The references to $M$ on the right-hand side of the equation become reads of the m texture. For i and j we use the same s and t as the texel we are computing, but what about k?

k runs from 0 to n, where n is the size of our matrix, 128 in this example. In terms of texture coordinates, it runs from 0 to 1 in steps of 1/n.


  #ifdef GL_FRAGMENT_PRECISION_HIGH
  precision highp float;
  #else
  precision mediump float;
  #endif

  // The input matrix, M, stored one element per texel in the red channel.
  uniform sampler2D m;

  varying vec2 vTextureCoord;

  void main()
  {
    float i, j;
    float value = 0.0;

    // The texture coordinates of this fragment identify the element R_ij
    // we are computing.
    i = vTextureCoord.s;
    j = vTextureCoord.t;

    // Accumulate sum over k of M_ik * M_kj, stepping through the texture
    // in increments of 1/n = 1/128.
    for(float k=0.0; k<128.0; ++k)
    {
      value += texture2D(m, vec2(i, k/128.0)).r * texture2D(m, vec2(k/128.0, j)).r;
    }
    gl_FragColor.r = value;
  }
      

The bounds of the loop are fixed. This is, interestingly, a requirement of GLSL. Specifically, the version of GLSL used in WebGL requires that "the maximum number of iterations can easily be determined at compile time." We therefore need a different shader for each matrix size.
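One way to live with the fixed bound is to generate the shader source at run time, stamping the matrix size into the loop before compiling it. A minimal sketch, where the MATRIX_SIZE placeholder and the buildShaderSource name are assumptions rather than part of the GPGPUtility API:

  // Replace a placeholder in the shader template with a float literal for
  // the matrix size, then compile the result as usual.
  function buildShaderSource(template, n)
  {
    // e.g. the template contains "for(float k=0.0; k<MATRIX_SIZE; ++k)"
    // and "k/MATRIX_SIZE" wherever the size is needed.
    return template.replace(/MATRIX_SIZE/g, n.toFixed(1));
  }

  var fragmentShaderSource = buildShaderSource(fragmentShaderTemplate, matrixRows);
  // Compile fragmentShaderSource with the same program setup used above.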

Another approach is to use a very large fixed loop bound and break out of the loop when you are done. Something like this:


  #ifdef GL_FRAGMENT_PRECISION_HIGH
  precision highp float;
  #else
  precision mediump float;
  #endif

  uniform sampler2D m;
  uniform float mSize;

  varying vec2 vTextureCoord;

  void main()
  {
    float i, j;
    float value = 0.0;

    i = vTextureCoord.s;
    j = vTextureCoord.t;

    // The loop bound is a compile time constant; the real bound, mSize,
    // is enforced with the break.
    for(float k=0.0; k<2048.0; ++k)
    {
      if (k >= mSize)
      {
        break;
      }
      value += texture2D(m, vec2(i, k/mSize)).r * texture2D(m, vec2(k/mSize, j)).r;
    }
    gl_FragColor.r = value;
  }

While this gives you a single shader that works for a range of sizes, some problems remain. Many GPUs, especially older ones, will unroll the loop and execute both the true and false paths of the condition, discarding the unneeded results. This produces both unexpectedly large memory requirements and poor execution times.

Question: How would you modify this algorithm to multiply two different matrices?

The results from running this code on this system are shown in the results table. The GPU results are also compared with 64-bit results computed in JavaScript. In general, these will not match exactly.

This brings up an important point about GPU computations. For WebGL you will not see 64-bit (double precision) floating point numbers. At best you will see 32-bit (single precision), or even 24- or 16-bit, floating point numbers. It is important to choose problems and examples where this level of accuracy is sufficient, as is the case with many instructional simulations.
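This also suggests how to compare the GPU output with the JavaScript reference: use a relative tolerance suited to single precision rather than an equality test. A sketch, reusing the hypothetical squareMatrixCPU reference from earlier and the buffer returned by gl.readPixels; the 1.0e-5 tolerance is an illustrative choice.

  // Count elements whose GPU value differs from the double precision CPU
  // value by more than a relative tolerance. Assumes the GPU buffer and the
  // CPU result list elements in the same order, which holds for a row-major
  // upload with default pixel store settings.
  function compareResults(gpuBuffer, cpuResult, n, tolerance)
  {
    var mismatches = 0;
    for (var i = 0; i < n * n; ++i)
    {
      var gpu   = gpuBuffer[i * 4]; // red channel of the i-th texel
      var cpu   = cpuResult[i];
      var scale = Math.max(Math.abs(cpu), 1.0);
      if (Math.abs(gpu - cpu) > tolerance * scale)
      {
        mismatches++;
      }
    }
    return mismatches;
  }

  // e.g. compareResults(buffer, squareMatrixCPU(mData, 128), 128, 1.0e-5),
  // where mData is the array originally loaded into the m texture.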

This work is licensed under a Creative Commons Attribution 4.0 International License.