Wasm GC is not ready for realtime graphics – dthompson

Wasm GC is a wonderful thing that is now available in all major web browsers since slowpoke Safari/WebKit finally shipped it in December. It provides a hierarchy of heap-allocated reference types and a set of instructions to operate on them. Wasm GC allows memory managed languages to take advantage of the advanced garbage collectors inside web browser engines. It is now possible to implement a memory managed language without having to ship a GC inside the binary. The benefits are smaller binaries, better performance, and better integration with the host runtime.

However, Wasm GC has some serious drawbacks compared to linear memory. I enjoy playing with realtime graphics programming in my spare time, but I was disappointed to discover that Wasm GC is not really suitable for that right now. I decided to write this post because I want to see Wasm GC on more or less equal footing with linear memory when it comes to binary data manipulation.

Hello triangle

To begin with, let’s see what a “hello triangle” WebGL demo looks like with Wasm GC. I will use Hoot, the Scheme to Wasm compiler I’m working on, to build it.

Below is a Scheme program that declares imports for the subset of the WebGL, HTML5 Canvas, and other APIs it needs and then renders a single triangle:

(use-modules (hoot ffi))

(define-foreign get-element-by-id
  "document" "getElementById"
  (ref string) -> (ref null extern))

(define-foreign element-width
  "element" "width"
  (ref extern) -> i32)
(define-foreign element-height
  "element" "height"
  (ref extern) -> i32)

(define-foreign get-canvas-context
  "canvas" "getContext"
  (ref extern) (ref string) -> (ref null extern))

(define GL_VERTEX_SHADER 35633)
(define GL_FRAGMENT_SHADER 35632)
(define GL_COMPILE_STATUS 35713)
(define GL_LINK_STATUS 35714)
(define GL_ARRAY_BUFFER 34962)
(define GL_STATIC_DRAW 35044)
(define GL_COLOR_BUFFER_BIT 16384)
(define GL_TRIANGLES 4)
(define GL_FLOAT 5126)
(define-foreign gl-create-shader
  "gl" "createShader"
  (ref extern) i32 -> (ref extern))
(define-foreign gl-delete-shader
  "gl" "deleteShader"
  (ref extern) (ref extern) -> none)
(define-foreign gl-shader-source
  "gl" "shaderSource"
  (ref extern) (ref extern) (ref string) -> none)
(define-foreign gl-compile-shader
  "gl" "compileShader"
  (ref extern) (ref extern) -> none)
(define-foreign gl-get-shader-parameter
  "gl" "getShaderParameter"
  (ref extern) (ref extern) i32 -> i32)
(define-foreign gl-get-shader-info-log
  "gl" "getShaderInfoLog"
  (ref extern) (ref extern) -> (ref string))
(define-foreign gl-create-program
  "gl" "createProgram"
  (ref extern) -> (ref extern))
(define-foreign gl-delete-program
  "gl" "deleteProgram"
  (ref extern) (ref extern) -> none)
(define-foreign gl-attach-shader
  "gl" "attachShader"
  (ref extern) (ref extern) (ref extern) -> none)
(define-foreign gl-link-program
  "gl" "linkProgram"
  (ref extern) (ref extern) -> none)
(define-foreign gl-use-program
  "gl" "useProgram"
  (ref extern) (ref extern) -> none)
(define-foreign gl-get-program-parameter
  "gl" "getProgramParameter"
  (ref extern) (ref extern) i32 -> i32)
(define-foreign gl-get-program-info-log
  "gl" "getProgramInfoLog"
  (ref extern) (ref extern) -> (ref string))
(define-foreign gl-create-buffer
  "gl" "createBuffer"
  (ref extern) -> (ref extern))
(define-foreign gl-delete-buffer
  "gl" "deleteBuffer"
  (ref extern) (ref extern) -> none)
(define-foreign gl-bind-buffer
  "gl" "bindBuffer"
  (ref extern) i32 (ref extern) -> none)
(define-foreign gl-buffer-data
  "gl" "bufferData"
  (ref extern) i32 (ref eq) i32 -> none)
(define-foreign gl-enable-vertex-attrib-array
  "gl" "enableVertexAttribArray"
  (ref extern) i32 -> none)
(define-foreign gl-vertex-attrib-pointer
  "gl" "vertexAttribPointer"
  (ref extern) i32 i32 i32 i32 i32 i32 -> none)
(define-foreign gl-draw-arrays
  "gl" "drawArrays"
  (ref extern) i32 i32 i32 -> none)
(define-foreign gl-viewport
  "gl" "viewport"
  (ref extern) i32 i32 i32 i32 -> none)
(define-foreign gl-clear-color
  "gl" "clearColor"
  (ref extern) f64 f64 f64 f64 -> none)
(define-foreign gl-clear
  "gl" "clear"
  (ref extern) i32 -> none)

(define (compile-shader gl type source)
  (let ((shader (gl-create-shader gl type)))
    (gl-shader-source gl shader source)
    (gl-compile-shader gl shader)
    (unless (= (gl-get-shader-parameter gl shader GL_COMPILE_STATUS) 1)
      (let ((info (gl-get-shader-info-log gl shader)))
        (gl-delete-shader gl shader)
        (error "shader compilation failed" info)))
    shader))

(define (link-shader gl vertex-shader fragment-shader)
  (let ((program (gl-create-program gl)))
    (gl-attach-shader gl program vertex-shader)
    (gl-attach-shader gl program fragment-shader)
    (gl-link-program gl program)
    (unless (= (gl-get-program-parameter gl program GL_LINK_STATUS) 1)
      (let ((info (gl-get-program-info-log gl program)))
        (gl-delete-program gl program)
        (error "program linking failed" info)))
    program))

(define canvas (get-element-by-id "canvas"))
(define gl (get-canvas-context canvas "webgl"))
(when (external-null? gl)
  (error "unable to create WebGL context"))

(define vertex-shader-source
  "attribute vec2 position;
attribute vec3 color;
varying vec3 fragColor;

void main() {
  gl_Position = vec4(position, 0.0, 1.0);
  fragColor = color;
}")
(define fragment-shader-source
  "precision mediump float;

varying vec3 fragColor;

void main() {
  gl_FragColor = vec4(fragColor, 1);
}")
(define vertex-shader
  (compile-shader gl GL_VERTEX_SHADER vertex-shader-source))
(define fragment-shader
  (compile-shader gl GL_FRAGMENT_SHADER fragment-shader-source))
(define shader (link-shader gl vertex-shader fragment-shader))

(define stride (* 4 5))
(define buffer (gl-create-buffer gl))
(gl-bind-buffer gl GL_ARRAY_BUFFER buffer)
(gl-buffer-data gl GL_ARRAY_BUFFER
                #f32(-1.0 -1.0
                      1.0  0.0  0.0
                      1.0 -1.0
                      0.0  1.0  0.0
                      0.0  1.0
                      0.0  0.0  1.0)
                GL_STATIC_DRAW)

(gl-viewport gl 0 0 (element-width canvas) (element-height canvas))
(gl-clear gl GL_COLOR_BUFFER_BIT)
(gl-use-program gl shader)
(gl-enable-vertex-attrib-array gl 0)
(gl-vertex-attrib-pointer gl 0 2 GL_FLOAT 0 stride 0)
(gl-enable-vertex-attrib-array gl 1)
(gl-vertex-attrib-pointer gl 1 3 GL_FLOAT 0 stride 8)
(gl-draw-arrays gl GL_TRIANGLES 0 3)

Note that in Scheme, the equivalent of a Uint8Array is a bytevector. Hoot uses a packed array, an (array i8) in particular, for the contents of a bytevector.

And here is the JavaScript code needed to boot the resulting Wasm binary:

window.addEventListener("load", async () => {
  function bytevectorToUint8Array(bv) {
    let len = reflect.bytevector_length(bv);
    let array = new Uint8Array(len);
    for (let i = 0; i < len; i++) {
      array[i] = reflect.bytevector_ref(bv, i);
    }
    return array;
  }

  let mod = await SchemeModule.fetch_and_instantiate("triangle.wasm", {
    reflect_wasm_dir: 'reflect-wasm',
    user_imports: {
      document: {
        getElementById: (id) => document.getElementById(id)
      },
      element: {
        width: (elem) => elem.width,
        height: (elem) => elem.height
      },
      canvas: {
        getContext: (elem, type) => elem.getContext(type)
      },
      gl: {
        createShader: (gl, type) => gl.createShader(type),
        deleteShader: (gl, shader) => gl.deleteShader(shader),
        shaderSource: (gl, shader, source) => gl.shaderSource(shader, source),
        compileShader: (gl, shader) => gl.compileShader(shader),
        getShaderParameter: (gl, shader, param) => gl.getShaderParameter(shader, param),
        getShaderInfoLog: (gl, shader) => gl.getShaderInfoLog(shader),
        createProgram: (gl) => gl.createProgram(),
        deleteProgram: (gl, program) => gl.deleteProgram(program),
        attachShader: (gl, program, shader) => gl.attachShader(program, shader),
        linkProgram: (gl, program) => gl.linkProgram(program),
        useProgram: (gl, program) => gl.useProgram(program),
        getProgramParameter: (gl, program, param) => gl.getProgramParameter(program, param),
        getProgramInfoLog: (gl, program) => gl.getProgramInfoLog(program),
        createBuffer: (gl) => gl.createBuffer(),
        deleteBuffer: (gl, buffer) => gl.deleteBuffer(buffer),
        bindBuffer: (gl, target, buffer) => gl.bindBuffer(target, buffer),
        bufferData: (gl, target, data, usage) => {
          // Copy the opaque Wasm GC bytevector into a Uint8Array, byte by byte.
          let bv = new Bytevector(reflect, data);
          gl.bufferData(target, bytevectorToUint8Array(bv), usage);
        },
        enableVertexAttribArray: (gl, index) => gl.enableVertexAttribArray(index),
        vertexAttribPointer: (gl, index, size, type, normalized, stride, offset) => {
          gl.vertexAttribPointer(index, size, type, normalized, stride, offset);
        },
        drawArrays: (gl, mode, first, count) => gl.drawArrays(mode, first, count),
        viewport: (gl, x, y, w, h) => gl.viewport(x, y, w, h),
        clearColor: (gl, r, g, b, a) => gl.clearColor(r, g, b, a),
        clear: (gl, mask) => gl.clear(mask)
      }
    }
  });
  let reflect = await mod.reflect({ reflect_wasm_dir: 'reflect-wasm' });
  let proc = new Procedure(reflect, mod.get_export("$load").value);
  proc.call();
});

Hello problems

There are two major performance issues with this program. One is visible in the source above, the other is hidden in the language implementation.

Heap objects are opaque on the other side

Wasm GC heap objects are opaque to the host. Likewise, heap objects from the host are opaque to the Wasm guest. So the contents of an (array i8) object are not visible from JavaScript, and the contents of a Uint8Array are not visible from Wasm. This is a good security property in the general case, but it is a hindrance in this specific case.

Let’s say we have an (array i8) full of vertex data that we want to put into a WebGL buffer. To do this, we need to make one JS->Wasm call for each byte in the array and store the result into a Uint8Array. This is what the bytevectorToUint8Array function above is doing. Copying any significant amount of data per frame this way will tank performance. Hope you aren’t trying to stream vertex data!

Compare the previous paragraph with Wasm linear memory. The contents of a WebAssembly.Memory object are easily accessible from JavaScript as an ArrayBuffer. To get a blob of vertex data out of a memory object, you just need to know the byte offset and length and you’re good to go. There are many applications built on Wasm linear memory that use WebGL successfully.
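As a rough illustration of the linear memory case, here is a minimal JavaScript sketch. The names `memory`, `vertexOffset`, and `viewBytes` are hypothetical, and the write into memory stands in for what a real Wasm module would do on its side.

```javascript
// Sketch: reading a blob of vertex data out of Wasm linear memory from JS.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page

// Stand-in for the Wasm module writing vertex data into linear memory.
// The offset is a multiple of 4 so a Float32Array view can be placed on it.
const vertexOffset = 256;
const vertices = [-1.0, -1.0, 1.0, -1.0, 0.0, 1.0];
new Float32Array(memory.buffer, vertexOffset, vertices.length).set(vertices);

// The host only needs the byte offset and length to see the same bytes.
function viewBytes(memory, byteOffset, byteLength) {
  // This creates a view, not a copy. It must be re-created after
  // memory.grow(), which detaches the old ArrayBuffer.
  return new Uint8Array(memory.buffer, byteOffset, byteLength);
}

const blob = viewBytes(memory, vertexOffset, vertices.length * 4);
// A real program would now call gl.bufferData(target, blob, usage).
```

No per-byte calls across the boundary are involved; the view aliases the module's memory directly.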

Manipulation of multi-byte binary data is inefficient

To read a multi-byte number such as an unsigned 32-bit integer from an (array i8), you have to fetch each individual byte and combine them. Here’s a self-contained example that uses the Guile-flavored WAT format:

(module
 (type $bytevector (array i8))
 (data $init #u32(123456789))
 (func (export "main") (result i32)
       (local $a (ref $bytevector))
       (local.set $a (array.new_data $bytevector $init
                                     (i32.const 0)
                                     (i32.const 4)))
       (array.get_u $bytevector (local.get $a) (i32.const 0))
       (i32.shl (array.get_u $bytevector (local.get $a) (i32.const 1))
                (i32.const 8))
       (i32.or)
       (i32.shl (array.get_u $bytevector (local.get $a) (i32.const 2))
                (i32.const 16))
       (i32.or)
       (i32.shl (array.get_u $bytevector (local.get $a) (i32.const 3))
                (i32.const 24))
       (i32.or)))

In contrast, Wasm linear memory requires only a single i32.load instruction:

(module
 (memory 1)
 (func (export "main") (result i32)
       (i32.store (i32.const 0) (i32.const 123456789))
       (i32.load (i32.const 0))))

Easy peasy. Not only is it less code, it’s more efficient.
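The same contrast can be sketched on the JavaScript side, purely as an analogy: the function names below are hypothetical, and a DataView stands in for linear memory's multi-byte loads. The byte-by-byte version mirrors the shifts and ORs in the (array i8) example.

```javascript
// Store 123456789 in little-endian byte order, as Wasm linear memory would.
const bytes = new Uint8Array(4);
const view = new DataView(bytes.buffer);
view.setUint32(0, 123456789, true);

// What a packed byte array forces: four reads, three shifts, three ORs.
function readU32ByteByByte(b, i) {
  // >>> 0 reinterprets the signed 32-bit result as unsigned.
  return (b[i] | (b[i + 1] << 8) | (b[i + 2] << 16) | (b[i + 3] << 24)) >>> 0;
}

// What a multi-byte load allows: one operation.
function readU32Direct(dv, i) {
  return dv.getUint32(i, true);
}
```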

Unsatisfactory workarounds

There is no way around the multi-byte problem at the moment, but for byte access from JavaScript there are a few things we can try with what we have been given. Spoiler alert: None of them are pleasant.

Use Uint8Array from host

This approach makes all binary operations from within the Wasm binary slow because we need to cross the Wasm->JS bridge for each read/write. Since most of the binary data manipulation happens in the Wasm module, this approach just slows things down in general.
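To make the cost concrete, here is a sketch of the kind of imports this approach would need. The `bytevector` namespace and its function names are made up for illustration and are not part of Hoot; the loop at the end stands in for Wasm code.

```javascript
// Hypothetical imports: the bytevector lives on the host as a Uint8Array.
const bytevectorImports = {
  bytevector: {
    make: (length) => new Uint8Array(length),
    length: (bv) => bv.length,
    ref: (bv, i) => bv[i],             // one Wasm->JS call per byte read
    set: (bv, i, x) => { bv[i] = x; }  // one Wasm->JS call per byte written
  }
};

// Stand-in for Wasm code filling a 4-byte bytevector:
// one call to allocate it plus four calls to write it.
const { make, ref, set } = bytevectorImports.bytevector;
const bv = make(4);
for (let i = 0; i < 4; i++) set(bv, i, i * 10);
```

The host can hand `bv` straight to WebGL, but every access from inside the module now pays for a boundary crossing.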

Use linear memory for bytevectors

This would require a small malloc/free implementation and a way to reclaim memory for GC’d bytevectors. You could register each bytevector in a FinalizationRegistry to be notified upon GC and free the memory. Now you have to deal with memory fragmentation. This is Wasm GC, we shouldn’t have to do any of this!
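A minimal sketch of what that bookkeeping might look like, assuming a naive first-fit free list. `LinearAllocator` and all of its members are hypothetical, and `owner` stands in for whatever object represents the bytevector.

```javascript
// Sketch only: a first-fit allocator over linear memory whose blocks are
// released when their owning object is garbage collected.
class LinearAllocator {
  constructor(memory, start = 0) {
    this.memory = memory;
    // Free list of {offset, size} blocks, initially one block spanning memory.
    this.free = [{ offset: start, size: memory.buffer.byteLength - start }];
    // The held value is the block to release once the owner is collected.
    this.registry = new FinalizationRegistry((block) => this.release(block));
  }

  // Reserve `size` bytes (rounded up to a multiple of 8) tied to `owner`.
  allocate(owner, size) {
    const need = (size + 7) & ~7;
    const i = this.free.findIndex((b) => b.size >= need);
    if (i < 0) throw new Error("out of linear memory");
    const block = { offset: this.free[i].offset, size: need };
    this.free[i].offset += need;
    this.free[i].size -= need;
    if (this.free[i].size === 0) this.free.splice(i, 1);
    this.registry.register(owner, block);
    return block;
  }

  // Return a block to the free list. Adjacent blocks are never merged,
  // which is exactly where fragmentation creeps in.
  release(block) {
    this.free.push(block);
  }
}

const memory = new WebAssembly.Memory({ initial: 1 });
const allocator = new LinearAllocator(memory);
const owner = {}; // stand-in for a bytevector wrapper object
const block = allocator.allocate(owner, 20);
```

Finalization callbacks run at the engine's discretion, if at all, so memory reclaimed this way may linger long after the bytevector becomes unreachable.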

Use linear memory as scratch space

This avoids crossing the Wasm/JS boundary for each byte, but still involves a byte-by-byte copy from the (array i8) into linear memory within the Wasm module. Right now it feels like the least worst option, but the extra copy will still reduce throughput significantly.
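Here is a sketch of how the host side of this might look. `SCRATCH_SIZE`, `copyToScratch`, and `bufferDataFromScratch` are hypothetical names; `copyToScratch` stands in for the loop that would really run inside the Wasm module, and a mock object replaces the WebGL context so the sketch runs outside a browser.

```javascript
// Hypothetical layout: the first SCRATCH_SIZE bytes of memory are scratch space.
const SCRATCH_SIZE = 4096;
const memory = new WebAssembly.Memory({ initial: 1 });

// Stand-in for the Wasm-side loop that copies an (array i8) into scratch
// space one byte at a time.
function copyToScratch(bytes) {
  if (bytes.length > SCRATCH_SIZE) throw new Error("scratch space too small");
  const scratch = new Uint8Array(memory.buffer, 0, SCRATCH_SIZE);
  for (let i = 0; i < bytes.length; i++) scratch[i] = bytes[i];
  return bytes.length;
}

// The bufferData import receives a length instead of a heap object and
// reads the bytes straight out of scratch space, with no per-byte calls.
function bufferDataFromScratch(gl, target, length, usage) {
  gl.bufferData(target, new Uint8Array(memory.buffer, 0, length), usage);
}

// Minimal mock of a WebGL context that records what was uploaded.
const mockGl = {
  uploaded: null,
  bufferData(target, data, usage) { this.uploaded = data.slice(); }
};

const n = copyToScratch([1, 2, 3, 4]);
bufferDataFromScratch(mockGl, 34962 /* GL_ARRAY_BUFFER */, n, 35044 /* GL_STATIC_DRAW */);
```

Data larger than the scratch region would have to be uploaded in chunks, which adds yet more bookkeeping.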

Wasm GC needs some fixin’

I use realtime graphics as an example because it is a use case that is very sensitive to performance issues, but this unpleasant need to copy binary data byte-by-byte is also the reason why strings are garbage on Wasm GC today. Stringref is a good proposal and the Wasm community group made a mistake by rejecting it.

There are discussions about both multi-byte and ArrayBuffer access on GitHub, but as far as I can tell neither issue is anywhere near a resolution.

Can these things be implemented efficiently? How can the need for direct access to packed arrays from JS be reconciled with Wasm heap object opaqueness? I hope that the Wasm community group will reach solutions soon, because it will take a long time to get the proposal(s) to phase 4 and shipped in all browsers, maybe years. I am getting by making simple things with HTML5 Canvas, but it’s a shame to be effectively shut out from using WebGPU when it finally reaches stable browser releases.

2025-01-18 22:36:00
