Fix data packing on Intel HD 4000 cards with this one weird trick!

This post's contents are neither weird nor tricky, but we couldn't resist the chance to poke fun at clickbait before it gets old (too late?). We pack a lot of data into render targets in Floored's rendering pipeline; it's the only way we can fit all of the data for our G-Buffer into a single RGBA unsigned byte target. Since WebGL's shader model does not support bitwise operations, we have to mimic them using floating point multiplication and division. In one of our simplified pipelines, we use such packing to store a floating point depth value in the range [0.0, 1.0] as an integer in the range [0, 2^24 - 1] encoded in three unsigned bytes. We also use a similar encoding to store the normals in a single unsigned byte RGB target.
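As a rough CPU-side illustration of the packing described above, the following Python sketch splits a depth value into three bytes using only multiplication, division, and floor. It is not the shader from the pipeline: the name pack_depth_to_rgb and the round-to-nearest scaling are assumptions made for the example, and Python's 64-bit floats stand in for the GPU's lower-precision floats.

```python
import math

def pack_depth_to_rgb(depth):
    """Pack a depth in [0.0, 1.0] into three byte values (hypothetical helper).

    Only multiply, divide, and floor are used, mimicking a shader
    language without bitwise operators.
    """
    # Scale to an integer in [0, 2^24 - 1]; rounding here is an assumption.
    value = math.floor(depth * 16777215.0 + 0.5)
    high = math.floor(value / 65536.0)      # mimics value >> 16
    upper16 = math.floor(value / 256.0)     # mimics value >> 8
    mid = upper16 - high * 256              # mimics (value >> 8) & 0xFF
    low = value - upper16 * 256             # mimics value & 0xFF
    return (high, mid, low)

print(pack_depth_to_rgb(0.0))
print(pack_depth_to_rgb(0.5))
print(pack_depth_to_rgb(1.0))
```

Each division by a power of 256 followed by floor plays the role of a right shift, and subtracting the shifted-back upper part plays the role of a mask.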

Wall drawings

We recently noticed rendering artifacts in this pipeline on specific machines with Intel HD Graphics 4000 cards (Figure 1). The shaders for this view use the world-space normals to determine wall color and the depth to create outlines. In the image below you can see artifacts from errors in both of these: bands of incorrect color from errors in the normals, and outlines on flat walls caused by incorrect depth discontinuities.

Figure 1. The left half shows rendering artifacts caused by incorrect depth and normal values, while the right half shows the correct appearance.

Unit tests

We ran a unit test for our encoding and decoding functions on that machine and found that some of the decoded values were far off from the correct value. The output of the test, seen in Figure 2, showed a distinct pattern in the values that failed to round-trip through the encoding/decoding process. The test encodes every 24-bit integer value to a texture with one shader, then decodes each one and checks that the result matches the original value. Since you would need a 4K canvas (4096 x 4096 pixels, one per value) to view them all at once, we do it in smaller chunks and scrub through the chunks with a shader uniform.

Figure 2. Failed unit test for encoding/decoding a float value in an RGB unsigned byte texture.
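The chunking arithmetic can be sketched in a few lines of Python. The chunk size of 512 x 512 and the row-major layout are assumptions chosen for illustration only; the actual test may divide the range differently.

```python
TOTAL_VALUES = 2 ** 24   # every 24-bit integer
FULL_SIDE = 4096         # 4096 * 4096 == 2 ** 24, hence the "4K canvas"
CHUNK_SIDE = 512         # hypothetical canvas size for one chunk

# Number of chunks needed to cover every value at this chunk size.
NUM_CHUNKS = TOTAL_VALUES // (CHUNK_SIDE * CHUNK_SIDE)

def value_for_pixel(chunk_index, x, y):
    """Return the 24-bit value tested at pixel (x, y) of a given chunk.

    chunk_index plays the role of the shader uniform used for scrubbing.
    """
    return chunk_index * CHUNK_SIDE * CHUNK_SIDE + y * CHUNK_SIDE + x

print(NUM_CHUNKS)
print(value_for_pixel(NUM_CHUNKS - 1, CHUNK_SIDE - 1, CHUNK_SIDE - 1))
```

With these assumed numbers, the last pixel of the last chunk maps to 2^24 - 1, so the chunks cover the full range without gaps or overlap.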

Next we tried testing only the decoding portion of this unit test. Decoding is made up of two steps. The first converts the texture sample values to integral floats in the range [0.0, 255.0]:

vec3 sample_to_uint8_8_8(const in vec3 sample) {
  return floor(sample * 255.0);
}

The second step takes these three floats masquerading as unsigned bytes and unpacks them into a single float pretending to be a 24-bit integer:

float uint8_8_8_to_uint24(const in vec3 raw) {
  const vec3 BIT_SHIFT = vec3(256.0 * 256.0, 256.0, 1.0);
  return raw.x * BIT_SHIFT.x + (raw.y * BIT_SHIFT.y + raw.z);
}
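To see why a small mistake in the first step can leave a decoded value "far off", here is a Python mirror of the second step. The Python function simply copies the GLSL arithmetic above; the example byte triples are made up for illustration.

```python
def uint8_8_8_to_uint24(raw):
    """Python mirror of the GLSL unpacking step: three byte-valued floats in,
    one 24-bit integer-valued float out."""
    r, g, b = raw
    return r * 65536.0 + (g * 256.0 + b)

# Illustrative values: the high byte should be 128 but decodes as 127.
correct = uint8_8_8_to_uint24((128.0, 0.0, 0.0))
wrong = uint8_8_8_to_uint24((127.0, 0.0, 0.0))
error = correct - wrong

print(correct, wrong, error)
```

An off-by-one in the most significant byte shifts the unpacked value by 65536, which is roughly 0.4% of the full depth range, while the same mistake in the least significant byte shifts it by only 1.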

To test the first step by itself, we created an unsigned byte RGB texture with texel values of [0,0,0], [1,1,1], ..., [255,255,255]. When running this texture's samples through sample_to_uint8_8_8() we found that we failed to reconstruct the proper 8-bit integer value for every power of two (Figure 3). In fact, these values were all one below the correct value. It seems that there is probably more error in the conversion from unsigned byte to floating point on this particular card and, since floating point multiplication by a power of two is exact, the resulting scaled value is just under the "correct" integer-valued result, which floor then truncates to the next integer down. Rounding instead of just flooring, which we should have been doing all along, is the correct way to perform this conversion and fixes this particular issue:

vec3 sample_to_uint8_8_8(const in vec3 sample) {
  return floor(sample * 255.0 + 0.5);
}

Figure 3. Test for decoding an unsigned byte to it
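The difference between the two conversions can be reproduced on the CPU with a minimal Python sketch. The error model is purely an assumption: we pretend each sample arrives a fixed 1e-6 below its exact value, which is far smaller than one byte step (1/255). The real error on the HD 4000 is not known and need not look like this.

```python
import math

UNDERSHOOT = 1e-6  # hypothetical sampling error, much smaller than 1/255

def decode_floor(sample):
    """Original conversion: truncates, so any undershoot loses a whole step."""
    return math.floor(sample * 255.0)

def decode_round(sample):
    """Fixed conversion: rounds to the nearest integer."""
    return math.floor(sample * 255.0 + 0.5)

powers_of_two = [1, 2, 4, 8, 16, 32, 64, 128]
floor_results = [decode_floor(k / 255.0 - UNDERSHOOT) for k in powers_of_two]
round_results = [decode_round(k / 255.0 - UNDERSHOOT) for k in powers_of_two]

print(floor_results)
print(round_results)
```

Under this assumed error, the floor version returns a value one below each byte, matching the symptom described above, while the rounding version tolerates any error smaller than half a byte step.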

Try it at home

Below we've included the WebGL unit test for checking whether your GPU/driver exhibits the same decoding issue we saw on the HD 4000. If everything is green, then every byte value has been successfully converted to its corresponding integer-valued float. Vertical red lines indicate byte values that were not converted to the correct float. A vertical line that doesn't extend all the way from top to bottom means the conversion failed for only a subset of the color channels (red: bottom, green: middle, blue: top). Drop us a line and let us know if you find any other hardware that behaves the way the HD 4000 did.

Packing Decode Bug Demo