"Be not overcome with evil, but overcome evil with good."
- Romans 12:21

GPU Texture Compression
Saturday, April 12, 2008

Executable
Source code
GPUTextureCompression.zip (335 KB)

Required:
Direct3D 10.1
Compressed textures are one of the most important GPU features and have saved us a lot of memory and bandwidth over the years. However, their use has long been practically limited to static textures, because compression is slow and the hardware only does decompression, not compression. Even with a simple compression algorithm, the round trip of transferring GPU data to the CPU for compression and then transferring the compressed data back is a bottleneck that makes compressing render targets impractical for most uses.

Direct3D 10.1 introduced the ability to copy into a block-compressed format from an integer format whose texel is the same size as a compressed block. For instance, a 256x256 RG32_UINT texture can be copied into a 1024x1024 BC4 texture, since each 64-bit texel supplies one 64-bit 4x4 block. This allows an application to run a compression shader that renders to an integer render target and then copy the results into a compressed texture, keeping the entire texture compression process on the GPU.
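The size relationship in that example can be checked with a little arithmetic. The following is only a sketch: `staging_size` is a hypothetical helper name, and it assumes the compressed texture's dimensions are multiples of the 4x4 block size.

```python
BLOCK_DIM = 4                        # BC formats encode 4x4 texel blocks
BC4_BLOCK_BITS = 2 * 8 + 16 * 3      # two 8-bit endpoints + sixteen 3-bit indices
RG32_UINT_TEXEL_BITS = 2 * 32        # two 32-bit unsigned integer channels


def staging_size(width, height):
    """Size of the integer render target whose texels map 1:1 to BC4 blocks.

    Hypothetical helper: assumes width and height are multiples of 4.
    """
    if width % BLOCK_DIM or height % BLOCK_DIM:
        raise ValueError("dimensions must be multiples of the block size")
    return width // BLOCK_DIM, height // BLOCK_DIM


if __name__ == "__main__":
    # One RG32_UINT texel carries exactly one BC4 block.
    print(BC4_BLOCK_BITS, RG32_UINT_TEXEL_BITS)
    print(staging_size(1024, 1024))
```

Both bit counts come out to 64, which is why the copy is allowed, and a 1024x1024 BC4 texture needs a 256x256 integer target.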

This demo compresses a texture (a static one, admittedly) on the GPU to the BC4 format. Because this is a single-channel texture, I also use the Gather texture fetch method (AKA Fetch4), which was also introduced in Direct3D 10.1, to reduce the number of texture fetches from 16 to 4. The shader is surprisingly short at 49 ALU instructions on an HD 3870, which for a 4x4 block is barely over 3 instructions per pixel.
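To make the per-block work concrete, here is a CPU reference sketch of what such a compression pass has to produce. It is not the demo's shader: the min/max endpoint choice is an assumption picked for simplicity, all function names are hypothetical, and `gather_quad` only imitates the idea of one fetch returning a 2x2 quad (real Gather returns the four texels in a hardware-defined component order).

```python
def gather_quad(texture, x, y):
    """Stand-in for one Gather fetch: the 2x2 quad with top-left texel (x, y)."""
    return (texture[y][x], texture[y][x + 1],
            texture[y + 1][x], texture[y + 1][x + 1])


def fetch_block(texture, bx, by):
    """Read one 4x4 block with 4 quad fetches instead of 16 single fetches."""
    block = [0] * 16
    for qy in (0, 2):
        for qx in (0, 2):
            a, b, c, d = gather_quad(texture, bx * 4 + qx, by * 4 + qy)
            block[qy * 4 + qx], block[qy * 4 + qx + 1] = a, b
            block[(qy + 1) * 4 + qx], block[(qy + 1) * 4 + qx + 1] = c, d
    return block


def encode_bc4_block(block):
    """Pack 16 8-bit values into an 8-byte BC4 block (endpoints = max, min)."""
    hi, lo = max(block), min(block)
    bits = 0
    if hi > lo:
        for i, v in enumerate(block):
            # Nearest position on the 8-entry ramp: 0 = hi endpoint, 7 = lo endpoint.
            step = ((hi - v) * 7 + (hi - lo) // 2) // (hi - lo)
            # BC4 stores the endpoints as indices 0 and 1, interpolants as 2..7.
            index = 0 if step == 0 else 1 if step == 7 else step + 1
            bits |= index << (3 * i)
    # A flat block keeps all indices at 0, which selects the first endpoint.
    return bytes([hi, lo]) + bits.to_bytes(6, "little")


def decode_bc4_block(data):
    """Reference decoder, used here only to check the encoder."""
    r0, r1 = data[0], data[1]
    if r0 > r1:
        palette = [r0, r1] + [((7 - k) * r0 + k * r1) / 7 for k in range(1, 7)]
    else:
        palette = [r0, r1] + [((5 - k) * r0 + k * r1) / 5 for k in range(1, 5)] + [0, 255]
    bits = int.from_bytes(data[2:8], "little")
    return [palette[(bits >> (3 * i)) & 7] for i in range(16)]
```

On the GPU, the 8 bytes returned by `encode_bc4_block` would be written as the two 32-bit channels of one RG32_UINT texel. The work per block is a min/max reduction plus one multiply, add, and divide per texel, which gives some feel for how a shader can stay at a few instructions per pixel.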

For render targets that stay the same across many frames, compressing them is likely to improve performance. Even compressing every frame is definitely practical, although whether you get a performance increase depends on how many times you read the texture per compression cycle. Another interesting use is to avoid the hard drive bottleneck when loading textures, which you can do by storing them as JPEGs on disk and recompressing them to a GPU format on the fly. Doing the recompression on the GPU ensures that step will be fast, although an implementation would likely do the JPEG decompression on the CPU.

This demo should run on the Radeon HD 3000 series. Since it uses Direct3D 10.1, you need to have Windows Vista SP1 installed.