Texture Compression (S3TC and VTC)

Among the most popular texture compression algorithms used in OpenGL is the DXTn series, introduced by S3 Graphics and hence known as S3TC. How the algorithm works is described in the Appendix of the GL_EXT_texture_compression_s3tc specification. There are five versions, DXT1 through DXT5. DXT1 is briefly explained below:


A 4×4 texel block (48 bytes if each texel is 24-bit RGB) is compressed into two 16-bit RGB565 color values (c0 and c1) and a 4×4 block of 2-bit lookup indices. That is 8 bytes in total, a 6:1 compression ratio.

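c2 and c3 are calculated from c0 and c1 as follows (c0 and c1 are compared as unsigned 16-bit integers; the arithmetic is done separately on each color channel):

If c0 <= c1,

c2 = (c0 + c1) / 2;

c3 = black (transparent black in the RGBA variant of DXT1);

Otherwise (c0 > c1),

c2 = (2 * c0 + c1) / 3;

c3 = (c0 + 2 * c1) / 3;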


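Decompression is extremely fast: each texel is just a lookup into a palette of 4 colors, two of which (c0 and c1) are stored and two of which (c2 and c3) are computed once per block.

Read the 2-bit index of each compressed texel: 00 selects the RGB of c0, 01 selects c1, 10 selects c2 and 11 selects c3.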
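To make the decoding steps above concrete, here is a minimal C sketch that decodes one 8-byte RGB DXT1 block into 16 texels. It assumes the byte layout from the GL_EXT_texture_compression_s3tc specification: c0 and c1 as little-endian RGB565 values in bytes 0-3, followed by 32 bits of 2-bit codes, with texel (x, y) using bits 2(4y + x) and 2(4y + x) + 1. The names rgb8, expand565 and decode_dxt1_block are made up for this example, the 565-to-888 expansion by bit replication is one common choice, and the integer division here truncates where real hardware may round differently.

```c
#include <stdint.h>

/* One decoded RGB texel, 8 bits per channel. */
typedef struct { uint8_t r, g, b; } rgb8;

/* Expand a 16-bit RGB565 color to 8 bits per channel by bit replication. */
rgb8 expand565(uint16_t c)
{
    rgb8 out;
    uint8_t r = (c >> 11) & 0x1F, g = (c >> 5) & 0x3F, b = c & 0x1F;
    out.r = (uint8_t)((r << 3) | (r >> 2));
    out.g = (uint8_t)((g << 2) | (g >> 4));
    out.b = (uint8_t)((b << 3) | (b >> 2));
    return out;
}

/* Decode one 8-byte DXT1 block into 16 texels, row-major (index 4*y + x). */
void decode_dxt1_block(const uint8_t block[8], rgb8 out[16])
{
    /* Bytes 0-3: c0 and c1 as little-endian RGB565; bytes 4-7: 2-bit codes. */
    uint16_t c0 = (uint16_t)(block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t)(block[2] | (block[3] << 8));
    uint32_t bits = (uint32_t)block[4] | ((uint32_t)block[5] << 8) |
                    ((uint32_t)block[6] << 16) | ((uint32_t)block[7] << 24);
    rgb8 p[4];

    p[0] = expand565(c0);
    p[1] = expand565(c1);
    if (c0 > c1) {                      /* four-color mode */
        p[2].r = (uint8_t)((2 * p[0].r + p[1].r) / 3);
        p[2].g = (uint8_t)((2 * p[0].g + p[1].g) / 3);
        p[2].b = (uint8_t)((2 * p[0].b + p[1].b) / 3);
        p[3].r = (uint8_t)((p[0].r + 2 * p[1].r) / 3);
        p[3].g = (uint8_t)((p[0].g + 2 * p[1].g) / 3);
        p[3].b = (uint8_t)((p[0].b + 2 * p[1].b) / 3);
    } else {                            /* three colors plus black */
        p[2].r = (uint8_t)((p[0].r + p[1].r) / 2);
        p[2].g = (uint8_t)((p[0].g + p[1].g) / 2);
        p[2].b = (uint8_t)((p[0].b + p[1].b) / 2);
        p[3].r = p[3].g = p[3].b = 0;
    }

    /* Texel i uses bits 2i and 2i+1 of the code word as its palette index. */
    for (int i = 0; i < 16; i++)
        out[i] = p[(bits >> (2 * i)) & 3];
}
```

For example, a block with bytes 00 F8 1F 00 02 00 00 00 has c0 = pure red and c1 = pure blue. Texel 0 gets code 10 and decodes to c2 = (170, 0, 85), while every other texel gets code 00 and decodes to c0.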


VTC (GL_NV_texture_compression_vtc) is based on the same ideas; it simply extends the texel blocks in the z direction so that 3D (volume) textures can be compressed.


C++: POD

When programming in C++, mixing C++ and C data types is an ugly inevitability, and it often throws up quirky behaviour. POD (Plain Old Data) is one such quirk I discovered today. C macros can be used unchanged in C++, but whether they behave correctly depends on the type of data they operate on: it needs to be a POD type.

Here is some information about the POD type from the excellent C++ FAQ Lite:

[26.7] What is a “POD type”?

A type that consists of nothing but Plain Old Data.

A POD type is a C++ type that has an equivalent in C, and that uses the same rules as C uses for initialization, copying, layout, and addressing.

As an example, the C declaration struct Fred x; does not initialize the members of the Fred variable x. To make this same behaviour happen in C++, Fred would need to not have any constructors. Similarly to make the C++ version of copying the same as the C version, the C++ Fred must not have overloaded the assignment operator. To make sure the other rules match, the C++ version must not have virtual functions, base classes, non-static members that are private or protected, or a destructor. It can, however, have static data members, static member functions, and non-static non-virtual member functions.

The actual definition of a POD type is recursive and gets a little gnarly. Here’s a slightly simplified definition of POD: a POD type’s non-static data members must be public and can be of any of these types: bool, any numeric type including the various char variants, any enumeration type, any data-pointer type (that is, any type convertible to void*), any pointer-to-function type, or any POD type, including arrays of any of these. Note: data-pointers and pointers-to-function are okay, but pointers-to-member are not. Also note that references are not allowed. In addition, a POD type can’t have constructors, virtual functions, base classes, or an overloaded assignment operator.
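Coming back to the point about C macros: the hypothetical COPY_STRUCT macro below (not from the FAQ) copies an object with memcpy, the way C code often does. This minimal C sketch applies it to a Fred struct that follows the rules quoted above, so in C++ it would also be a POD type and the byte-wise copy would be safe. The macro name and the struct's members are assumptions for illustration.

```c
#include <string.h>

/* Only public data members; no constructors, virtual functions, base
   classes or overloaded assignment operator, so this is POD in C++ too. */
struct Fred {
    int   id;
    float weight;
    char  name[16];
};

/* Hypothetical C-style macro that copies an object byte by byte.
   Fine for C structs and C++ POD types; not safe for C++ classes whose
   copying depends on constructors or an overloaded assignment operator. */
#define COPY_STRUCT(dst, src) memcpy(&(dst), &(src), sizeof(dst))
```

In C++, using the same macro on a class with virtual functions or a user-defined assignment operator bypasses that class's copy semantics, which is the kind of quirk the POD rules guard against.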

Visual C++: Stack Overflow

On Visual C++ 2005, I allocated a large local array in a function. The program hit a stack overflow exception and stopped inside chkstk.asm.

I’m used to the stack size limit on my Linux/Cygwin setup, which was about 2MB. The limit can be checked with the bash builtin command ulimit; ulimit -s reports it in kilobytes:

$ ulimit -s
2042

But the array I was allocating under VC++ 2005 was only a little larger than 1MB. On further digging, I found that the default stack size for programs built with VC++ 2005 is 1MB.

This stack size limit can be modified using:
Project → Properties → Configuration Properties → Linker → System → Stack Reserve Size.

More information on the stack size limit can be found on the MSDN page for the /STACK linker option.
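If raising the reserve size is not an option, a common workaround (not something from the original post) is to move the large array off the stack. The minimal C sketch below allocates a buffer slightly larger than 1MB with malloc instead of declaring it as a local array; the function name fill_large_buffer and the size BUF_BYTES are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size: a little more than the 1MB default stack reserve. */
#define BUF_BYTES (1024u * 1024u + 4096u)

/* Declaring "unsigned char buf[BUF_BYTES];" here would put the array on the
   stack and could overflow it. Allocating from the heap avoids the stack
   limit; the function's stack frame only holds a pointer. */
int fill_large_buffer(unsigned char value)
{
    unsigned char *buf = malloc(BUF_BYTES);
    if (buf == NULL)
        return -1;                      /* out of memory */

    memset(buf, value, BUF_BYTES);
    int last = buf[BUF_BYTES - 1];      /* read back the far end of the buffer */
    free(buf);
    return last;
}
```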

Visual C++: Library Pragma

I find myself having to specify the libraries I want linked every time I set up a new project. I found a neat (non-portable) trick in Visual Studio to avoid this.

Use the #pragma comment(lib, "libfile") [1] preprocessor directive to tell the linker to include these library files when linking. For example:

// Link cg libraries
#pragma comment(lib, "cg.lib")
#pragma comment(lib, "cggl.lib")

[1] msdn.microsoft.com/library/en-us/vclang/html/_predir_comment.asp

(via Adding MSDEV Libraries)


A colleague informed me today that my name had appeared in the April 2005 issue of Embedded Systems Programming magazine. Back in December 2004, I had written to Dan Saks about his article More ways to map memory, commenting on the use of the C fixed-width integer types. We had an email discussion about it and I forgot all about it. I had blogged earlier about these types.

In his latest article, Sizing and aligning device registers, he mentions that email conversation. I know this is nothing significant, but it is the first time my name has appeared in a deadwood tech magazine! 🙂 Here is the relevant excerpt:

Ashwin N (ashwin.n@gmail.com) suggested yet another way to define the special_register type:

If you want to use an unsigned four-byte word, shouldn’t you be doing:

/* ... */
typedef uint32_t volatile special_register;

This should work with all modern standard C compilers/libraries.

The typedef uint32_t is an alias for some unsigned integer type that occupies exactly 32 bits. It’s one of many possible exact-width unsigned integer types with names of the form uintN_t, where N is a decimal integer representing the number of bits the type occupies. Other common exact-width unsigned types are uint8_t and uint16_t. For each type uintN_t, there’s a corresponding type intN_t for a signed integer that occupies exactly N bits and has two’s complement representation.

I have been reluctant to use <stdint.h>. It’s available in C99, but not in earlier C dialects nor in Standard C++. However, it’s becoming increasingly available in C++ compilers, and likely to make it into the C++ Standard someday. Moreover, as Michael Barr observed, if the header isn’t available with your compiler, you can implement it yourself without much fuss. I plan to start using these types more in my work.

Again, using a typedef such as special_register makes the exact choice of the integer type much less important. However, I’m starting to think that uint32_t is the best type to use in defining the special_register type.
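As a rough illustration of how the special_register typedef from the excerpt might be used, here is a minimal C sketch. Everything except the typedef is hypothetical: the STATUS pointer, the READY_BIT flag and the device_ready function are made up, and a plain variable stands in for the hardware register so the code can run on an ordinary machine.

```c
#include <stdint.h>

/* The typedef from the excerpt: an exactly 32-bit, volatile register. */
typedef uint32_t volatile special_register;

/* Hypothetical register. Real code would point at the device's fixed
   address instead, e.g. (special_register *)0xFFFF0000 (made-up address).
   A plain variable stands in here so the sketch runs on a desktop. */
static uint32_t fake_status_word;
special_register *const STATUS = &fake_status_word;

#define READY_BIT 0x1u   /* hypothetical "device ready" flag */

/* Because the register is volatile, every call performs a fresh read. */
int device_ready(void)
{
    return (*STATUS & READY_BIT) != 0;
}
```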

C: Shift Operator Mayhem

I came across a puzzling piece of code today. The actual code is convoluted, but it boils down to this:

#include <stdint.h>

int main(void)
{
    uint32_t val   = 1;
    uint32_t count = 32;
    val            = val >> count;

    return 0;
}

What do you think the result in val will be? I thought 0. It turned out to be 1.

After further investigation, I found that this was due to a combination of undefined behaviour in C, a little-known behaviour of the IA-32 shift instructions, and my ignorance of both.

Looking at the code above, it is natural to think that shifting val right by 32 bits would boot out the puny 1 and leave 0. That intuition fails here, because C does not define the result of a shift by such a count.

From The C Programming Language [1]:

The result is undefined if the right operand is negative, or greater than or equal to the number of bits in the left expression’s type.

(In val >> count, the left expression is val and the right operand is count.)
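Given that rule, one way to get the result I originally expected is to handle large counts explicitly instead of passing them to the shift operator. This is a minimal C sketch; shr32 is a hypothetical helper name, not a standard function.

```c
#include <stdint.h>

/* Right shift of a 32-bit unsigned value that is defined for every count.
   Counts of 32 or more shift every bit out, so they return 0 directly
   instead of reaching ">>", where they would be undefined behaviour. */
uint32_t shr32(uint32_t val, unsigned count)
{
    return count >= 32 ? 0u : val >> count;
}
```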

That explains why the result should not be relied on. But why is val 1? Digging deeper, I found that the compiler [2] generated the Intel instruction SAR or SHR (or one of their variants) for the C shift operation. And here lies another nasty surprise:

From the IA-32 Intel Architecture Software Developer’s Manual [3]:

The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

So not only is the behaviour undefined in C, but on IA-32 processors the shift instructions also mask the shift count to 5 bits, so the effective count is always in the range 0-31. A count of 32 (binary 100000) is masked to 0, which is why val was left unchanged at 1.
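The masking can be modelled in plain C, which makes the surprising result easy to reproduce. The hypothetical ia32_shr_model function below mimics a 32-bit SHR by applying the 5-bit mask before shifting; it is well-defined C and is not guaranteed to compile to the SHR instruction.

```c
#include <stdint.h>

/* Models a 32-bit IA-32 SHR: only the low 5 bits of the count are used,
   so the shift actually performed is always 0-31 and never undefined. */
uint32_t ia32_shr_model(uint32_t val, uint32_t count)
{
    return val >> (count & 0x1F);   /* 32 & 0x1F == 0: val is unchanged */
}
```

With this model, a count of 32 leaves 1 as 1, matching what I observed, while a count of 33 behaves like a shift by 1.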

[1] A7.8 Shift Operators, Appendix A. Reference Manual, The C Programming Language
[2] Observed with both Visual C++ and GCC compilers
[3] SAL/SAR/SHL/SHR – Shift, Chapter 4. Instruction Set Reference, IA-32 Intel Architecture Software Developer’s Manual