C++: Template Separation Model

C++ function templates are a great way to reduce the amount of repeated code written to do the same operations on objects of different types. For example, a minimum function that handles any type could be written as:

template <typename T>
T min(T a, T b)
{
    return (a > b) ? b : a;
}

There are two ways to organize template code: the inclusion model and the separation model. In the inclusion model, the complete function definition goes in the header file. In the separation model, the function declaration goes in the header file and the function definition in a source file. On its own, however, that split does not work as expected: the function definition must also be marked with the export keyword (i.e., export template <typename T> …).

Though this is Standard C++, it won’t work! Welcome to the real world of C++. No major compiler out there supports this clean separation model (the export keyword). Even the latest VC++ 8 does not support it. Nor does gcc.
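
One workaround that compilers do support is explicit instantiation (export itself was later removed from the language in C++11). This is a different technique from export, not something the post above describes, and it only works when the set of types is known in advance. A minimal sketch, with hypothetical file names and a hypothetical function name minValue, shown as a single file to keep it self-contained:

```cpp
#include <cassert>

// --- What would go in the header (hypothetical min_value.h) ---
template <typename T>
T minValue(T a, T b);  // declaration only

// --- What would go in the source file (hypothetical min_value.cpp) ---
template <typename T>
T minValue(T a, T b) {
    return (a > b) ? b : a;
}

// Explicit instantiations: the compiler emits code for exactly these types,
// so other translation units that only see the declaration can link to them.
template int minValue<int>(int, int);
template double minValue<double>(double, double);

// Usage check (kept in the same file only for the sake of the sketch).
inline void checkMinValue() {
    assert(minValue(3, 5) == 3);
    assert(minValue(2.5, 1.5) == 1.5);
}
```

Any type that is not explicitly instantiated in the source file would fail at link time, which is the price of keeping the definition out of the header.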



C++: Method Chaining

I recently had to do this to an object:

Fbo f;

In this case, calling so many functions was inevitable since I was setting this particular object to non-default values. That is okay, but I would like to do the above like this:

Fbo f;

This is called method chaining and is a neat C++ trick.

All that is needed for this to work is that every function involved in the chain (except the rightmost one) returns a reference to its own object, that is, it ends with return *this. For example:

class Fbo {
public:
    Fbo& setType(GLenum type)
    {
        _type = type;
        return *this;
    }
};
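
To see the chain in action, here is a minimal self-contained sketch. Only setType comes from the post; setWidth, setHeight, the getters, and the local GLenum typedef (standing in for the OpenGL one) are assumptions made for illustration.

```cpp
#include <cassert>

typedef unsigned int GLenum;  // stand-in for the OpenGL typedef (assumption)

class Fbo {
public:
    Fbo() : _type(0), _width(0), _height(0) {}

    // Each setter returns *this so that the next call can be chained onto it.
    Fbo& setType(GLenum type) { _type = type; return *this; }
    Fbo& setWidth(int width) { _width = width; return *this; }     // hypothetical
    Fbo& setHeight(int height) { _height = height; return *this; } // hypothetical

    GLenum type() const { return _type; }
    int width() const { return _width; }
    int height() const { return _height; }

private:
    GLenum _type;
    int _width;
    int _height;
};

// Usage: one statement instead of three separate calls on f.
inline void checkChaining() {
    Fbo f;
    f.setType(1).setWidth(512).setHeight(256);  // 1 is a placeholder value
    assert(f.type() == 1u && f.width() == 512 && f.height() == 256);
}
```

Returning by reference (Fbo&) rather than by value matters here: returning a copy would make every later call in the chain modify a temporary instead of f.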

TortoiseSVN: A Quick Start Guide

I have been an avid user of TortoiseCVS for many years now. It is the single biggest reason I came to like version control systems. Every version control tool I had used before it made me cringe and swear. Last week, I switched to TortoiseSVN for my source code files. Just as TortoiseCVS is built on CVS, TortoiseSVN is built on a much more advanced version control system called Subversion.

Why did I switch? There are certain operations which are difficult or impossible in CVS/TortoiseCVS. If you rename, add and delete files inside your project, then CVS struggles to maintain versions of those. Since I have used some very advanced version control systems like Clearcase before, I knew CVS had reached its limits of use for me. I had to move on to Subversion.

When I did move, I found that there are not many guides out on the Internet which provide the minimum number of steps required for TortoiseCVS users to switch to its new brother TortoiseSVN. I aim to do that below with some simple steps and screenshots.


Download TortoiseSVN and install it.


First, we need a repository to hold the versions of our data. Say you want to create a local repository (one that exists on your hard disk). Create an empty directory to hold the SVN repository. Right-click inside it and choose TortoiseSVN → Create Repository here. Choose Native Filesystem (FSFS) in the next dialog. The repository is created!

A Simple Project

I will assume you have a bunch of files or folders all nicely managed inside a single folder which you want to manage as a project.

Let’s lay the groundwork for your project. Open the Repo-browser by right-clicking and choosing TortoiseSVN → Repo-browser. Provide the path to your repository in the URL section.

Create a new folder for your project. Give the name of your project to this folder. Now, here’s the bit where SVN differs from CVS. Create 3 folders inside the above folder named trunk, tags and branches. trunk will hold your data most of the time. tags will hold your tagged data and branches will hold your branched data.

Importing Your Project

Clean the folder you wish to import into the repository by removing all unnecessary files and folders inside it. Now right-click on the folder and choose TortoiseSVN → Import. Choose the location of your repository, then choose your project name and choose trunk. We are importing our files into the main trunk of the repository tree. The tags and branches we create in the future will reside in the tags and branches folders respectively, but will derive from the trunk.

Working Copy

Now that all our project data has been imported into the repository, we can check out a working copy to any place we want and start working! Create a new folder with any desired name. Right-click on this folder and choose TortoiseSVN → Checkout. In the repository section, choose your project’s trunk folder and check it out.

You can modify and commit the files inside this working copy folder just like you did in TortoiseCVS.


Tagging is a bit different in SVN. Right-click and choose TortoiseSVN → Branch/Tag. In the To URL section, choose your project’s tags folder and append the tag name to it. Use the same kind of …/tags/… path when you wish to check out this tagged data.

To merge a branch back into trunk, go to the trunk working copy and merge the branch “to” it.

I hope that with the preliminary information given above, you can get started with TortoiseSVN.

Texture Compression (S3TC and VTC)

One of the most popular families of texture compression algorithms used in OpenGL is the DXTn series, which was introduced by S3 Graphics and is hence known as S3TC. The workings of the algorithm are described in the Appendix of GL_EXT_texture_compression_s3tc. There are 5 versions, DXT1 to DXT5. DXT1 is briefly explained below:


A 4×4 texel block (48 bytes if each texel is 8-bit RGB) is compressed into two 16-bit color values (c0 and c1) and a 4×4 block of 2-bit lookup indices, 8 bytes in total.

c2 and c3 are calculated from c0 and c1 as follows:

If c0 <= c1:

    c2 = (c0 + c1) / 2;
    c3 = black (not an interpolated color; transparent black in the RGBA variant of DXT1);

Otherwise (c0 > c1):

    c2 = (2 * c0 + c1) / 3;
    c3 = (c0 + 2 * c1) / 3;


Decompression is extremely fast. It is just a lookup of 2-4 precomputed values.

Read the 2-bit value of each compressed pixel. If 00 then read RGB of c0, if 01 then read RGB of c1 and so on.
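
The decoding steps can be sketched in a few lines of C++. This is a rough illustration, not a reference decoder: the names Rgb, expand565, blend and decodeDxt1Block are hypothetical, the block is assumed to be stored little-endian (c0, c1, then 32 bits of indices with texel 0 in the lowest two bits), and the 5/6-bit to 8-bit expansion by bit replication is an assumption that real decoders may handle differently. When c0 <= c1, index 3 decodes to black, as the S3TC extension specification defines for the RGB format.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Expand RGB565 to 8 bits per channel by bit replication (an assumption).
inline Rgb expand565(std::uint16_t c) {
    const unsigned r5 = (c >> 11) & 0x1Fu;
    const unsigned g6 = (c >> 5) & 0x3Fu;
    const unsigned b5 = c & 0x1Fu;
    Rgb out;
    out.r = static_cast<std::uint8_t>((r5 << 3) | (r5 >> 2));
    out.g = static_cast<std::uint8_t>((g6 << 2) | (g6 >> 4));
    out.b = static_cast<std::uint8_t>((b5 << 3) | (b5 >> 2));
    return out;
}

// Per-channel weighted average: (wa * a + wb * b) / (wa + wb).
inline Rgb blend(const Rgb& a, unsigned wa, const Rgb& b, unsigned wb) {
    const unsigned d = wa + wb;
    Rgb out;
    out.r = static_cast<std::uint8_t>((wa * a.r + wb * b.r) / d);
    out.g = static_cast<std::uint8_t>((wa * a.g + wb * b.g) / d);
    out.b = static_cast<std::uint8_t>((wa * a.b + wb * b.b) / d);
    return out;
}

// Decode one 8-byte DXT1 block into 16 texels (row-major 4x4).
inline std::array<Rgb, 16> decodeDxt1Block(const std::uint8_t* block) {
    const std::uint16_t c0 = static_cast<std::uint16_t>(block[0] | (block[1] << 8));
    const std::uint16_t c1 = static_cast<std::uint16_t>(block[2] | (block[3] << 8));
    const std::uint32_t bits = static_cast<std::uint32_t>(block[4]) |
                               (static_cast<std::uint32_t>(block[5]) << 8) |
                               (static_cast<std::uint32_t>(block[6]) << 16) |
                               (static_cast<std::uint32_t>(block[7]) << 24);

    // Precompute the four palette entries once per block.
    Rgb palette[4];
    palette[0] = expand565(c0);
    palette[1] = expand565(c1);
    if (c0 > c1) {
        palette[2] = blend(palette[0], 2, palette[1], 1);
        palette[3] = blend(palette[0], 1, palette[1], 2);
    } else {
        palette[2] = blend(palette[0], 1, palette[1], 1);
        palette[3] = Rgb{0, 0, 0};  // black
    }

    // Each texel is just a 2-bit lookup into the palette.
    std::array<Rgb, 16> texels;
    for (int i = 0; i < 16; ++i) {
        texels[i] = palette[(bits >> (2 * i)) & 0x3u];
    }
    return texels;
}

// Usage check: c0 = pure red (0xF800), c1 = pure blue (0x001F),
// texel indices 0, 1, 2 for the first three texels and 0 for the rest.
inline void checkDecodeDxt1Block() {
    const std::uint8_t block[8] = {0x00, 0xF8, 0x1F, 0x00, 0x24, 0x00, 0x00, 0x00};
    const std::array<Rgb, 16> t = decodeDxt1Block(block);
    assert(t[0].r == 255 && t[0].g == 0 && t[0].b == 0);   // c0
    assert(t[1].r == 0 && t[1].g == 0 && t[1].b == 255);   // c1
    assert(t[2].r == 170 && t[2].g == 0 && t[2].b == 85);  // (2 * c0 + c1) / 3
}
```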


VTC (GL_NV_texture_compression_vtc) is based on the same ideas; it just extends the texel blocks in the z direction.

C++: POD

When programming in C++, mixing up C++ and C data types becomes an ugly inevitability, and it always throws up some quirky behaviour. POD (Plain Old Data) is one such quirk that I discovered today. C macros can be used unchanged under C++, but whether they behave correctly under C++ depends on the type of data being operated on: it needs to be of POD type.

Here is some information about the POD type from the excellent C++ FAQ Lite:

[26.7] What is a “POD type”?

A type that consists of nothing but Plain Old Data.

A POD type is a C++ type that has an equivalent in C, and that uses the same rules as C uses for initialization, copying, layout, and addressing.

As an example, the C declaration struct Fred x; does not initialize the members of the Fred variable x. To make this same behaviour happen in C++, Fred would need to not have any constructors. Similarly to make the C++ version of copying the same as the C version, the C++ Fred must not have overloaded the assignment operator. To make sure the other rules match, the C++ version must not have virtual functions, base classes, non-static members that are private or protected, or a destructor. It can, however, have static data members, static member functions, and non-static non-virtual member functions.

The actual definition of a POD type is recursive and gets a little gnarly. Here’s a slightly simplified definition of POD: a POD type’s non-static data members must be public and can be of any of these types: bool, any numeric type including the various char variants, any enumeration type, any data-pointer type (that is, any type convertible to void*), any pointer-to-function type, or any POD type, including arrays of any of these. Note: data-pointers and pointers-to-function are okay, but pointers-to-member are not. Also note that references are not allowed. In addition, a POD type can’t have constructors, virtual functions, base classes, or an overloaded assignment operator.
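
With a newer compiler, the check does not have to be done by eye. The sketch below assumes C++11 or later (which postdates the FAQ text above) and uses the standard type traits std::is_trivial and std::is_standard_layout, which together correspond to the C++11 notion of POD. PlainFred, FancyFred, isPodLike and copyLikeC are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// C-style struct: no constructors, no virtual functions, all data public.
struct PlainFred {
    int id;
    double weight;
};

// Breaks the rules: user-provided constructor and a virtual destructor.
struct FancyFred {
    FancyFred() : id(0) {}
    virtual ~FancyFred() {}
    int id;
};

// In C++11 terms, a POD type is both trivial and standard-layout.
template <typename T>
constexpr bool isPodLike() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(isPodLike<PlainFred>(), "PlainFred should behave like a C struct");
static_assert(!isPodLike<FancyFred>(), "FancyFred is not a POD type");

// Copying the C way (byte by byte) is only safe for POD types.
inline PlainFred copyLikeC(const PlainFred& src) {
    PlainFred dst;
    std::memcpy(&dst, &src, sizeof(dst));
    return dst;
}

// Usage check.
inline void checkCopyLikeC() {
    PlainFred a = {7, 1.5};
    PlainFred b = copyLikeC(a);
    assert(b.id == 7 && b.weight == 1.5);
}
```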

Visual C++: Stack Overflow

On Visual C++ 2005, I allocated a large local array in a function. The program got a stack overflow exception and ended inside chkstk.asm.

I’m used to the stack size limit on Linux/Cygwin which is usually 2MB. The limit can be found using the bash builtin command ulimit.

$ ulimit -s
2042 (KB)

But, the array I was allocating under VC++ 2005 was just a bit larger than 1MB. On further digging, I found that the default stack size on VC++ 2005 is 1MB.

This stack size limit can be modified using:
Project → Properties → Configuration Properties → Linker → System → Stack Reserve Size.

More information on the stack size limit can be found on the MSDN page for the /STACK linker option.
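
Raising the limit is one option. Another, which is not what the post above does, is to keep large buffers off the stack altogether. A minimal sketch, assuming the buffer can live on the heap; sumLargeBuffer is a hypothetical helper used only to show the idea:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a buffer of `count` bytes with 1s and adds them up.
inline std::size_t sumLargeBuffer(std::size_t count) {
    // A local array such as `unsigned char buffer[2 * 1024 * 1024];` would
    // need 2 MB of stack here. std::vector keeps only a small handle on the
    // stack and puts the elements on the heap.
    std::vector<unsigned char> buffer(count, 1);

    std::size_t sum = 0;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        sum += buffer[i];
    }
    return sum;
}

// Usage check: 2 MB is larger than the 1 MB default stack mentioned above.
inline void checkSumLargeBuffer() {
    assert(sumLargeBuffer(2u * 1024u * 1024u) == 2u * 1024u * 1024u);
}
```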

Visual C++: Library Pragma

I find myself having to indicate the libraries I want linked in every time I do this. I found a neat (non-portable) trick in Visual Studio to do this.

Use the #pragma comment(lib, "libfile") [1] preprocessor directive to tell the linker to include these library files when linking. For example:

// Link cg libraries
#pragma comment(lib, "cg.lib")
#pragma comment(lib, "cggl.lib")

[1] msdn.microsoft.com/library/en-us/vclang/html/_predir_comment.asp

(via Adding MSDEV Libraries)


A colleague informed me today that my name had appeared in the April 2005 issue of the Embedded Systems Programming magazine. Back in December 2004, I had commented to Dan Saks about his article More ways to map memory on the usage of the available C fixed width integer types. We had an email discussion on it and I forgot all about it. I had blogged earlier about these types.

In his latest article Sizing and aligning device registers he mentions that email conversation. I know this is not anything significant, but this is the first time my name has appeared in a deadwood tech magazine! 🙂

Ashwin N (ashwin.n@gmail.com) suggested yet another way to define the special_register type:

If you want to use an unsigned four-byte word, shouldn’t you be doing:

/* ... */
typedef uint32_t volatile special_register;

This should work with all modern standard C compilers/libraries.

The typedef uint32_t is an alias for some unsigned integer type that occupies exactly 32 bits. It’s one of many possible exact-width unsigned integer types with names of the form uintN_t, where N is a decimal integer representing the number of bits the type occupies. Other common exact-width unsigned types are uint8_t and uint16_t. For each type uintN_t, there’s a corresponding type intN_t for a signed integer that occupies exactly N bits and has two’s complement representation.

I have been reluctant to use <stdint.h>. It’s available in C99, but not in earlier C dialects nor in Standard C++. However, it’s becoming increasingly available in C++ compilers, and likely to make it into the C++ Standard someday. Moreover, as Michael Barr observed, if the header isn’t available with your compiler, you can implement it yourself without much fuss. I plan to start using these types more in my work.

Again, using a typedef such as special_register makes the exact choice of the integer type much less important. However, I’m starting to think that uint32_t is the best type to use in defining the special_register type.
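
As a small illustration of the typedef discussed above, here is a sketch in C++. It assumes a C++11 or later compiler, where the exact-width types are available through <cstdint>. The helper setBit is hypothetical, and an ordinary variable stands in for a real memory-mapped device register.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

typedef std::uint32_t volatile special_register;

// The point of uint32_t: the width is guaranteed, not assumed.
static_assert(sizeof(special_register) * CHAR_BIT == 32,
              "special_register must be exactly 32 bits wide");

// Hypothetical helper: set one bit of a register. `bit` must be below 32.
inline void setBit(special_register& reg, unsigned bit) {
    reg = reg | (static_cast<std::uint32_t>(1) << bit);
}

// Usage check with a plain variable standing in for the device register.
inline void checkSetBit() {
    special_register control = 0;
    setBit(control, 3);
    assert(control == 8u);
}
```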

C: Shift Operator Mayhem

Came across a puzzling piece of code today. The actual code is confusing; however, it basically boils down to this:

#include <stdint.h>

int main()
{
    uint32_t val   = 1;
    uint32_t count = 32;
    val            = val >> count;

    return 0;
}

What do you think will be the result in val? Me thought 0. Turned out to be 1.

After further investigation, I found that this was due to a combination of undefined behaviour in C, a little-known behaviour of certain IA-32 instructions, and my ignorance of both.

On examining the code above, it is natural to think that 32 right shifts applied on val would boot out the puny 1 and the result would be 0. Though this is right almost always, it has some exceptions.

From The C Programming Language [1]:

The result is undefined if the right operand is negative, or greater than or equal to the number of bits in the left expression’s type.

(Taking val >> count as example, left expression is val and right operand is count.)

So, that explains why the result should not be relied on. But why is val 1? On digging deeper, I found that the compiler [2] generated the Intel instruction sar or shr (or one of its variants) for the C shift operation. And here lies another nasty surprise …

From the IA-32 Intel Architecture Software Developer’s Manual [3]:

The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

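So, not only is the behaviour undefined in C, but in code generated for IA-32 processors a 5-bit mask is also applied to the shift count. This means that on IA-32 processors the effective shift count is always in the range 0-31. In the example above, a count of 32 is masked to 0 (32 & 31 = 0), so val is not shifted at all and stays 1.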
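
Both behaviours can be written out explicitly. The sketch below is in C++ with hypothetical helper names: shiftRightMasked mimics the 5-bit masking described in the Intel manual, while shiftRightSafe is one possible way to get the result I originally expected without relying on undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Mimics the IA-32 shift instructions: only the low 5 bits of the count
// are used, so a count of 32 behaves like a count of 0.
inline std::uint32_t shiftRightMasked(std::uint32_t value, std::uint32_t count) {
    return value >> (count & 31u);
}

// Well defined for any count: shifting a 32-bit value right by 32 or more
// positions leaves no bits, so the result is 0.
inline std::uint32_t shiftRightSafe(std::uint32_t value, std::uint32_t count) {
    return (count >= 32u) ? 0u : (value >> count);
}

// Usage check.
inline void checkShifts() {
    assert(shiftRightMasked(1u, 32u) == 1u);  // what was observed
    assert(shiftRightSafe(1u, 32u) == 0u);    // what was expected
    assert(shiftRightSafe(8u, 3u) == 1u);     // ordinary shifts are unchanged
}
```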

[1] A7.8 Shift Operators, Appendix A. Reference Manual, The C Programming Language
[2] Observed with both Visual C++ and GCC compilers
[3] SAL/SAR/SHL/SHR – Shift, Chapter 4. Instruction Set Reference, IA-32 Intel Architecture Software Developer’s Manual