Code Yarns ‍👨‍💻

Aligned memory allocation

📅 2017-Feb-28 ⬩ ✍️ Ashwin Nanjappa

In some scenarios, we want the system to allocate memory at an address that is a multiple of a certain power of 2. Certain CPU architectures and certain operations require their operands to be located at such an address, or run faster when they are. For these reasons, many multi-platform libraries use an aligned memory allocator instead of calling malloc directly. For example, OpenCV uses methods named fastMalloc and fastFree inside its code that do this type of allocation and freeing.
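To make "aligned" concrete, here is a minimal sketch of how one might test whether an address is a multiple of a power of 2. The helper names is_power_of_two and is_aligned are my own, chosen for illustration, and do not come from any library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `alignment` is a nonzero power of 2.
 * A power of 2 has exactly one bit set, so clearing its lowest
 * set bit must leave zero. */
bool is_power_of_two(size_t alignment)
{
    return alignment != 0 && (alignment & (alignment - 1)) == 0;
}

/* Returns true if `ptr` is at an address that is a multiple of
 * `alignment`. Assumes `alignment` is a power of 2, in which case
 * the low bits of an aligned address are all zero. */
bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t) ptr & (alignment - 1)) == 0;
}
```

The bit mask works only because the alignment is a power of 2. For any other alignment we would need a modulo operation instead.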

Example of aligned memory allocation

Many of these methods work as follows:

  1. The user requests N bytes of memory.
  2. We ask malloc for N + P + (A - 1) bytes. Here, P is the size of a pointer on that system and A is the required alignment, which must be a power of 2. For example, if I request 100 bytes on a 32-bit CPU (where P is 4) and require the memory to be aligned to a multiple of 8, then the wrapper will request 100 + 4 + 7 = 111 bytes.
  3. After getting the memory from malloc, we align the pointer forward so that (1) the pointer is at an address that is aligned as per requirement and (2) there is space behind the pointer to store a memory address. We return this pointer to the user.
  4. We reach back P bytes behind the address we gave the user and store the address we got from malloc there.
  5. Once the user is done, they call our version of free with the pointer we gave them.
  6. We reach behind this pointer by P bytes to get the address we got from malloc.
  7. We call free on that pointer to cleanly free all the memory.
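
The arithmetic in steps 2 and 3 can be sketched on plain integers, without allocating anything. The function names padded_size and align_forward are hypothetical, and P is passed in as a parameter only so that the 32-bit example from step 2 can be reproduced on any machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 2: number of bytes to request from malloc for a user request
 * of n bytes, given pointer size p and power-of-2 alignment a. */
size_t padded_size(size_t n, size_t p, size_t a)
{
    return n + p + (a - 1);
}

/* Step 3: the first multiple of a that is at least p bytes past the
 * raw address returned by malloc. Adding (a - 1) and then clearing
 * the low bits rounds (raw + p) up to the next multiple of a. */
uintptr_t align_forward(uintptr_t raw, size_t p, size_t a)
{
    return (raw + p + (a - 1)) & ~(uintptr_t) (a - 1);
}
```

With the numbers from step 2, padded_size(100, 4, 8) gives 111. If malloc were to return the address 0x1003, then align_forward(0x1003, 4, 8) gives 0x1008, which is a multiple of 8 and leaves 5 bytes behind it, enough for the 4-byte pointer.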

Here is some example code that illustrates aligned memory allocation:

#include <stdint.h>  // uintptr_t
#include <stdlib.h>  // malloc, free

// Assume we need 32-byte alignment for AVX instructions
#define ALIGN 32

void *aligned_malloc(size_t size)
{
    // We require whatever user asked for PLUS space for a pointer
    // PLUS space to align pointer as per alignment requirement
    void *mem = malloc(size + sizeof(void*) + (ALIGN - 1));
    if (mem == NULL)
        return NULL;

    // Location that we will return to user
    // This has space *behind* it for a pointer and is aligned
    // as per requirement. The arithmetic is done on uintptr_t
    // because standard C does not allow arithmetic on void*
    uintptr_t addr = (uintptr_t) mem + sizeof(void*) + (ALIGN - 1);
    void *ptr = (void*) (addr & ~(uintptr_t) (ALIGN - 1));

    // Sneakily store address returned by malloc *behind* user pointer
    // void** cast is because a void* pointer cannot be indexed, since
    // the compiler has no idea "how many" bytes to step back by
    ((void **) ptr)[-1] = mem;

    // Return user pointer
    return ptr;
}

void aligned_free(void *ptr)
{
    // Like free, do nothing for a NULL pointer
    if (ptr == NULL)
        return;

    // Sneak *behind* user pointer to find address returned by malloc
    // Use that address to free
    free(((void**) ptr)[-1]);
}
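
The example above fixes the alignment with a macro. As a rough sketch of the same technique with the alignment chosen at run time, the variant below takes it as a parameter and adds a small self-check. The names aligned_malloc_n, aligned_free_n and check_alignments are hypothetical, and the power-of-2 and overflow checks are my own additions rather than part of the method described above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: alignment is a parameter instead of a macro.
 * `alignment` must be a power of 2. Returns NULL on bad input or
 * if malloc fails. */
void *aligned_malloc_n(size_t size, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    size_t extra = sizeof(void *) + (alignment - 1);
    if (size > SIZE_MAX - extra)    /* padded size would overflow */
        return NULL;

    void *mem = malloc(size + extra);
    if (mem == NULL)
        return NULL;

    /* Leave room for one pointer, then round up to the alignment */
    uintptr_t addr = ((uintptr_t) mem + extra) & ~(uintptr_t) (alignment - 1);
    void *ptr = (void *) addr;

    /* Remember what malloc returned, just behind the user pointer */
    ((void **) ptr)[-1] = mem;
    return ptr;
}

void aligned_free_n(void *ptr)
{
    if (ptr != NULL)
        free(((void **) ptr)[-1]);
}

/* Allocates `size` bytes at alignments 1, 2, 4, ... 4096, writes to
 * every byte, and returns 1 only if every pointer was aligned. */
int check_alignments(size_t size)
{
    for (size_t alignment = 1; alignment <= 4096; alignment *= 2) {
        unsigned char *p = aligned_malloc_n(size, alignment);
        if (p == NULL)
            return 0;
        int ok = ((uintptr_t) p & (alignment - 1)) == 0;
        memset(p, 0xAB, size);  /* whole user region must be writable */
        aligned_free_n(p);
        if (!ok)
            return 0;
    }
    return 1;
}
```

Calling check_alignments(100) exercises the allocate, write and free cycle at each alignment. Running it under a memory checker is a reasonable way to confirm that the padded block is never overrun.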