Aligned memory allocation

In some scenarios, you need memory whose address is a multiple of a certain power of 2. Some CPU architectures and some operations require their operands to be located at such an address, or run faster when they are. For these reasons, many multi-platform libraries use an aligned memory allocator instead of calling malloc directly. For example, OpenCV uses functions named fastMalloc and fastFree inside its code for this type of allocation and freeing.

Most of these methods work like this:

  • They internally get memory from malloc. However, if you request N bytes, the wrapper requests N+P+A bytes from malloc. Here, P is the size of a pointer on that CPU architecture and A is the required alignment in bytes, which must be a power of 2. For example, if I request 100 bytes on a 64-bit CPU (P = 8) and require the memory to be aligned to a multiple of 32, then the wrapper requests 100 + 8 + 32 = 140 bytes.

  • After getting the memory from malloc, the wrapper moves the pointer forward so that (1) the pointer is at an address that is aligned as required and (2) there is space behind the pointer to store a memory address.

  • The wrapper then sneaks the address actually returned by malloc into the space just behind the aligned pointer and returns the aligned pointer to the user.

  • The user has to use the matching free wrapper to free this pointer. When she does that, the wrapper sneaks back to read the address actually returned by malloc and calls free on that.

Here is some example code that illustrates aligned memory allocation:
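A minimal sketch of the scheme described above, written in C. The names aligned_malloc and aligned_free are hypothetical and chosen for illustration; this is not OpenCV's fastMalloc or fastFree, only one way the steps could be implemented.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns size bytes aligned to alignment,
 * or NULL on failure. alignment must be a power of 2. */
void *aligned_malloc(size_t size, size_t alignment)
{
    /* Reject zero and anything that is not a power of 2. */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Extra room: P bytes for the saved pointer plus A bytes of slack. */
    size_t extra = sizeof(void *) + alignment;
    if (size > SIZE_MAX - extra)
        return NULL; /* N + P + A would overflow */

    unsigned char *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Leave space for one pointer, then round up to the next
     * multiple of alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~((uintptr_t)alignment - 1);
    unsigned char *user = raw + (aligned - (uintptr_t)raw);

    /* Store the address returned by malloc just behind the user pointer. */
    ((void **)user)[-1] = raw;
    return user;
}

/* Hypothetical counterpart: recovers malloc's address and frees it. */
void aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

With this sketch, aligned_malloc(100, 32) asks malloc for 140 bytes on a 64-bit CPU and returns an address that is a multiple of 32. A pointer obtained this way must be released with aligned_free, never with free directly.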


Free command in Linux

A common question that occurs to any user of an operating system is how much memory is being used and how much is free. The command to check this in Linux is free.

You run it and it throws up some head-scratching output:

$ free
             total       used       free     shared    buffers     cached
Mem:       5971016    4376120    1594896     210616     237260    2398084
-/+ buffers/cache:    1740776    4230240
Swap:      7885820        304    7885516

Right off the bat you can see that it is showing values in kilobytes. While this might have been fine back when Unix was invented, it is hard to read with the gigabytes of RAM we have in today’s computers.

We first fix that by asking it to show human readable output:

$ free --human
             total       used       free     shared    buffers     cached
Mem:          5.7G       4.2G       1.5G       206M       231M       2.3G
-/+ buffers/cache:       1.7G       4.0G
Swap:         7.5G       304K       7.5G

Now we can easily read the values in GBs and MBs. For example, the total of 5971016 kilobytes divided by 1024 twice comes to about 5.7G.

Some notes about interpreting the output:

  • The first two lines of numbers describe RAM. The final line of numbers describes your swap space.
  • The first three columns seem straightforward: the total capacity, how much of the total is used by processes and how much of the total is free.
  • The next three columns are a bit more complicated. These are the memory shared among processes, the memory being used as buffers (temporary storage) by the kernel and the memory being used as the page cache.
  • The used and free entries in the first line show you how much RAM is in use and how much is free. You should not get worried if you see a low free number. Memory lying unused is useless, so the kernel tries to use it as buffers and for caching. How much of the used memory has been put to use as buffers and cache is also shown in the first line.
  • Concerned about how much memory is truly being used by the processes you are running? That is why the confusing second line exists! used-in-second-line = used-in-first-line - buffers - cached and free-in-second-line = free-in-first-line + buffers + cached. Take a moment to check these values. These calculations make sense because if your processes ask for more memory, the kernel will happily free its buffers and cached resources and hand the memory over!
  • Finally, the shared column is not factored into the second line computation because it is memory that is already shared among the processes. It is already a part of the used memory. If you start asking how much memory is being used by a single process, the computation that factors in shared memory gets harder.
  • Please ignore the -/+ buffers/cache text. It is completely confusing to the average user. To make any sense it should have been -/+ buffers+cached.
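
The second-line arithmetic from the notes above can be checked against the sample output. The following is a small sketch in C; the struct mem_row and both function names are made up for illustration and are not part of free or any library.

```c
/* One "Mem:" row as printed by free, all values in kilobytes.
 * Hypothetical struct, used only to illustrate the arithmetic. */
struct mem_row {
    long total;
    long used;
    long free_kb;
    long shared;
    long buffers;
    long cached;
};

/* used-in-second-line = used-in-first-line - buffers - cached */
long used_by_processes(const struct mem_row *m)
{
    return m->used - m->buffers - m->cached;
}

/* free-in-second-line = free-in-first-line + buffers + cached */
long free_for_processes(const struct mem_row *m)
{
    return m->free_kb + m->buffers + m->cached;
}
```

Filling the struct with the first row of the sample output (5971016, 4376120, 1594896, 210616, 237260, 2398084) gives 1740776 and 4230240, the two numbers on the -/+ buffers/cache line. They add up to the total of 5971016.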

Tried with: Ubuntu 15.10