Code Yarns ‍👨‍💻
Tech BlogPersonal Blog

A dissection of C++ virtual functions

📅 2017-Jun-13 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cpp, objdump, readelf, virtual ⬩ 📚 Archive

Virtual functions are a key feature of C++ to enable runtime polymorphism. This post is my attempt in understanding how they are implemented and executed at runtime. The compiler used is GCC 5.4.0 on Ubuntu 16.04.

Here is a simple program that uses virtual functions that we will use as an example:

#include <iostream>

struct A
{
    virtual void do_something() {}
    virtual void do_something2() { std::cerr << "In A\n"; }
};

struct B : public A
{
    void do_something() {}
    void do_something2() { std::cerr << "In B\n"; }
};

int main()
{
    A* a = new B();
    a->do_something2();
    
    return 0;
}

// When executed, this program will print:
// $ ./a.out
// In B

To aid us in understanding what this code is compiled into, we request GCC to add debugging information (using option -g) when we compile it:

$ g++ -g virtual_function_example.cpp
$ ./a.out
In B

Almost all C++ compilers implement virtual functions by using virtual tables, more commonly called as vtables. This is a table of function addresses, one for each virtual function in the class. One virtual table is created for each class that has virtual functions.

We can see the existence of the methods and virtual tables of each class and their addresses by examining the binary:

$ readelf --symbols a.out | c++filt | grep -E "vtable|A::|B::"

    86: 0000000000400936    11 FUNC    WEAK   DEFAULT   14 A::do_something()
    81: 0000000000400942    30 FUNC    WEAK   DEFAULT   14 A::do_something2()
    87: 0000000000400960    11 FUNC    WEAK   DEFAULT   14 B::do_something()
    84: 000000000040096c    30 FUNC    WEAK   DEFAULT   14 B::do_something2()
    60: 000000000040098a    23 FUNC    WEAK   DEFAULT   14 A::A()
    69: 00000000004009a2    39 FUNC    WEAK   DEFAULT   14 B::B()
    92: 0000000000400a68    32 OBJECT  WEAK   DEFAULT   16 vtable for B
    63: 0000000000400a88    32 OBJECT  WEAK   DEFAULT   16 vtable for A

Here we use the readelf program to extract the symbols from the binary. The symbols are in mangled form that is difficult to decipher for humans. So, we pipe it through a demangler.

Here is the output I got on my computer:

 

We can check which sections of virtual memory the class methods and virtual tables will be loaded into by examining the sections of the binary:

$ readelf --sections a.out
There are 37 section headers, starting at offset 0x6b78:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [..]
  [14] .text             PROGBITS        00000000004007a0 0007a0 0002a2 00  AX  0   0 16
  [..]
  [16] .rodata           PROGBITS        0000000000400a50 000a50 00008b 00   A  0   0  8
  [..]

Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

We can cross-examine the addresses of the class methods and virtual tables with the starting addresses and sizes of the sections. We see that the class methods will be loaded into the .text section and the virtual tables into the .rodata segment. The flags of these sections indicate that only the .text section is executable, as it should be.

 

Finally, let us examine how the virtual tables are used at runtime to determine which method to execute. To do this, we disassemble the binary instructions in the binary:

$ objdump --disassemble --demangle --source a.out

int main()
{
  400896:       55                      push   %rbp
  400897:       48 89 e5                mov    %rsp,%rbp
  40089a:       53                      push   %rbx
  40089b:       48 83 ec 18             sub    $0x18,%rsp
    A* a = new B();
  40089f:       bf 08 00 00 00          mov    $0x8,%edi
  4008a4:       e8 d7 fe ff ff          callq  400780 <operator new(unsigned long)@plt>
  4008a9:       48 89 c3                mov    %rax,%rbx
  4008ac:       48 c7 03 00 00 00 00    movq   $0x0,(%rbx)
  4008b3:       48 89 df                mov    %rbx,%rdi
  4008b6:       e8 e7 00 00 00          callq  4009a2 <B::B()>
  4008bb:       48 89 5d e8             mov    %rbx,-0x18(%rbp)
    a->do_something2();
  4008bf:       48 8b 45 e8             mov    -0x18(%rbp),%rax
  4008c3:       48 8b 00                mov    (%rax),%rax
  4008c6:       48 83 c0 08             add    $0x8,%rax
  4008ca:       48 8b 00                mov    (%rax),%rax
  4008cd:       48 8b 55 e8             mov    -0x18(%rbp),%rdx
  4008d1:       48 89 d7                mov    %rdx,%rdi
  4008d4:       ff d0                   callq  *%rax

  4008d6:       b8 00 00 00 00          mov    $0x0,%eax
    return 0;
  4008db:       48 83 c4 18             add    $0x18,%rsp
  4008df:       5b                      pop    %rbx
  4008e0:       5d                      pop    %rbp
  4008e1:       c3                      retq

From the output of objdump, only the disassembly of the main function is shown above. In the above command, we have requested objdump to --disassemble the binary code to assembly code, to --demangle the symbol names to human readable form and to annotate the disassembly with the original C++ --source statements.

By examining the disassembled code, the runtime mystery is revealed. We need to note that every object of a class, that has virtual methods, stores a pointer to its class virtual table. On a 64-bit computer, this means that objects of such classes need extra space of 8 bytes. This pointer is placed at the beginning of the memory layout of the object, even before other members of the object.

When you call a virtual method in C++ code, the compiler generates these instructions:

Here is an illustration of the code disassembly:

 

© 2022 Ashwin Nanjappa • All writing under CC BY-SA license • 🐘 @codeyarns@hachyderm.io📧