Gotta RE 'em All: Reversing C++ Virtual Function Tables with Binary Ninja

C++ can be frustrating to reverse engineer. Explore how to reverse engineer those with Binary Ninja.

Gotta RE 'em All: Reversing C++ Virtual Function Tables with Binary Ninja
Photo by Akin Cakiner / Unsplash

*OS reverse engineers have it rough. In addition to understanding the C, Swift, and Objective-C ABIs and runtimes, they also must understand C++. Coming from a Linux background, where C++ is considered a "...horrible language", this is an area that I find myself lacking. In that regard, I hope that this post is as helpful to me as it is the reader.

We're specifically going to look at virtual methods. The compiler is informed of virtual methods with the virtual specifier.

struct Pokemon {
    ...
    virtual void cry(void) = 0;
    virtual void attack(void) = 0;
    virtual void flee(void) = 0;
}

virtual signals to the compiler that the method supports dynamic dispatch (which is different than late-binding, which I just learned). This is just a fancy way of saying that the bindings are known at runtime (dynamic) and not at compile time (static). And why aren't they known? Because we don't which derived class overrides the base implementation.

Consider the following. We create two structures which derive from Pokemon. Each implements (overrides) the base implementation of each method.

#include <cstdlib>
#include <iostream>

struct Pokemon {
    Pokemon() { std::cout << "Pokemon::Pokemon\n"; };
    virtual ~Pokemon() { std::cout << "Pokemon::~Pokemon\n"; };
    /* These methods must be 'pure' virtual and set to 0.
     * This informs the compiler that the method *must* be
     * overwritten by a child class. Otherwise, the compiler
     * assumes that there is an implementation somewhere.
     * Taken litterally, we're defining a NULL implementation.
     */
    virtual void cry(void) = 0;
    virtual void attack(void) = 0;
    virtual void flee(void) = 0;
};

struct Caterpie : Pokemon {
    Caterpie() { std::cout << "Caterpie::Caterpie\n"; };
    virtual ~Caterpie() { std::cout << "Caterpie::~Caterpie\n"; };
    virtual void cry(void) override { std::cout << "Caterpie says 'Caterpie!'\n"; };
    virtual void attack(void) override { std:: cout << "Caterpie used string shot!\n"; };
    virtual void flee(void) override { std::cout << "Caterpie fled!\n"; };
};

struct Weedle : Pokemon {
    Weedle() { std::cout << "Weedle::Weedle\n"; };
    virtual ~Weedle() { std::cout << "Weedle::~Weedle\n"; };
    virtual void cry(void) override { std::cout << "Weedle says 'Weedle!'\n"; };
    virtual void attack(void) override { std:: cout << "Weedle used poison sting!\n"; };
    virtual void flee(void) override { std::cout << "Weedle fled!\n"; };
};

Now what would happen if we declared a Pokemon at compile-time, but assigned it to either a Caterpie or Weedle at run-time? The implementation of which could look something like the following.

#include <iostream>
#include "pokemon.h"

int main(int argc, const char * argv[]) {
    
    /* Seed the random number. */
    srand((unsigned int)time(NULL));
    
    /* Create a generic Pokemon. */
    Pokemon * pokemon;
    
    /* Flip a coin to see which version we're playing. */
    if (rand() % 2) {
        std::cout << "You're playing Pokemon Red!\n";
        pokemon = new Weedle();
    } else {
        std::cout << "You're playing Pokemon Yellow!\n";
        pokemon = new Caterpie;
    }
    /* Make the Pokemon do some tricks. */
    pokemon->cry();
    pokemon->attack();
    pokemon->flee();
    
    return EXIT_SUCCESS;
}

If you run this program and flip heads (an odd number, I guess), you'll get the following output:

You're playing Pokemon Red!
Pokemon::Pokemon
Weedle::Weedle
Weedle says 'Weedle!'
Weedle used poison sting!
Weedle fled!

The compiler knows that we have a Pokemon, but it cannot know which derived structure implements it. So how does the program correctly run Weedle's cry, attack, and flee?

Virtual Tables

At compile time, the compiler inserts a virtual pointer – vptr or __vptr – into the base structure. So our Pokemon would look like this:

struct Pokemon {
    /* The vptr is inserted automatically by the compiler. */
    PokemonVirtualTable * vptr;
    ...
    virtual void flee(void) = 0;
};

Pokemon's vptr points to a virtual table. A virtual table is simply a table of pointers to the structure's functions. The virtual table for Pokemon would look like the following:

+----------------------+---------------+
|         Pokemon VirtualTable         |
+----------------------+---------------+
| Offset in VTable     | Function      |
+----------------------+---------------+
| 0x0                  | Destructor #1 |
| 0x8                  | Destructor #2 |
| 0x10                 | cry           |
| 0x18                 | attack        |
| 0x20                 | flee          |
+----------------------+---------------+

Why the two destructors? The compiler (implementation specific) inserts one destructor for dynamically-allocated objects and another for non-dynamically-allocated objects. The rest should look familiar, cry, attack, and flee.

What about the other two classes, Caterpie and Weedle? Theirs will look similar, and in fact, must match the layout of the base structure exactly. I'll copy and paste to make this explicit.

+----------------------+---------------+
|        Caterpie VirtualTable         |
+----------------------+---------------+
| Offset in VTable     | Function      |
+----------------------+---------------+
| 0x0                  | Destructor #1 |
| 0x8                  | Destructor #2 |
| 0x10                 | cry           |
| 0x18                 | attack        |
| 0x20                 | flee          |
+----------------------+---------------+

When the Caterpie struct is constructed, the inherited vptr will point to CaterpieVirtualTable, not PokemonVirtualTable. So now when the program calls pokemon->cry(), it'll find the correct implementation at offset 0x10 in CaterpieVirtualTable.

Reversing the Structures

Now that we've talked the implementation, let's actually observe what's going on. I compiled the program with Xcode and imported it into Binary Ninja 4.2.6455-dev (02c8da1e). Below is the entry point of the Mach-O.

Showing Binary Ninja's HLIL output for the entry point, _start.
HLIL decompilation of _start.

There's a few things to key in on.

  1. The function declares a pointer (x19) at the start of the function.
  2. x19 is assigned to the value returned by operator_new in both conditional blocks.
  3. After the conditional blocks, we see the offsets of x19 (2, 3, 4, 1) being treated as functions calls, where x19 itself is the sole argument.

The first is the easiest to address. Since we made the program, we know the function is declaring a Pokemon. Simple enough.

The second point warrants further discussion. In the first conditional block, for Pokemon Red, the function allocates 8 bytes of memory assigns it to x0_5 variable. Why 8 bytes?

On arm64e (that this program was compiled against), pointers are 8 bytes wide. Since our Weedle struct doesn't have any explicitly added members, it only contains the vptr member. Since that's a pointer, the width of the entire structure is only 8 bytes.

This goes without saying that x0_5 is our Weedle. After assigning the pointer to x19 (our Pokemon), the function then calls sub_100003558(x0_5). Let's take a look.

Binary Ninja HLIL for the Weedle structure constructor function.
The constructor for Weedle.

The function accepts one argument. It then assigns some address to that argument. From looking at the caller, _start, we know that the argument is our Weedle. We also know that Weedle structure only has one member, vptr, and this exists at offset 0x00.

So what's actually happening is that we're setting our weedle->vptr to &data_100004148. What's at that address? Perhaps surprisingly, the virtual table for Pokemon.

We didn't discuss this, but the constructor for a derived class always invokes the base class constructor to rely on a full initialized object. Microsoft has some good documentation on this.

We'll take a look at this vtable in a second, but first notice that this function then reassigns weedle->vptr to &data_1000040e8. This address contains the vtable for Weedle. And that's it. This function is just a Weedle constructor that invokes the base and derived class constructors which just print to stdout.

To clean up this function, we can start by renaming it. We know it constructs a Weedle, so I figured this was a fitting name. You could also name the function Weedle::constructor to make it a little more explicit (note that you can add the :: and other dis-allowed characters in Binary Ninja by wrapping it in backticks).

Then, define a structure by selecting the arg1 parameter and hitting s. Add one member called vptr * by selecting the Weedle type over in the Type window.

Binary Ninja structure showing a Weedle strut.
Weedle struct.

Then, just rename data labels to their corresponding virtual table names.

Cleaned up constructor for Weedle.

Before digging into the virtual function tables, let's do the same thing for the Caterpie constructor seen in the other conditional basic block within _start. You should get something that looks like this.

Now let's go back to _start to admire our work. Binary Ninja propagates the typing backwards. It's come to the conclusion that everything is a Caterpie. While incorrect still, it's far more descriptive than what we started with.

Binary Ninja HLIL of the _start function, slightly more marked up.
HLIL decompilation after marking up the constructors.

Okay, now let's actually go back to the constructWeedle function. Specifically, let's examine what's at the address labeled as PokemonVirtualTable.

Binary Ninja showing the unlabeled Pokemon Virtual Table.
VTable for the Pokemon struct.

This matches exactly with what we predicted in our ascii table above; two destructors and one method for each virtual method (cry, attack, and flee). If we turn this into a structure, we can get _start to show us the exact methods we're calling instead of offsets into x19 (as of now, incorrectly typed as a Caterpie).

Two curious bits, however. First, if you checkout the two destructors (sub_100003bc8 and sub_100003bcc), you'll see they both just call trap(1).

Binary Ninja's HLIL decompilation for a destructor method of the Pokemon struct.
The destructor for Pokemon.

What's up with this? The descriptor for Pokemon is defined; it's not a pure virtual function. See? Here's the definition: virtual ~Pokemon() { std::cout << "Pokemon::~Pokemon\n"; };

This is likely a compiler-ism. The compiler knows that a `Pokemon` is never allocated because it has virtual functions that aren't implemented. So there's some optimization going on, and we'll see our std::cout << "Pokemon::~Pokemon\n" later in the destructors for the derived structs.

Second, the method implementations just point to ___cxa_pure_virtual. This is because we didn't implement them. We defined them as NULL. So there's no function to point to!

This also makes it tricky to define which offset corresponds to which method. So when we make our structure, let's assume that the functions are in the same order that we defined in our header file. If this isn't the case, we can clean it up when we look at the virtual tables for Caterpie and Weedle.

The Binary Ninja structure editor showing the definition of the PokemonVirtualTable.
The PokemonVirtualTable structure.
Binary Ninja output after applying the correct type to the virtual table.
The virtual table after applying the type.

Sweet! It looks a little bit better. But it doesn't do much on its own. Let's jump back to _start and see some magic.

We know that x19 is a Pokemon and not a Caterpie. So let's make a structure and retype it to capture that. Unlike the Caterpie and Weedle structures we made before, which have a vptr typed as a void *, this Pokemon struct will instead have a vptr typed as a struct * PokemonVirtualTable.

The Binary Ninja structure editor defining the Pokemon struct type.
The Pokemon struct.

Now see what happens when we apply that. Suddenly _start makes a whole lot of sense. Note that because of forward type propagation, you may have to retype caterpie and weedle to their derived classes.

Binary Ninja HLIL showing the _start function after we've marked it up fully.
_start after completing our markup.

So we're essentially done with _start; it looks about as clear as it's going to get. For final touches, we could go back to the virtual tables for Weedle and Caterpie, define those, and change the respective vptr members to the new types.

To speed things up, we can have the WeedleVirtualTable and CaterpieVirtualTable derive from the PokemonVirtualTable. To do this, open the structure editor in Binary Ninja by selecting the Type window and hitting S, or right click and select "Create a new structure...".

Name the structure. Then under "Base", type the PokemonVirtualTable and add it at offset 0x00 (not 0x28). Make sure to hit "Add". Ensure "Propagate data variables" is selected.

The Binary Ninja structure editor creating a WeedleVirtualTable with a base of a PokemonVirtualTable.
Creating a structure called the WeedleVirtualTable.

What does this do? If you open the C structure after creating this, you can see that the pointers have been inherited from the PokemonVirtualTable. And in fact, belong to that namespace. Then, retyping the WeedleVirtualTable label to a WeedleVirtualTable (okay, maybe I should have used different names, sorry if this is confusing to the reader), we get all of the members defined for us.

C syntax structure editor showing the PokemonVirtualTable structure.
The PokemonVirtualTable structure, with inherited members!
The WeedleVirtualTable after applying the correct type in Binary Ninja.
After applying the type to the vtable.

The last thing do would be to rename those functions in the virtual table. For example, we could rename them to the following.

The WeedleVirtualTable after applying the correct type in Binary Ninja and renaming the functions.
The virtual table after renaming the methods.

How did I know the names or that the methods are in the correct order? In this example it's pretty obvious. The methods themselves print what they do. In the real-world, you'll likely have to do deeper RE to figure that out.

And that's it! I mean, you could do the same thing for Caterpie. But in this example, there's not much point, as the methods are all called on pokemon, which we already know. I'll leave that as an exercise to the reader.

Hope this helped! I know it certainly helped me. Cheers!


Acknowledgements

I'd really like to thank Adam Schwalm and their blog post on doing the same in IDA. I used it significantly in my understanding of the C++ ABI and based my program off of this Mammal example. Cheers, Adam.

Concluding Thoughts

If you enjoy these sort of posts, consider following me over on Bluesky.

Subscribe to Sean Deaton

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe