Gotta RE 'em All: Reversing C++ Virtual Function Tables with Binary Ninja
C++ can be frustrating to reverse engineer. Explore how to reverse engineer those with Binary Ninja.
*OS reverse engineers have it rough. In addition to understanding the C, Swift, and Objective-C ABIs and runtimes, they also must understand C++. Coming from a Linux background, where C++ is considered a "...horrible language", this is an area that I find myself lacking. In that regard, I hope that this post is as helpful to me as it is the reader.
We're specifically going to look at virtual methods. The compiler is informed of virtual methods with the virtual specifier.
struct Pokemon {
...
virtual void cry(void) = 0;
virtual void attack(void) = 0;
virtual void flee(void) = 0;
}
virtual
signals to the compiler that the method supports dynamic dispatch (which is different than late-binding, which I just learned). This is just a fancy way of saying that the bindings are known at runtime (dynamic) and not at compile time (static). And why aren't they known? Because we don't which derived class overrides the base implementation.
Consider the following. We create two structures which derive from Pokemon
. Each implements (overrides) the base implementation of each method.
#include <cstdlib>
#include <iostream>
struct Pokemon {
Pokemon() { std::cout << "Pokemon::Pokemon\n"; };
virtual ~Pokemon() { std::cout << "Pokemon::~Pokemon\n"; };
/* These methods must be 'pure' virtual and set to 0.
* This informs the compiler that the method *must* be
* overwritten by a child class. Otherwise, the compiler
* assumes that there is an implementation somewhere.
* Taken litterally, we're defining a NULL implementation.
*/
virtual void cry(void) = 0;
virtual void attack(void) = 0;
virtual void flee(void) = 0;
};
struct Caterpie : Pokemon {
Caterpie() { std::cout << "Caterpie::Caterpie\n"; };
virtual ~Caterpie() { std::cout << "Caterpie::~Caterpie\n"; };
virtual void cry(void) override { std::cout << "Caterpie says 'Caterpie!'\n"; };
virtual void attack(void) override { std:: cout << "Caterpie used string shot!\n"; };
virtual void flee(void) override { std::cout << "Caterpie fled!\n"; };
};
struct Weedle : Pokemon {
Weedle() { std::cout << "Weedle::Weedle\n"; };
virtual ~Weedle() { std::cout << "Weedle::~Weedle\n"; };
virtual void cry(void) override { std::cout << "Weedle says 'Weedle!'\n"; };
virtual void attack(void) override { std:: cout << "Weedle used poison sting!\n"; };
virtual void flee(void) override { std::cout << "Weedle fled!\n"; };
};
Now what would happen if we declared a Pokemon
at compile-time, but assigned it to either a Caterpie or Weedle at run-time? The implementation of which could look something like the following.
#include <iostream>
#include "pokemon.h"
int main(int argc, const char * argv[]) {
/* Seed the random number. */
srand((unsigned int)time(NULL));
/* Create a generic Pokemon. */
Pokemon * pokemon;
/* Flip a coin to see which version we're playing. */
if (rand() % 2) {
std::cout << "You're playing Pokemon Red!\n";
pokemon = new Weedle();
} else {
std::cout << "You're playing Pokemon Yellow!\n";
pokemon = new Caterpie;
}
/* Make the Pokemon do some tricks. */
pokemon->cry();
pokemon->attack();
pokemon->flee();
return EXIT_SUCCESS;
}
If you run this program and flip heads (an odd number, I guess), you'll get the following output:
You're playing Pokemon Red!
Pokemon::Pokemon
Weedle::Weedle
Weedle says 'Weedle!'
Weedle used poison sting!
Weedle fled!
The compiler knows that we have a Pokemon
, but it cannot know which derived structure implements it. So how does the program correctly run Weedle
's cry, attack, and flee?
Virtual Tables
At compile time, the compiler inserts a virtual pointer – vptr
or __vptr
– into the base structure. So our Pokemon
would look like this:
struct Pokemon {
/* The vptr is inserted automatically by the compiler. */
PokemonVirtualTable * vptr;
...
virtual void flee(void) = 0;
};
Pokemon's vptr
points to a virtual table. A virtual table is simply a table of pointers to the structure's functions. The virtual table for Pokemon would look like the following:
+----------------------+---------------+
| Pokemon VirtualTable |
+----------------------+---------------+
| Offset in VTable | Function |
+----------------------+---------------+
| 0x0 | Destructor #1 |
| 0x8 | Destructor #2 |
| 0x10 | cry |
| 0x18 | attack |
| 0x20 | flee |
+----------------------+---------------+
Why the two destructors? The compiler (implementation specific) inserts one destructor for dynamically-allocated objects and another for non-dynamically-allocated objects. The rest should look familiar, cry
, attack
, and flee
.
What about the other two classes, Caterpie
and Weedle
? Theirs will look similar, and in fact, must match the layout of the base structure exactly. I'll copy and paste to make this explicit.
+----------------------+---------------+
| Caterpie VirtualTable |
+----------------------+---------------+
| Offset in VTable | Function |
+----------------------+---------------+
| 0x0 | Destructor #1 |
| 0x8 | Destructor #2 |
| 0x10 | cry |
| 0x18 | attack |
| 0x20 | flee |
+----------------------+---------------+
When the Caterpie struct is constructed, the inherited vptr
will point to CaterpieVirtualTable
, not PokemonVirtualTable
. So now when the program calls pokemon->cry()
, it'll find the correct implementation at offset 0x10 in CaterpieVirtualTable
.
Reversing the Structures
Now that we've talked the implementation, let's actually observe what's going on. I compiled the program with Xcode and imported it into Binary Ninja 4.2.6455-dev (02c8da1e). Below is the entry point of the Mach-O.
There's a few things to key in on.
- The function declares a pointer (
x19
) at the start of the function. x19
is assigned to the value returned byoperator_new
in both conditional blocks.- After the conditional blocks, we see the offsets of
x19
(2, 3, 4, 1) being treated as functions calls, wherex19
itself is the sole argument.
The first is the easiest to address. Since we made the program, we know the function is declaring a Pokemon
. Simple enough.
The second point warrants further discussion. In the first conditional block, for Pokemon Red, the function allocates 8 bytes of memory assigns it to x0_5
variable. Why 8 bytes?
On arm64e (that this program was compiled against), pointers are 8 bytes wide. Since our Weedle
struct doesn't have any explicitly added members, it only contains the vptr
member. Since that's a pointer, the width of the entire structure is only 8 bytes.
This goes without saying that x0_5
is our Weedle
. After assigning the pointer to x19
(our Pokemon
), the function then calls sub_100003558(x0_5)
. Let's take a look.
The function accepts one argument. It then assigns some address to that argument. From looking at the caller, _start
, we know that the argument is our Weedle
. We also know that Weedle
structure only has one member, vptr
, and this exists at offset 0x00.
So what's actually happening is that we're setting our weedle->vptr
to &data_100004148
. What's at that address? Perhaps surprisingly, the virtual table for Pokemon
.
We didn't discuss this, but the constructor for a derived class always invokes the base class constructor to rely on a full initialized object. Microsoft has some good documentation on this.
We'll take a look at this vtable in a second, but first notice that this function then reassigns weedle->vptr
to &data_1000040e8
. This address contains the vtable for Weedle
. And that's it. This function is just a Weedle
constructor that invokes the base and derived class constructors which just print to stdout.
To clean up this function, we can start by renaming it. We know it constructs a Weedle
, so I figured this was a fitting name. You could also name the function Weedle::constructor
to make it a little more explicit (note that you can add the :: and other dis-allowed characters in Binary Ninja by wrapping it in backticks).
Then, define a structure by selecting the arg1
parameter and hitting s
. Add one member called vptr *
by selecting the Weedle
type over in the Type window.
Then, just rename data labels to their corresponding virtual table names.
Before digging into the virtual function tables, let's do the same thing for the Caterpie constructor seen in the other conditional basic block within _start
. You should get something that looks like this.
Now let's go back to _start
to admire our work. Binary Ninja propagates the typing backwards. It's come to the conclusion that everything is a Caterpie
. While incorrect still, it's far more descriptive than what we started with.
Okay, now let's actually go back to the constructWeedle
function. Specifically, let's examine what's at the address labeled as PokemonVirtualTable
.
This matches exactly with what we predicted in our ascii table above; two destructors and one method for each virtual method (cry, attack, and flee). If we turn this into a structure, we can get _start
to show us the exact methods we're calling instead of offsets into x19
(as of now, incorrectly typed as a Caterpie)
.
Two curious bits, however. First, if you checkout the two destructors (sub_100003bc8
and sub_100003bcc
), you'll see they both just call trap(1)
.
What's up with this? The descriptor for Pokemon
is defined; it's not a pure virtual function. See? Here's the definition: virtual ~Pokemon() { std::cout << "Pokemon::~Pokemon\n"; };
This is likely a compiler-ism. The compiler knows that a `Pokemon` is never allocated because it has virtual functions that aren't implemented. So there's some optimization going on, and we'll see our std::cout << "Pokemon::~Pokemon\n"
later in the destructors for the derived structs.
Second, the method implementations just point to ___cxa_pure_virtual
. This is because we didn't implement them. We defined them as NULL. So there's no function to point to!
This also makes it tricky to define which offset corresponds to which method. So when we make our structure, let's assume that the functions are in the same order that we defined in our header file. If this isn't the case, we can clean it up when we look at the virtual tables for Caterpie and Weedle.
Sweet! It looks a little bit better. But it doesn't do much on its own. Let's jump back to _start
and see some magic.
We know that x19
is a Pokemon
and not a Caterpie
. So let's make a structure and retype it to capture that. Unlike the Caterpie and Weedle structures we made before, which have a vptr
typed as a void *
, this Pokemon
struct will instead have a vptr
typed as a struct * PokemonVirtualTable
.
Now see what happens when we apply that. Suddenly _start
makes a whole lot of sense. Note that because of forward type propagation, you may have to retype caterpie
and weedle
to their derived classes.
So we're essentially done with _start
; it looks about as clear as it's going to get. For final touches, we could go back to the virtual tables for Weedle and Caterpie, define those, and change the respective vptr
members to the new types.
To speed things up, we can have the WeedleVirtualTable and CaterpieVirtualTable derive from the PokemonVirtualTable. To do this, open the structure editor in Binary Ninja by selecting the Type window and hitting S
, or right click and select "Create a new structure...".
Name the structure. Then under "Base", type the PokemonVirtualTable
and add it at offset 0x00 (not 0x28). Make sure to hit "Add". Ensure "Propagate data variables" is selected.
What does this do? If you open the C structure after creating this, you can see that the pointers have been inherited from the PokemonVirtualTable. And in fact, belong to that namespace. Then, retyping the WeedleVirtualTable
label to a WeedleVirtualTable
(okay, maybe I should have used different names, sorry if this is confusing to the reader), we get all of the members defined for us.
The last thing do would be to rename those functions in the virtual table. For example, we could rename them to the following.
How did I know the names or that the methods are in the correct order? In this example it's pretty obvious. The methods themselves print what they do. In the real-world, you'll likely have to do deeper RE to figure that out.
And that's it! I mean, you could do the same thing for Caterpie
. But in this example, there's not much point, as the methods are all called on pokemon
, which we already know. I'll leave that as an exercise to the reader.
Hope this helped! I know it certainly helped me. Cheers!
Acknowledgements
I'd really like to thank Adam Schwalm and their blog post on doing the same in IDA. I used it significantly in my understanding of the C++ ABI and based my program off of this Mammal example. Cheers, Adam.
Concluding Thoughts
If you enjoy these sort of posts, consider following me over on Bluesky.