Hardware-Assisted AddressSanitization

Reducing ASAN memory usage by utilizing Top-Byte Ignore (TBI) on ARM64 hardware. Also, some 24-bit Apple Macintosh history.

The Deaton corollary: if you can't make a meme about something, you don't understand it well enough.

AddressSanitization Recap

In a previous blog post, we covered AddressSanitization and how it works to find all sorts of memory corruption errors. To recap, it uses redzones, shadow memory, and quarantines to accomplish this. Shadow memory maps real memory, where every eight bytes of virtual memory maps to one byte of shadow memory. Shadow memory tracks what is and is not addressable. Access someplace not addressable, and you crash. Redzones buffer every memory allocation (stack/heap/global). Redzones are not addressable. Access a redzone, and you crash. Finally, free’d memory allocations are quarantined and marked as not addressable. Access a quarantined location, and you crash.
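
To make the eight-to-one mapping concrete, here is a minimal sketch of the address-to-shadow translation. The shift and offset constants below are the ones ASAN uses on Linux x86-64; other platforms use different values, so treat them as illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Every 8 bytes of application memory map to one shadow byte:
 *   shadow = (addr >> 3) + offset
 * 0x7fff8000 is the offset ASAN uses on Linux x86-64; other
 * platforms use different constants. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET 0x7fff8000UL

static uintptr_t mem_to_shadow(uintptr_t addr) {
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

int main(void) {
    uintptr_t addr = 0x602000000010UL; /* a typical heap address */
    printf("app %#lx -> shadow %#lx\n",
           (unsigned long)addr, (unsigned long)mem_to_shadow(addr));
    return 0;
}
```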

All these actions add to the process’s memory overhead, sometimes as much as 20 times the original usage (applications that make many small allocations are penalized the most). ASAN also maps up to 20TB of virtual memory for shadow memory (again, by design, shadow memory is one-eighth of the addressable memory).

It would be great to get the bug-finding power of ASAN without the additional overhead. This is where Hardware-Assisted AddressSanitization (HWASAN) enters.

Top-Byte-Ignore (TBI)

Modern architectures (in this case, ARMv8.0, though Intel’s Linear Address Masking and AMD’s Upper Address Ignore do exist) don’t use the full 64 bits of an address. ARM ignores an address’s most significant 16 bits (yes, two whole bytes); when not assigned a specific purpose, they must be all ones (0xFFFF) or all zeros (0x0000). This reduces the amount of addressable memory you can work with, specifically from 16,384 Pebibytes (PiB) to 0.25 PiB. That sounds like a drastic cut, but 0.25 PiB is still 256 Tebibytes, or 262,144 Gibibytes. The numbers are mind-bogglingly high. Maybe we’ll get there someday.

For now, the benefit is that those 16 bits are available to us, mostly. The lower of the two bytes (bits [55:48]) is reserved for distinguishing user and kernel memory. However, we, the developers, can use the top byte (bits [63:56], inclusive). For ARMv8.0, the ABI does not specify how these bits must be used; it’s implementation-specific. ARMv8.5 and 9.0 iterate on this, but we’ll ignore that for now.
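
To see TBI in action, here is a minimal sketch that stashes a value in a pointer’s top byte and then dereferences the tagged pointer. It assumes an AArch64 Linux userspace, where TBI is enabled by default; on an architecture without TBI, the tagged dereference would fault.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Place an 8-bit tag into bits [63:56] of a pointer. */
static void *set_top_byte(void *p, uint8_t tag) {
    uintptr_t raw = (uintptr_t)p & ~(0xFFULL << 56); /* clear old tag */
    return (void *)(raw | ((uintptr_t)tag << 56));   /* set new tag   */
}

static uint8_t get_top_byte(const void *p) {
    return (uint8_t)((uintptr_t)p >> 56);
}

int main(void) {
    int *x = malloc(sizeof *x);
    int *tagged = set_top_byte(x, 0x2A);

    /* With TBI, the hardware ignores bits [63:56] during address
     * translation, so this store hits the same memory as *x even
     * though the two pointer values differ numerically. */
    *tagged = 1337;

    printf("value = %d, tag = 0x%02X\n", *x, get_top_byte(tagged));
    free(x); /* free the original, untagged pointer */
    return 0;
}
```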

Memory Tagging

The algorithm is pretty simple (a rough code sketch follows the list):

  • A tag size (in bits) is chosen.
    • i.e., how many of the bits in the top byte do we want to use for the tag? If we wanted to store additional data alongside it, we would reduce the tag size.
    • If we choose 8 bits, we can have 2 ** 8 (256) tag values.
  • A tagging granularity is chosen (referred to as the granule).
    • How many bytes of memory does a single tag cover?
    • Recall in regular ASAN, there is an eight-byte to one-byte mapping of virtual memory to shadow memory.
    • Increasing the granule size makes bugs harder to find (an overflow that stays within a single granule can go undetected), while decreasing it adds memory and compute overhead.
  • A tag is randomly selected and associated with the memory chunk on allocation. The tag is set in the returned pointer. The tag is also stored in shadow memory.
  • On access, the pointer and shadow memory tag are compared. An exception is raised if the tags differ.
    • Here is where tag size matters. The smaller the tag size, the greater the chance of a collision. Consider a tag size of two bits (2 ** 2, or 4 unique tag values). There is a 1-in-4 chance that, despite the memory being poisoned, the pointer’s tag and the shadow memory tag still match. This results in a missed bug. When collisions occur, the researcher misses out. 🙂
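
Here is a rough, self-contained sketch of that tag-and-check flow, using an 8-bit tag, a 16-byte granule, and a toy shadow table indexed by granule. This is not HWASAN’s actual implementation, just the idea.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model: 8-bit tags, 16-byte granules, and a small shadow table. */
#define GRANULE   16
#define POOL_SIZE (1 << 16)

static uint8_t pool[POOL_SIZE];
static uint8_t shadow[POOL_SIZE / GRANULE];

static void *tag_ptr(void *p, uint8_t tag) {
    return (void *)(((uintptr_t)p & ~(0xFFULL << 56)) | ((uintptr_t)tag << 56));
}

static void *untag_ptr(const void *p) {
    return (void *)((uintptr_t)p & ~(0xFFULL << 56));
}

/* "Allocate": pick a random non-zero tag, record it in the shadow table
 * for every granule the allocation covers, return a tagged pointer. */
static void *toy_alloc(size_t offset, size_t size) {
    uint8_t tag = (uint8_t)((rand() % 255) + 1);
    for (size_t g = offset / GRANULE; g <= (offset + size - 1) / GRANULE; g++)
        shadow[g] = tag;
    return tag_ptr(&pool[offset], tag);
}

/* "Access": compare the pointer's tag with the shadow tag. */
static void check_access(const void *p) {
    uint8_t ptr_tag = (uint8_t)((uintptr_t)p >> 56);
    size_t  offset  = (uint8_t *)untag_ptr(p) - pool;
    if (ptr_tag != shadow[offset / GRANULE]) {
        fprintf(stderr, "tag mismatch: pointer %02X vs memory %02X\n",
                ptr_tag, shadow[offset / GRANULE]);
        abort();
    }
}

int main(void) {
    char *a = toy_alloc(0, 32);
    check_access(a);       /* tags match: fine                        */
    check_access(a + 40);  /* past the allocation: tag mismatch, abort */
    return 0;
}
```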

Benefits

Since the tags are stored in the memory addresses themselves, we no longer need redzones or quarantined zones. For the former, an out-of-bounds access is caught by the tag mismatch (unless there is a collision, a ~6.25% or ~0.39% chance with tags of size 4 or 8 bits, respectively). The same goes for the latter; on free, you adjust the memory tag in lieu of placing the allocation in quarantine.
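
As a concrete illustration, here is a minimal use-after-free that HWASAN catches via exactly this mechanism: the allocator retags the granules on free, so the stale pointer still carries the old tag and the later load trips the tag check. It is built with clang’s -fsanitize=hwaddress on an AArch64 target; the exact flags and report format depend on your toolchain.

```c
/* Build on an AArch64 Linux system, roughly:
 *   clang -fsanitize=hwaddress -g -O1 uaf.c -o uaf
 */
#include <stdlib.h>

int main(void) {
    char *buf = malloc(64);
    buf[0] = 'a';
    free(buf);     /* the allocator assigns a new tag to these granules   */
    return buf[0]; /* stale pointer keeps the old tag -> tag mismatch hit */
}
```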

HWASAN uses the same kind of instrumentation and run-time library that regular ASAN does, and the compute and code-size penalties between the two are similar. However, given the reduction in shadow memory (depending on tag granularity) and the elimination of redzones, memory usage is greatly reduced. In some cases it needs only about 15% more memory than an uninstrumented build.

My previous article on ASAN introduced the metaphor of a football kicker punting toward either team’s goalpost to get an extra point. This represented two disparate but validly accessible memory locations separated by redzones, a bug ASAN could not catch. You’ll be happy to know that, with HWASAN, the two goalposts (most likely, assuming a reasonable tag size) carry different memory tags, preventing a very confusing football match.

Downsides

We talked about tag collisions; that’s certainly a downside. The biggest downside, though, is the lack of support. HWASAN is restricted to 64-bit architectures with top-byte-ignore enabled, and Intel and AMD have been slow to adopt their equivalents. As a result, you likely won’t see it running on a desktop near you. Meanwhile, Android, iOS, and macOS all support it out of the box.

Criticisms

This relates more to Top-Byte-Ignore than to HWASAN, but I figured I would include it here. The Apple Macintosh 128k and the Apple Lisa used the Motorola 68000 microprocessor. It was 32-bit in the sense that its registers were 32 bits wide, but it only had a 24-bit address bus, thus allotting 16MB of addressable space. Apple divided this 16MB space into four equal quadrants for the RAM, ROM, Serial Communication Controller, and Integrated Woz Machine (floppy). Yep, they each got 4MB. What to do with the leftover most significant byte in each master pointer? They used it to store flags.

Of course, it was foolish to count on unused address bits to stay that way for very long, and it became a problem when the Macintosh transitioned to the 68020 processor in 1987, with the introduction of the Macintosh II.

- Andy Hertzfeld, member of the original Apple Macintosh development team.

The flags signified whether the memory was locked or purgeable. Locked meant a memory block shouldn’t be moved, as it was currently in use. Purgeable meant the block could be released if memory was tight. The developer could access and manipulate these flags with API functions such as HLock, HUnlock, and HPurge. This in and of itself was fine. The registers could hold 32 bits, and the top byte would never show up on the address lines anyway.

The Macintosh 128k at the Neo Preistoria exhibition (Milan 2016), taken by Sailko. Licensed under CC BY 3.0.

But of course, 24-bit address spaces weren’t around forever. When the 68020 arrived with the Macintosh II (and the 68030 in later models), the Macintosh team re-implemented their locking mechanism without using the upper bits, placing the flags in the block header instead. But developers didn’t always use the API functions the Macintosh team provided; some manipulated the flag bits themselves. So when their software landed on a machine with 32 bits of addressable memory, those tagged pointers became crazy-looking addresses that weren’t valid. Andy Hertzfeld, the developer who put the flags in the top byte, said that it took a “…year or so to identify and eradicate all the transgressions to upgrade the Macintosh software base to be "32 bit clean", so the full address space could be used.”

For the Macintosh II, the stop-gap solution was to clear the top byte of all generated addresses. This kept the 32-bit address bus operating in 24-bit mode while allowing “dirty” applications to continue working. A 32-bit addressing mode switch was added to run “clean” applications.

How does this relate?

It doesn't relate to HWASAN except for the top-byte-ignore aspect. The main point is that TBI is not a new concept; it’s been around for a while, fell out of favor, and has only recently come back. Sure, there’s a big difference between 24-bit and 48-bit addressing (double the bits!), but can we rely on having “just” 256 TB of addressable memory forever? Two years from now, Chrome will be using it all.

This article, and more, are available for free at www.seandeaton.com and Medium.

