Reverse Engineering Mac OS X

The suspend/resume vulnerability disclosed a few weeks ago (named Prince Harming by Katie Moussouris) turned out to be a zero day. While I believe its real-world impact is small, it is nonetheless a critical vulnerability. It should be noted that firmware issues are not exclusive to Apple. For example, Gigabyte ships its UEFI with the flash always unlocked, and other vendors also suffer from all kinds of firmware vulnerabilities.

As I wrote in the original post, I found the vulnerability a couple of months ago while researching different ways to reset a Mac firmware password. At the time, I did not research the source of the bug because of other, higher-priority tasks. One of the reasons for disclosing it fully was the assumption that Apple already knew about this problem, since newer machines were not vulnerable. So the main question after the media storm was whether that assumption was wrong and what was really happening inside Apple’s EFI.

The bug is definitely not caused by a hardware failure and can be fixed with a (simple) firmware update. The initial assumptions pointing to some kind of S3 boot script failure were correct.
Apparently, Apple did not follow Intel’s recommendation and failed to lock the flash protections (and also the SMRR registers) after the S3 suspend cycle. The necessary information is not saved, so the locks are not restored when the machine wakes up from sleep.

This also makes it possible to determine which Mac models are vulnerable.
All machines based on the Ivy Bridge and Sandy Bridge platforms (and possibly older ones) are vulnerable. This includes the newest Mac Pro, since its Xeon E5 CPU is still based on the Ivy Bridge platform. Machines based on Haswell or newer platforms are not vulnerable.

Now let’s jump to the technical part and understand why the bug occurs. I am also going to show you how to build a temporary fix.

0 – The ACPI S3 sleep feature

The ACPI (Advanced Configuration and Power Interface) specification defines a number of system power states and transitions. One of them is the S3 state, a power-saving mode that suspends to memory, meaning that the current operating system state is held in RAM. Power is still consumed in this mode, but far less. It is mostly used in notebook computers (when the lid is closed), and its performance matters because users expect their computer to wake up from sleep quickly. If waking took as long as a normal boot, there would be few advantages in using it (lower battery consumption, for example). In practice, this means that the resume path is a special boot path that differs from the normal one. The key difference is a performance optimization in the form of a boot script, which can contain the following chipset and processor information:

I/O
PCI
Memory
System Management Bus (SMBus)
Other specific operations or routines that are necessary to restore the chipset and processor configuration

Instead of reinitializing everything, the boot script restores the machine to the same configuration without re-executing the whole DXE phase, which is time consuming. Executing the script is faster than restarting and reinitializing all the necessary drivers, as happens on the regular boot path. Sample contents of a boot script will be shown later.

The following picture from the EFI documentation describes the differences between the normal and S3 boot phases.

A description of each phase (SEC: Security, PEI: Pre-EFI Initialization, DXE: Driver Execution Environment, BDS: Boot Device Selection) can be found here. The DXE phase is where most of the initialization work is performed (there are some 150 DXE drivers in EFI firmware), which is the main reason why the boot script is important for a fast resume from sleep (EFI documents from 2003 state that Microsoft requires a maximum of 0.5 seconds for the S3 resume). The boot script is created during the DXE phase of the normal boot path, meaning that it is built once per normal boot. The script is stored in physical memory, and this is the main reason why the Dark Jedi attack described at CCC 2014 is possible.

1 – Relevant documentation

The EFI (Extensible Firmware Interface) specification was published by Intel. It was followed by UEFI (Unified Extensible Firmware Interface), managed by the Unified EFI Forum (www.uefi.org). Apple forked from EFI v1.10 (Dec 2002) and introduced some changes, for example support for fat EFI binaries (containing both 32-bit and 64-bit code). Recent OS X versions are 64-bit only, so this feature is no longer used.

The most relevant EFI documents for this post are:

Reference websites:

Phoenix Bios Wiki
EDK
EDK2 Source code
UEFI documentation
LegbaCore (great BIOS/EFI/UEFI research papers)

Relevant papers, presentations and blog posts:

Tools and utilities:

2 – What’s inside the flash?

The EFI contents are stored in a flash memory chip that also contains the Intel Management Engine (ME) firmware and other data. The two most common chips on newer Macs are the Micron N25Q064A and the Macronix MX25L6405. They are easy to identify on Mac motherboards; in some models they are easily accessible (such as the MacBook Pro Retina), while in others they require extensive disassembly (such as the MacBook Pro 8,2 and the MacBook Air). The following picture shows the location on a MacBook Pro Retina 10,1.

If extensive disassembly is required, the iFixit.com teardown guides are a great reference. From what I can see in teardown pictures of the newest MacBook (the gold and silver model), those flash chips appear to be no longer used. I do not own this model, so I have no idea where its EFI image is stored.

Both chips use SPI, so an SPI reader/writer such as the one introduced by Trammell Hudson can be used to read and write their contents. This is the best and safest way to do it, and you should definitely get or build one if you plan to do EFI research.

Screenshot of UEFITool processing one of my flash dumps:

Partial output of my own tool with some extra information regarding each firmware volume:

The flash contents are organized into firmware volumes. Some volumes contain the binaries for the SEC, PEI and DXE phases, while others hold CPU microcode. One of the volumes is dedicated to NVRAM, and this is where you will find boot settings, crash logs, the Wi-Fi password, etc.
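To make the volume layout concrete, here is a minimal C sketch that walks a raw flash dump looking for firmware volume headers. It relies only on the `_FVH` signature and the FvLength and HeaderLength fields of EFI_FIRMWARE_VOLUME_HEADER (offsets 0x28, 0x20 and 0x30). The `fv_info` type and the `find_next_fv` helper are hypothetical names, and a real parser such as UEFITool performs many more checks (header checksum, file system GUID, block map).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Field offsets inside EFI_FIRMWARE_VOLUME_HEADER. */
#define FV_OFF_LENGTH     0x20  /* UINT64 FvLength                */
#define FV_OFF_SIGNATURE  0x28  /* UINT32 Signature, bytes "_FVH" */
#define FV_OFF_HDRLEN     0x30  /* UINT16 HeaderLength            */

typedef struct {
    size_t   offset;         /* start of the volume inside the dump */
    uint64_t fv_length;      /* FvLength: total size of the volume */
    uint16_t header_length;  /* HeaderLength: header plus block map */
} fv_info;

/* Read an n-byte little-endian integer without alignment assumptions. */
static uint64_t read_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    while (n--) v = (v << 8) | p[n];
    return v;
}

/* Find the first volume header starting at or after `start`.
 * Returns 1 and fills *out on success, 0 if no plausible header is found. */
static int find_next_fv(const uint8_t *dump, size_t size, size_t start, fv_info *out) {
    for (size_t hdr = start; hdr + FV_OFF_HDRLEN + 2 <= size; hdr++) {
        if (memcmp(dump + hdr + FV_OFF_SIGNATURE, "_FVH", 4) != 0) continue;
        uint64_t len = read_le(dump + hdr + FV_OFF_LENGTH, 8);
        uint16_t hlen = (uint16_t)read_le(dump + hdr + FV_OFF_HDRLEN, 2);
        /* Skip signatures that do not describe a volume fitting in the dump. */
        if (hlen == 0 || len < hlen || len > size - hdr) continue;
        out->offset = hdr;
        out->fv_length = len;
        out->header_length = hlen;
        return 1;
    }
    return 0;
}
```

Calling `find_next_fv` again with `start = fv.offset + fv.fv_length` walks the volumes one after the other.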

It is also possible to use the scap files included in the EFI firmware updates published by Apple. UEFITool can process them and extract their files. Firmware updates for newer machines can be found in the Yosemite updates.

It is important to note that everything is identified by a GUID (globally unique identifier). These 128-bit values are used to identify everything instead of filenames.
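Because GUIDs replace filenames, a reversing tool constantly converts between the 16 bytes stored in the image and the text form used in specifications and GUID lists. The first three fields of EFI_GUID (Data1, Data2, Data3) are stored little-endian, so the bytes in a dump do not read like the text form. A minimal sketch; `guid_to_string` is a hypothetical helper name:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Format 16 raw GUID bytes, as found in a flash dump, into the canonical
 * 8-4-4-4-12 text form. Data1, Data2 and Data3 are little-endian on disk;
 * the 8 bytes of Data4 are stored in order. */
static void guid_to_string(const uint8_t b[16], char out[37]) {
    snprintf(out, 37,
             "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
             b[3], b[2], b[1], b[0],                   /* Data1 */
             b[5], b[4],                               /* Data2 */
             b[7], b[6],                               /* Data3 */
             b[8], b[9],                               /* Data4[0..1] */
             b[10], b[11], b[12], b[13], b[14], b[15]); /* Data4[2..7] */
}
```

For example, the bytes EE AC 23 DE 55 CF B6 4F AA 77 98 4A B5 3D E8 23 in a dump correspond to DE23ACEE-CF55-4FB6-AA77-984AB53DE823, the PchInitDxe GUID discussed later in this post.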

The EFI and UEFI specifications define many GUIDs, for example to identify firmware volumes, but there are also many vendor-specific GUIDs that are not published anywhere. Apple’s firmware is no exception and contains a few proprietary GUIDs whose purpose can only be understood by reversing. Google is your ally here, along with the EDK/EDK2 sources and the EFI/UEFI documentation. Good GUID compilations can be found here and here.

The contents of a file can be compressed, as shown below. Apple uses the LZMA algorithm.

Each file contains sections. In the above example, we have a PEI dependency section (basically information about the load order and dependencies of each binary) and a compressed section containing a TE binary (Terse Executable).

A file can also encapsulate another firmware volume as the next screenshot shows:

The conclusion is that firmware volumes are very flexible and can encapsulate many different kinds of information. Luckily, UEFITool deals with all this quite well and makes it very easy to modify the contents of each volume.

3 – Executable file types

Two executable file types can be found: PE32/PE32+ and TE. PE32 is the Portable Executable format, the same one used in Windows, while TE (Terse Executable) is nothing more than a stripped-down version of PE32 designed to reduce the overhead of the PE headers. The TE format is used by SEC/PEI phase binaries, while DXE phase binaries are PE32/PE32+.
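One practical consequence of stripping the headers is that addresses inside a TE image no longer map one-to-one to file offsets: StrippedSize bytes of the original PE headers were removed and replaced by the 40-byte EFI_TE_IMAGE_HEADER, so any RVA (for example AddressOfEntryPoint) must be shifted to find the bytes in the file. The sketch below reads the header fields by offset; `te_rva_to_file_offset` and `te_entry_rva` are hypothetical helper names, and a complete loader would also handle sections, ImageBase and relocations.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define TE_SIGNATURE    0x5A56u  /* "VZ" read as a little-endian UINT16 */
#define TE_HEADER_SIZE  40u      /* sizeof(EFI_TE_IMAGE_HEADER) */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* AddressOfEntryPoint: an RVA stored at offset 8 of the TE header. */
static uint32_t te_entry_rva(const uint8_t *te) { return rd32(te + 8); }

/* Translate an RVA into an offset inside the TE file. StrippedSize (offset 6)
 * bytes of PE headers were replaced by the 40-byte TE header, hence the shift.
 * Returns -1 if this is not a TE image or the RVA points into the stripped
 * header area. */
static long te_rva_to_file_offset(const uint8_t *te, size_t size, uint32_t rva) {
    if (size < TE_HEADER_SIZE || rd16(te) != TE_SIGNATURE) return -1;
    uint16_t stripped = rd16(te + 6);
    if (rva < stripped) return -1;
    return (long)(rva - stripped + TE_HEADER_SIZE);
}
```

With StrippedSize = 0x1B8 and an entry point RVA of 0x240, the entry point code starts at file offset 0x240 - 0x1B8 + 0x28 = 0xB0.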

IDA is able to load TE binaries, but its loader contains critical bugs that make the disassembly output mostly unusable. Snare described how he built his own IDAPython TE loader for older IDA versions; it doesn’t work with newer TE binaries without some fixes. Since I’m not a Python fan, I coded my own loader in C that also tries to locate and name known GUIDs. For now I do not intend to release it, so you should go and nag Hex-Rays for support.

Also note that most strings are stored in Unicode (UTF-16, 2 bytes per character), so keep this in mind when searching for strings. A few binaries contain useful strings, but don’t expect to find many. The EDK/EDK2 sources are quite useful as a reference.
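Searching for such a string means interleaving each character with a zero byte (UTF-16LE), which is why a plain ASCII search finds nothing. A small sketch, assuming the string being searched for is plain ASCII; `find_utf16le` is a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Search a binary for an ASCII string encoded as UTF-16LE (each character
 * followed by a zero byte), the way most EFI strings are stored.
 * Returns the byte offset of the first match, or -1 if absent. */
static long find_utf16le(const unsigned char *buf, size_t size, const char *ascii) {
    size_t n = strlen(ascii);
    if (n == 0 || size < 2 * n) return -1;
    for (size_t i = 0; i + 2 * n <= size; i++) {
        size_t k = 0;
        while (k < n && buf[i + 2 * k] == (unsigned char)ascii[k] && buf[i + 2 * k + 1] == 0)
            k++;
        if (k == n) return (long)i;
    }
    return -1;
}
```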

Another extremely useful resource is the AMI BIOS source code leak. For obvious reasons, I can’t link to the leaked archive, but it’s not very hard to find on the web. It contains closed-source code that the EDK/EDK2 releases don’t, and it’s extremely useful for speeding up the reversing process.

4 – EFI binaries reversing tips and tricks

In the EFI/UEFI world there are no standard libraries that export commonly used functions. Instead, function pointers stored in service tables are used. All service tables start with the same header structure, EFI_TABLE_HEADER:

The header is followed by the services (functions) available in that table.
The following is the structure of EFI_RUNTIME_SERVICES. These are the EFI services that remain available after the operating system takes control.

For example, GetTime “Returns the current time and date, and the time-keeping capabilities of the platform.”

How are these service tables found by EFI binaries?

Let’s use DXE phase binaries as an example. The entry point prototype for a DXE driver is described in the EFI Driver Execution Environment Core Interface Spec document:

EFI_SYSTEM_TABLE is another structure, which holds the pointers to the Boot Services and Runtime Services tables. Let’s map all this onto a real DXE binary to see how it works.

This is the entry point function of a random DXE binary. The first call is very common in these binaries and usually looks like this:

This function retrieves the pointers to the Boot Services and Runtime Services tables and stores them for later use. Some binaries retrieve the same tables in different functions (weird compiler optimizations?), while others access them directly in the entry point function. In 64-bit code you can easily recognize these two tables by their offsets in EFI_SYSTEM_TABLE: 0x60 for BootServices and 0x58 for RuntimeServices.
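For reference, here is a C sketch of the 64-bit layout of EFI_TABLE_HEADER and EFI_SYSTEM_TABLE as defined in the EFI/UEFI specification. Pointers and handles are modeled as uint64_t so the offsets match 64-bit firmware regardless of the host compiler; the EFI_SYSTEM_TABLE64 name and the explicit padding field are illustrative choices, not specification names. The static assertions document the offsets worth memorizing: RuntimeServices at 0x58 and BootServices at 0x60.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint64_t Signature;
    uint32_t Revision;
    uint32_t HeaderSize;
    uint32_t CRC32;
    uint32_t Reserved;
} EFI_TABLE_HEADER;                  /* 24 bytes */

typedef struct {
    EFI_TABLE_HEADER Hdr;            /* 0x00 */
    uint64_t FirmwareVendor;         /* 0x18 CHAR16 * */
    uint32_t FirmwareRevision;       /* 0x20 */
    uint32_t Pad;                    /* 0x24 alignment padding in 64-bit builds */
    uint64_t ConsoleInHandle;        /* 0x28 */
    uint64_t ConIn;                  /* 0x30 */
    uint64_t ConsoleOutHandle;       /* 0x38 */
    uint64_t ConOut;                 /* 0x40 */
    uint64_t StandardErrorHandle;    /* 0x48 */
    uint64_t StdErr;                 /* 0x50 */
    uint64_t RuntimeServices;        /* 0x58 EFI_RUNTIME_SERVICES * */
    uint64_t BootServices;           /* 0x60 EFI_BOOT_SERVICES * */
    uint64_t NumberOfTableEntries;   /* 0x68 UINTN */
    uint64_t ConfigurationTable;     /* 0x70 */
} EFI_SYSTEM_TABLE64;

_Static_assert(offsetof(EFI_SYSTEM_TABLE64, RuntimeServices) == 0x58, "gRT offset");
_Static_assert(offsetof(EFI_SYSTEM_TABLE64, BootServices) == 0x60, "gBS offset");
```

Since SystemTable is the second entry point argument, it arrives in RDX, so loads such as `mov rax, [rdx+58h]` and `mov rax, [rdx+60h]` near the start of a driver typically fetch the Runtime and Boot Services pointers.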
Next is a sample function using the Boot Services function LocateProtocol to locate a specific protocol:

PEI binaries use a different table, EFI_PEI_SERVICES, implemented by the PEI Foundation. The concept is the same: locate the table and access services via function pointers. Snare’s EFI utils, linked previously, are helpful for reversing: they try to locate the tables, translate the function pointers into meaningful names and name known GUIDs. The scripts are not perfect but can be helpful. Binary grep is also very useful for locating GUIDs in bulk, since there are some 200+ binaries in each dump.
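The core trick behind such scripts is a lookup table from table offsets to service names, so that an indirect call like `call qword ptr [rax+140h]` (with RAX holding the Boot Services pointer) can be annotated as LocateProtocol. Below is a partial C sketch for 64-bit EFI_BOOT_SERVICES; the offsets are derived from the member order in the EFI 1.10/UEFI specifications (8 bytes per pointer after the 24-byte header), `boot_service_name` is a hypothetical helper, and PEI services would need a table of their own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

typedef struct { uint32_t offset; const char *name; } service_slot;

/* Partial map of 64-bit EFI_BOOT_SERVICES offsets to member names. */
static const service_slot boot_services[] = {
    {0x18, "RaiseTPL"},          {0x20, "RestoreTPL"},
    {0x28, "AllocatePages"},     {0x30, "FreePages"},
    {0x38, "GetMemoryMap"},      {0x40, "AllocatePool"},
    {0x48, "FreePool"},          {0x50, "CreateEvent"},
    {0x80, "InstallProtocolInterface"},
    {0x98, "HandleProtocol"},    {0xB0, "LocateHandle"},
    {0x118, "OpenProtocol"},     {0x138, "LocateHandleBuffer"},
    {0x140, "LocateProtocol"},
    {0x148, "InstallMultipleProtocolInterfaces"},
    {0x160, "CopyMem"},          {0x168, "SetMem"},
};

/* Return the service name for an offset, or NULL if it is not in the table. */
static const char *boot_service_name(uint32_t offset) {
    for (size_t i = 0; i < sizeof boot_services / sizeof boot_services[0]; i++)
        if (boot_services[i].offset == offset) return boot_services[i].name;
    return NULL;
}
```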

The calling conventions are very important.
For 32-bit binaries, the standard C calling convention is used (arguments passed on the stack).

For 64-bit binaries, Microsoft’s x64 calling convention is used (the first four arguments in RCX, RDX, R8 and R9, the rest on the stack). The stack must be aligned on a 16-byte boundary, and the caller must reserve a 32-byte shadow space on the stack above the return address. This means that, at the call site, the first stack argument appears at offset 0x20 from RSP.
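As a worked illustration of the 64-bit rule, this sketch returns where the n-th (0-based) integer or pointer argument lives just before the CALL instruction: RCX, RDX, R8, R9, then [rsp+20h], [rsp+28h] and so on above the shadow space. It ignores floating-point arguments, and `ms_x64_arg_location` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Location of integer/pointer argument n (0-based) at the call site under the
 * Microsoft x64 convention used by 64-bit EFI code. */
static void ms_x64_arg_location(unsigned n, char *out, size_t len) {
    static const char *const regs[4] = {"rcx", "rdx", "r8", "r9"};
    if (n < 4)
        snprintf(out, len, "%s", regs[n]);
    else  /* stack arguments start right above the 32-byte shadow space */
        snprintf(out, len, "[rsp+%Xh]", 0x20u + 8u * (n - 4));
}
```

Assuming the EDK-style prototype Write(This, TableName, OpCode, ...), where a memory write passes Width, Address, Count and Buffer as the variadic arguments, OpCode ends up in R8, Width in R9, and Address, Count and Buffer at [rsp+20h], [rsp+28h] and [rsp+30h].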

SEC and PEI phase binaries are 32-bit (16-bit code can also be found in CPU initialization), and DXE binaries are 64-bit.

5 – Protocols and PPIs

The EDK reference manual defines protocol as:

“A protocol is a data structure that associates:
a GUID (naming it in an unique manner);
possibly an interface, that is a collection of function pointers and/or public data.

So a protocol may be seen as the public definitions (or a sub-part of the public definitions) of a C++ class, or as a Java interface. A data structure can be associated to the GUID, it generally made of one or more function pointers (the protocol’s functions), and/or public data fields.”

A driver installs one or more protocols that can then be located and used by any other driver. Remember that there is no standard library to link against, so protocols (and PPIs) do this job. The following is an example of a protocol used by the S3 sleep feature:

This protocol provides two functions, GetLegacyMemorySize and S3Save. The latter prepares all the information needed on the S3 resume boot path. Internally it uses another protocol, EFI_BOOT_SCRIPT_SAVE_PROTOCOL, calling CloseTable(), which is responsible for storing the boot script in NVS (non-volatile storage).
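To illustrate the protocol mechanism in plain C, here is a toy model: a protocol is just a GUID plus a structure of function pointers, one driver installs it and another locates it by GUID. This is not the Boot Services implementation. `install_protocol`, `locate_protocol`, the S3_SAVE_LIKE_PROTOCOL type and the GUID value are all hypothetical placeholders; only the two member names follow the protocol described above.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

typedef struct { uint32_t Data1; uint16_t Data2; uint16_t Data3; uint8_t Data4[8]; } EFI_GUID;
typedef int STATUS;  /* stand-in for EFI_STATUS; 0 means success */

/* Interface shaped like the S3 save protocol: two function pointers. */
typedef struct S3_SAVE_LIKE_PROTOCOL S3_SAVE_LIKE_PROTOCOL;
struct S3_SAVE_LIKE_PROTOCOL {
    STATUS (*GetLegacyMemorySize)(S3_SAVE_LIKE_PROTOCOL *This, size_t *Size);
    STATUS (*S3Save)(S3_SAVE_LIKE_PROTOCOL *This, void *LegacyMemoryAddress);
};

/* Toy protocol database mapping a GUID to an interface pointer. */
#define MAX_PROTOCOLS 16
static struct { EFI_GUID guid; void *iface; } db[MAX_PROTOCOLS];
static size_t db_count;

static STATUS install_protocol(const EFI_GUID *guid, void *iface) {
    if (db_count == MAX_PROTOCOLS) return -1;
    db[db_count].guid = *guid;
    db[db_count].iface = iface;
    db_count++;
    return 0;
}

static void *locate_protocol(const EFI_GUID *guid) {
    for (size_t i = 0; i < db_count; i++)
        if (memcmp(&db[i].guid, guid, sizeof *guid) == 0) return db[i].iface;
    return NULL;
}

/* A "driver" that provides the interface. Placeholder GUID, not a real one. */
static const EFI_GUID demo_guid = {0x12345678, 0x1234, 0x5678, {1, 2, 3, 4, 5, 6, 7, 8}};
static int demo_s3save_called;

static STATUS demo_get_size(S3_SAVE_LIKE_PROTOCOL *This, size_t *Size) {
    (void)This;
    *Size = 0x1000;  /* arbitrary value for the demo */
    return 0;
}

static STATUS demo_s3save(S3_SAVE_LIKE_PROTOCOL *This, void *Legacy) {
    (void)This; (void)Legacy;
    demo_s3save_called = 1;  /* the real S3Save ends by closing the boot script table */
    return 0;
}

static S3_SAVE_LIKE_PROTOCOL demo_protocol = {demo_get_size, demo_s3save};
```

A consumer only needs the GUID: after `install_protocol(&demo_guid, &demo_protocol)`, any other code can call `locate_protocol(&demo_guid)` and invoke `S3Save` through the returned structure, without linking against the provider.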

To locate the driver that publishes this protocol, we can binary grep for the protocol GUID and find all the drivers that install and use it. In this case, only a single driver is expected to call S3Save.
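A binary grep for a GUID has to search for its on-disk byte order, not its text form. A minimal sketch, assuming the canonical 8-4-4-4-12 text format as input; `guid_from_string` and `grep_guid` are hypothetical helper names:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Parse "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" into the 16 bytes stored in
 * firmware: Data1, Data2 and Data3 little-endian, Data4 byte by byte. */
static int guid_from_string(const char *s, uint8_t out[16]) {
    unsigned int d1, d2, d3, d4[8];
    if (strlen(s) != 36) return -1;
    if (sscanf(s, "%8x-%4x-%4x-%2x%2x-%2x%2x%2x%2x%2x%2x",
               &d1, &d2, &d3, &d4[0], &d4[1], &d4[2], &d4[3],
               &d4[4], &d4[5], &d4[6], &d4[7]) != 11)
        return -1;
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(d1 >> (8 * i));
    out[4] = (uint8_t)d2; out[5] = (uint8_t)(d2 >> 8);
    out[6] = (uint8_t)d3; out[7] = (uint8_t)(d3 >> 8);
    for (int i = 0; i < 8; i++) out[8 + i] = (uint8_t)d4[i];
    return 0;
}

/* Record up to max_hits offsets where the GUID occurs; return the total count. */
static size_t grep_guid(const uint8_t *buf, size_t size, const uint8_t guid[16],
                        size_t *hits, size_t max_hits) {
    size_t n = 0;
    for (size_t i = 0; i + 16 <= size; i++) {
        if (memcmp(buf + i, guid, 16) != 0) continue;
        if (n < max_hits) hits[n] = i;
        n++;
    }
    return n;
}
```

Running this over every extracted binary (or over the whole dump) gives the list of candidates that install or consume a given protocol.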

The best sources for protocol definitions are the EFI/UEFI documentation and the EDK/EDK2 source code. In the PEI phase we find PPIs (PEIM-to-PEIM Interfaces). For the sake of simplicity, let’s assume that they are equivalent to protocols in terms of functionality. From the same EDK Reference Manual:

“PEIM-to-PEIM interfaces are a mechanism, similar to the protocols, that is used to facilitate the communication between two PEIMs. The principle is roughly the same:

PPIs are “named” with a GUID.
PPIs have an optional data structure containing function pointers and data”

Sample PEI phase code from EDK2 using LocatePpi service:

6 – Building and executing the S3 boot script

Most, if not all, boot script content is added by different DXE drivers. The related protocol is EFI_BOOT_SCRIPT_SAVE_PROTOCOL, which publishes two functions, Write and CloseTable.
The Write function writes information to a boot script table. The specification allows multiple boot script tables, but for now only one is used, EFI_ACPI_S3_RESUME_SCRIPT_TABLE (0x0).
One of the disassembly listings above, locate_bootscript_save_protocol, locates this specific protocol. It can be found in all the DXE drivers that use the protocol.

The Write function supports several opcodes, each handling different operations and data. The following is an incomplete definition of a few opcodes:

EDK2 supports a few more, and Apple also implements some custom opcodes. Because opcodes can be vendor-specific, they require reversing to understand their purpose. The functions that handle each opcode are easy to find. For example, the following listing is a common function found in DXE drivers that implements EFI_BOOT_SCRIPT_IO_WRITE_OPCODE:

All the opcode functions can be easily identified by the reference to EFI_BOOT_SCRIPT_SAVE_PROTOCOL and the value in the R8 register before the call to Write.
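When labeling these functions, it helps to have the standard opcode values at hand. The constants below are the Framework/EDK boot script opcodes, from IO_WRITE (0x00) to DISPATCH (0x08). The lookup helper `boot_script_opcode_name` is a hypothetical convenience for annotating a Write call by its R8 value; anything outside this range may be a vendor-specific opcode, such as the custom ones Apple uses.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Boot script table passed as the second Write() argument (RDX on x64). */
#define EFI_ACPI_S3_RESUME_SCRIPT_TABLE 0x00

/* Standard opcodes passed as the third Write() argument (R8 on x64). */
enum {
    EFI_BOOT_SCRIPT_IO_WRITE_OPCODE              = 0x00,
    EFI_BOOT_SCRIPT_IO_READ_WRITE_OPCODE         = 0x01,
    EFI_BOOT_SCRIPT_MEM_WRITE_OPCODE             = 0x02,
    EFI_BOOT_SCRIPT_MEM_READ_WRITE_OPCODE        = 0x03,
    EFI_BOOT_SCRIPT_PCI_CONFIG_WRITE_OPCODE      = 0x04,
    EFI_BOOT_SCRIPT_PCI_CONFIG_READ_WRITE_OPCODE = 0x05,
    EFI_BOOT_SCRIPT_SMBUS_EXECUTE_OPCODE         = 0x06,
    EFI_BOOT_SCRIPT_STALL_OPCODE                 = 0x07,
    EFI_BOOT_SCRIPT_DISPATCH_OPCODE              = 0x08
};

/* Map the value loaded into R8 before a call to Write to a readable name. */
static const char *boot_script_opcode_name(uint64_t r8) {
    static const char *const names[] = {
        "IO_WRITE", "IO_READ_WRITE", "MEM_WRITE", "MEM_READ_WRITE",
        "PCI_CONFIG_WRITE", "PCI_CONFIG_READ_WRITE", "SMBUS_EXECUTE",
        "STALL", "DISPATCH"
    };
    if (r8 < sizeof names / sizeof names[0]) return names[(size_t)r8];
    return "UNKNOWN (possibly vendor-specific)";
}
```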
As previously described, there is a DXE driver that locates EFI_ACPI_S3_SAVE_PROTOCOL and executes S3Save to make the boot script ready for S3 resume. S3Save calls CloseTable, which allocates a new memory pool to save the script and returns the physical address of the boot script. As previously described, the script is dynamically built on the normal boot path.

How about execution of the boot script?

Execution of the S3 boot script is initiated by the DXE IPL (Initial Program Load). This is the last binary executed in the PEI phase, and if an S3 boot script exists, it follows that special boot path instead of the normal one.

This is done by locating EFI_PEI_S3_RESUME_PPI:

The S3RestoreConfig function is responsible for locating the S3 boot script and other necessary information, and for triggering the boot script execution via another PPI, EFI_PEI_BOOT_SCRIPT_EXECUTER_PPI:

For a real-world implementation and its reversing, check cr4sh’s blog post.

After the boot script is executed with EFI_PEI_BOOT_SCRIPT_EXECUTER_PPI.Execute(), control is transferred to the OS waking vector and execution resumes at the operating system level.

The Dark Jedi vulnerability exploits this phase. It exists because the boot script is saved in unprotected physical memory, which can be found and modified from a kernel extension. When the machine comes back from sleep, the modified boot script is executed and malicious code can run at the firmware level. The flash memory can be written because the lock protections are only restored by the boot script itself, so they are not yet in place while it runs.

It should be noted that all Macs are still vulnerable to this specific attack (to be presented by Trammell Hudson, Xeno Kovah and Corey Kallenberg at the next Black Hat/DEF CON).

7 – Finding the vulnerability

The PPI and protocol GUIDs allow us to track all the binaries involved in the S3 boot script. I disassembled every single binary that uses EFI_BOOT_SCRIPT_SAVE_PROTOCOL. The interesting DXE driver, where the vulnerability occurs, is identified by the GUID DE23ACEE-CF55-4FB6-AA77-984AB53DE823. If you look up this GUID on the web, you will find a very interesting name, PchInitDxe.efi. What is so special about it? The flash protections are implemented by the PCH (Platform Controller Hub). This controller provides multiple I/O functions and different protections for the flash chip. For example, the Flash Configuration Lock-Down (FLOCKDN) bit is described in the PCH documentation as:

“When set to 1, those Flash Program Registers that are locked down by this FLOCKDN bit cannot be written. Once set to 1, this bit can only be cleared by a hardware reset due to a global reset or host partition reset in an Intel® ME enabled system.”

The Protected Range registers (PRx) can be configured to protect address ranges of the flash chip itself (not machine RAM). We already saw that there is a writable NVRAM region in the flash chip. In practice, the Protected Range registers write-protect all flash memory except the NVRAM region. Many other protections are available; please refer to the LegbaCore presentations for complete descriptions.
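As an illustration of how a Protected Range register expresses such a range, the sketch below decodes a 32-bit PRx value using the bit layout given in Intel’s PCH datasheets for these platforms: base in bits 12:0 and limit in bits 28:16 (both holding flash address bits 24:12), Read Protection Enable at bit 15 and Write Protection Enable at bit 31. Verify the layout against the datasheet of the specific PCH; the type and helper names are hypothetical and the register value in the example is made up.

```c
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint32_t base;       /* first protected flash address */
    uint32_t limit;      /* last protected flash address (inclusive) */
    int write_protect;   /* WPE, bit 31 */
    int read_protect;    /* RPE, bit 15 */
} protected_range;

/* Decode a PRx value: base/limit fields are 4 KB granular (address bits 24:12). */
static protected_range decode_prx(uint32_t v) {
    protected_range r;
    r.base = (v & 0x1FFFu) << 12;
    r.limit = (((v >> 16) & 0x1FFFu) << 12) | 0xFFFu;
    r.read_protect = (int)((v >> 15) & 1u);
    r.write_protect = (int)((v >> 31) & 1u);
    return r;
}

/* Would a write to flash_addr be blocked by this single PRx register? */
static int prx_blocks_write(uint32_t v, uint32_t flash_addr) {
    protected_range r = decode_prx(v);
    return r.write_protect && flash_addr >= r.base && flash_addr <= r.limit;
}
```

In this made-up example, 0x87FF0190 write-protects 0x190000 through 0x7FFFFF (the end of an 8 MB chip), while addresses below 0x190000 stay writable.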

The suspend/resume vulnerability makes it possible to write to the previously protected flash regions because the FLOCKDN bit is zero after the suspend/resume cycle. While this could have been caused by a hardware failure, the reality is much simpler: what is missing is the boot script information that restores this register configuration. When the machine is put to sleep, the CPU context is lost, so the flash memory is unlocked (this is another reason why the Dark Jedi attack is possible) until the S3 boot script is executed. If no such information was saved, the flash is left unlocked, making it writable from the operating system level.

What does the code that saves the information of this FLOCKDN register look like?

The R_PCH_SPI_HSFS value is 0x3804, which makes it easier to track down where this code (or something similar) might be implemented.

We know that newer machines are not vulnerable, so let’s start by trying to locate this code in their firmware. The following disassembly listing, found in the MacBook Pro Retina 11,1 firmware, is the equivalent of the code above:

This explains why the newer machines are not vulnerable: the FLOCKDN information is saved to the boot script. The same code (or a variant of it) can’t be found in the firmware of vulnerable machines.

From this we can conclude that the vulnerability is definitely caused by missing boot script information. Apple did not implement Intel’s recommendation regarding the flash protections and does not save them to the boot script on the vulnerable machines.
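The difference between the two firmware generations can be modeled in a few lines of C. This is a toy simulation, not firmware code: an array stands in for the RootComplexBar MMIO window, another for the boot script, and S3 resume is modeled as "registers come back reset, then the script is replayed". Only R_PCH_SPI_HSFS (0x3804) and the FLOCKDN bit (0x8000) come from the text; every helper name is hypothetical, and the model ignores details such as the write-1-to-clear status bits in HSFS.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define R_PCH_SPI_HSFS          0x3804u  /* HSFS offset from RootComplexBar */
#define B_PCH_SPI_HSFS_FLOCKDN  0x8000u  /* Flash Configuration Lock-Down, bit 15 */

/* Toy state: a 16 KB array standing in for the RootComplexBar MMIO window,
 * and a boot script that only records 16-bit memory writes. */
static uint8_t rcba[0x4000];
typedef struct { uint32_t offset; uint16_t value; } mem_write16;
static mem_write16 script[8];
static size_t script_len;

static uint16_t mmio_read16(uint32_t off) {
    uint16_t v;
    memcpy(&v, rcba + off, sizeof v);
    return v;
}

static void mmio_write16(uint32_t off, uint16_t v) { memcpy(rcba + off, &v, sizeof v); }

/* Counterpart of MmioOr16(RootComplexBar + off, bits). */
static void mmio_or16(uint32_t off, uint16_t bits) {
    mmio_write16(off, (uint16_t)(mmio_read16(off) | bits));
}

/* Counterpart of a MEM_WRITE boot script entry: remember the current value. */
static int save_mem_write16(uint32_t off) {
    if (script_len == sizeof script / sizeof script[0]) return -1;
    script[script_len].offset = off;
    script[script_len].value = mmio_read16(off);
    script_len++;
    return 0;
}

/* Normal boot path: lock the flash configuration; non-vulnerable firmware
 * also records the register in the boot script. */
static void normal_boot(int save_to_script) {
    memset(rcba, 0, sizeof rcba);
    script_len = 0;
    mmio_or16(R_PCH_SPI_HSFS, B_PCH_SPI_HSFS_FLOCKDN);
    if (save_to_script) save_mem_write16(R_PCH_SPI_HSFS);
}

/* S3 resume path: registers come back in their reset state and only what
 * the boot script replays is restored. */
static void s3_resume(void) {
    memset(rcba, 0, sizeof rcba);
    for (size_t i = 0; i < script_len; i++)
        mmio_write16(script[i].offset, script[i].value);
}

static int flash_locked(void) {
    return (mmio_read16(R_PCH_SPI_HSFS) & B_PCH_SPI_HSFS_FLOCKDN) != 0;
}
```

With `normal_boot(0)` the flash is locked after boot but unlocked after `s3_resume()`, which is the vulnerable behavior; with `normal_boot(1)` the replayed entry sets FLOCKDN again on resume.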

Trammell was kind enough to send me the boot script output from a 10,1 and an 11,2 Retina Mac (the latest CHIPSEC added support for this, but it still has some problems with Macs).

The boot script output allows us to reach the same conclusion: the older machines’ boot scripts contain no entries for FLOCKDN or the Protected Range registers, and there lies the source of the vulnerability.

To dump this information you can follow the CHIPSEC code. Essentially, it’s a matter of locating the ACPI variable and the physical memory address where the boot script is stored. DirectHW.kext can be used to read physical memory.

An interesting question is why Haswell and older platforms differ.
The AMI BIOS leak can provide a hint. Its reference code for the Ivy Bridge and Sandy Bridge platforms is codenamed PantherPoint and CougarPoint (the names of their PCHs). Apple’s Haswell DXE drivers contain string references to LynxPoint.

From what I could gather, the Haswell platform introduced more complex power management modes. Apple probably followed Intel’s reference code, and this time they managed to correctly copy and paste the flash locking code.

The other possible theory is that Apple knew about this bug, fixed it in the Haswell firmware and opted not to patch older systems, a decision that isn’t exactly rare.

8 – Fixing the vulnerability ourselves

Now let’s get to the really interesting and fun part of this post. Once again: can we fix the vulnerability ourselves instead of waiting for Apple?
The answer is yes, of course we can. It’s all bits and bytes after all, and Thunderstrike has shown us that there are no hardware protections involved!

Most of the code needed for the fix appears in the last disassembly listing: we need to call MmioOr16 first and then save_mem_write_opcode.
To build the fix, we need to adapt that code to the target firmware and find space to insert it.

Because there isn’t enough unused space to inject the new code, I opted to replace an “unused” function. More on the target function later. An alternative would have been to add a new section or expand an existing one, but that would require rereading the PE format (I haven’t messed with PE for more than a decade or so), and I didn’t know the impact on UEFITool and on EFI itself. Replacing code was the faster alternative.

This means that we need to jump to the new code, execute it, execute the original bytes we stole to make room for the jump, and resume execution in the original function where we jumped from. The first jump is placed close to the point where the non-vulnerable versions perform the equivalent operation.
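The hook itself is typically a 5-byte relative JMP (opcode E9 followed by a 32-bit displacement measured from the end of the instruction), which is why at least five bytes of whole instructions have to be stolen at the patch site. The post does not say how the patch bytes were assembled; the sketch below is one hypothetical way to compute them, plus a helper that overwrites a 5-byte CALL with a single 5-byte NOP, which is what removing a call to a function amounts to. Addresses are treated as offsets within the same image, and the helper names are illustrative.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Encode "jmp rel32" (E9 xx xx xx xx) located at `site` and landing on `target`.
 * The displacement is relative to the end of the 5-byte instruction.
 * Returns 0 on success, -1 if the target is out of rel32 range. */
static int encode_jmp_rel32(int64_t site, int64_t target, uint8_t out[5]) {
    int64_t rel = target - (site + 5);
    if (rel < INT32_MIN || rel > INT32_MAX) return -1;
    uint32_t r = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++) out[1 + i] = (uint8_t)(r >> (8 * i));
    return 0;
}

/* Replace a 5-byte "call rel32" (E8 ...) at `off` with one 5-byte NOP
 * (0F 1F 44 00 00) so the call is skipped. Returns -1 if no CALL is there. */
static int nop_out_call(uint8_t *code, size_t size, size_t off) {
    static const uint8_t nop5[5] = {0x0F, 0x1F, 0x44, 0x00, 0x00};
    if (off > size || size - off < 5 || code[off] != 0xE8) return -1;
    memcpy(code + off, nop5, sizeof nop5);
    return 0;
}
```

The return path works the same way: a JMP placed after the stolen instructions in the new code, targeting the first instruction after the original patch site.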

The new code for the latest firmware available for the MacBook Pro Retina 10,1 is the following:

As you can see, it’s rather simple. From the original function code we know that RootComplexBar is stored in R15 and that the B_PCH_SPI_HSFS_FLOCKDN value is 0x8000.
The next call is to the save_mem_write_opcode() function, which adds the boot script entry. Again, this is a matter of reversing the code that calls this function and setting the right parameters in registers and on the stack.
The last two MOV instructions are the original ones that were stolen to make space for the jump. We then resume execution at the next instruction.
The function I replaced is called only once and references a string called DisableDeepSX, which is an EFI variable. It doesn’t appear to do anything special, so it was a good candidate. We are hacking and experimenting, so why not? Failure is part of the process!

The last thing we need to do is remove the call to the original function, since that function now contains the fix code. I also opted to move 1 into var_35 instead of the original 0 (just because it appears to be the better option).

Everything is now set to test the patch and check if it works.

9 – Reflashing and a temporarily bricked Retina

The next step is to replace the original DXE binary with the patched version. UEFITool is great for this because it takes care of everything. We just need to select the right GUID, go to the compressed binary and replace it with our patched version (use the replace body option *inside* the compressed section for this GUID). UEFITool will (re)organize the firmware volume and update all the necessary CRCs.
Once the new image is ready, we can write it via SPI or use the bug itself together with flashrom.

A few minutes later the result is… a black screen. The fans spin briefly and then shut down, which means that the new BIOS image has a problem and is unable to boot.
The bad image needs to be replaced with the original dump. Always have a working dump before replacing anything!

What is the reason for the failure?
The first task is to make sure that UEFITool packed everything correctly and that the CRCs are valid. Yes, everything is OK here.
The next step is to assume there is some other kind of checksum or check somewhere else. I made a few tests and found, for example, that the boot firmware volume can be modified without bricking the machine (you may have noticed that there are two boot firmware volumes, one of which has an invalid CRC even on a working dump; maybe it’s a backup volume?). This means that if there is another check, it should exist only in the DXE phase.

Time to reread the Thunderstrike presentation notes…
An old unanswered question pops up: what are the last four bytes of the ZeroVector used for?
Trammell says the last eight bytes change between volumes and firmware versions. We know that the first four of them are the CRC32, but there is nothing about the remaining four. UEFITool doesn’t change them, so it also knows nothing about them.

What do we do? Let’s try to make sense of their value. Could it be a reference to free space or something similar? Yes, bingo at the first attempt (not kidding!). It’s not the free space but the amount of space used by all the files in each volume. My hint was that I had previously tested modifying only this value and it also bricked the machine, which means the value is indeed relevant for something.
If you compute the difference between the total size of the firmware volume (the FvLength field of EFI_FIRMWARE_VOLUME_HEADER) and the volume free space computed by UEFITool, you get the value stored in the last four bytes of ZeroVector.

This volume has a full size of 0x30000 bytes and a ZeroVector value of 0x102B0 bytes.

Its free space is 0x1FD50 bytes. The difference between the volume’s full size and its free space is 0x30000 - 0x1FD50 = 0x102B0, the same value found in ZeroVector.

Since repacking the patched binary slightly changes the free space (by a few bytes), the stored value becomes wrong and invalidates the BIOS image. Compute the new value, fix the header, and recompute the 16-bit header checksum. Reflash the new image and now it boots!
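A minimal sketch of that manual fix, assuming the layout of EFI_FIRMWARE_VOLUME_HEADER (FvLength at 0x20, HeaderLength at 0x30, Checksum at 0x32) and the observation in this post that the last four bytes of ZeroVector (offset 0x0C) hold FvLength minus the free space; that use of ZeroVector is Apple-specific and undocumented. The EFI specification defines Checksum as a 16-bit value chosen so that all 16-bit words of the header sum to zero. The free-space figure still has to come from a tool such as UEFITool, and `fv_fix_used_space` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define FV_USED_SPACE_OFF  0x0C  /* last 4 bytes of ZeroVector (Apple, undocumented) */
#define FV_LENGTH_OFF      0x20  /* UINT64 FvLength */
#define FV_HDRLEN_OFF      0x30  /* UINT16 HeaderLength */
#define FV_CHECKSUM_OFF    0x32  /* UINT16 Checksum */

static uint64_t rd_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    while (n--) v = (v << 8) | p[n];
    return v;
}

static void wr_le(uint8_t *p, uint64_t v, int n) {
    for (int i = 0; i < n; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Sum of the header as little-endian 16-bit words; 0 means the checksum is valid. */
static uint16_t fv_header_sum(const uint8_t *fv) {
    uint16_t len = (uint16_t)rd_le(fv + FV_HDRLEN_OFF, 2);
    uint16_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) sum = (uint16_t)(sum + rd_le(fv + i, 2));
    return sum;
}

/* Store FvLength - free_space in ZeroVector[12..15], then recompute Checksum. */
static void fv_fix_used_space(uint8_t *fv, uint32_t free_space) {
    uint64_t fv_length = rd_le(fv + FV_LENGTH_OFF, 8);
    wr_le(fv + FV_USED_SPACE_OFF, fv_length - free_space, 4);
    wr_le(fv + FV_CHECKSUM_OFF, 0, 2);  /* checksum is computed with the field zeroed */
    wr_le(fv + FV_CHECKSUM_OFF, (uint16_t)(0x10000u - fv_header_sum(fv)), 2);
}
```

With the numbers from this volume (FvLength 0x30000, free space 0x1FD50), the helper stores 0x102B0, i.e. the bytes B0 02 01 00.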

The issue has been reported to the UEFITool author and will hopefully be fixed in the next few weeks. For now you need to update the value and the header checksum manually.

10 – Does the fix work?

The last step of this adventure is to verify whether our boot script modification works. The answer is yes: FLOCKDN is now always set to 1 instead of the vulnerable 0. I repeated the suspend/resume cycle several times to make sure it works and is stable. It really works, and the machine is stable.

Game over! (well sort of…)

We were able to track down the source of the bug and produce a binary patch for it. It’s not a perfect patch, since it doesn’t take care of the SMRR (SMM Range Registers) and misses other flash security features, but the main goal was to show that it is possible and easy to implement. Apple has no excuse for not releasing firmware updates for all the vulnerable models.
An interesting project would be to develop a fix for the Dark Jedi issue. A good starting point is Intel’s document “A Tour Beyond BIOS Implementing S3 Resume with EDKII”. It is a paper, published a few months before the Dark Jedi presentation, that describes the issue and proposes a fix. It is rather amusing to read the paper after Dark Jedi was presented, since you can clearly see the Dark Jedi issue in it. Hindsight is always 20/20.

11 – Conclusion

This was a long technical post and hopefully a good overall introduction to the EFI/UEFI world. You should read the documentation. It is long, but it is good overall, and you get a good understanding of the EFI architecture (honestly, I like it once you understand how everything fits together).

While I believe the impact on average users is small, it remains a critical issue that can be exploited remotely.

I also do not regret the full disclosure and 0day release, since it seems to be the only way to pressure vendors into fixing firmware issues. There is indeed some risk associated with firmware updates, but it is much lower than the security risk posed by firmware-level vulnerabilities. Hopefully this attitude will finally change. Dark Jedi can also be exploited remotely, so fixing it is also necessary, and that fix can be bundled with the fix for this vulnerability.

12 – Update

Apple released an update for this bug and for Dark Jedi right before this post went public.
I am very happy to see that Apple moved fast enough to fix both bugs, and I must congratulate them. It was a bit unexpected! Maybe full disclosure and bad publicity work after all.