This blog post is the first in a three-part series describing the challenges one has to overcome when trying to hook the native NTDLL in WoW64 applications (32-bit processes running on top of a 64-bit Windows platform). As documented by numerous other sources, WoW64 processes contain two versions of NTDLL. The first is a dedicated 32-bit version, which forwards system calls to the WoW64 environment, where they are adjusted to fit the x64 ABI. The second is a native 64-bit version, which is called by the WoW64 environment and is eventually responsible for user-mode to kernel-mode transitions.
Due to some technical difficulties in hooking the 64-bit NTDLL, most security-related products hook only 32-bit modules in such processes. Alas, from an attacker’s point of view, bypassing these 32-bit hooks and the mitigations they offer is rather trivial with the help of some well-known techniques. Nonetheless, in order to invoke system calls and carry out various other tasks, most of these techniques eventually call the native (that is, 64-bit) version of NTDLL. Thus, by hooking the native NTDLL, endpoint protection solutions can gain better visibility into the process’s actions and become somewhat more resilient to bypasses.
In this post we describe methods to inject 64-bit modules into WoW64 applications. The next post will take a closer look at one of these methods and delve into the details of some of the adaptations required for handling CFG-aware systems. The final post of this series will describe the changes one would have to apply to an off-the-shelf hooking engine in order to hook the 64-bit NTDLL.
When we started this research, we decided to focus our efforts mainly on Windows 10. All of the injection methods we present were tested on several Windows 10 versions (mostly RS2 and RS3), and may require a slightly different implementation if used on older Windows versions.
Injecting 64-bit modules into WoW64 applications has always been possible, though there are a few limitations to consider when doing so. Normally, WoW64 processes contain very few 64-bit modules, namely the native ntdll.dll and the modules comprising the WoW64 environment itself: wow64.dll, wow64cpu.dll, and wow64win.dll. Unfortunately, 64-bit versions of commonly used Win32 subsystem DLLs (such as kernelbase.dll, kernel32.dll and user32.dll) are not loaded into the process’s address space. Forcing the process to load any of these modules is possible, though somewhat difficult and unreliable.
Hence, as the first step of our journey towards successful and reliable injection, we should strip our candidate module of all external dependencies except the native NTDLL. At the source code level, this means that calls to higher-level Win32 APIs such as VirtualProtect() have to be replaced with calls to their native counterparts, in this case NtProtectVirtualMemory(). Other adaptations are also required and will be discussed in detail in the final part of this series.
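To make the VirtualProtect() replacement more concrete, here is a minimal sketch of a wrapper that reshapes the Win32-style arguments into the form the native routine expects. The names NativeVirtualProtect and FakeNtProtectVirtualMemory are hypothetical, and the types are plain C stand-ins for the Windows ones so that the sketch compiles without windows.h. The parameter list follows the widely published prototype of NtProtectVirtualMemory(), which takes a process handle and receives the base address and region size by pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Windows types (assumption: real code would use the SDK/DDK headers). */
typedef int32_t NTSTATUS;
typedef void   *HANDLE;

/* The pseudo-handle for the current process is (HANDLE)-1. */
#define NT_CURRENT_PROCESS ((HANDLE)(intptr_t)-1)

/* Shape of NtProtectVirtualMemory() as commonly published; NTAPI is omitted
 * because x64 has a single calling convention. */
typedef NTSTATUS (*NT_PROTECT_VIRTUAL_MEMORY)(
    HANDLE    ProcessHandle,
    void    **BaseAddress,   /* in/out: rounded down to a page boundary */
    size_t   *RegionSize,    /* in/out: rounded up to whole pages       */
    uint32_t  NewProtect,
    uint32_t *OldProtect);

/* Hypothetical drop-in for VirtualProtect(): returns nonzero on success. */
int NativeVirtualProtect(NT_PROTECT_VIRTUAL_MEMORY ntProtect,
                         void *address, size_t size,
                         uint32_t newProtect, uint32_t *oldProtect)
{
    /* The native call may modify both values, so pass local copies. */
    void  *base       = address;
    size_t regionSize = size;

    NTSTATUS status = ntProtect(NT_CURRENT_PROCESS, &base, &regionSize,
                                newProtect, oldProtect);
    return status >= 0;      /* same check as the NT_SUCCESS() macro */
}

/* Stand-in for the real routine, only so the wrapper can be exercised
 * outside Windows. It records the handle it was given. */
static HANDLE g_lastProcess;

NTSTATUS FakeNtProtectVirtualMemory(HANDLE process, void **base, size_t *size,
                                    uint32_t newProtect, uint32_t *oldProtect)
{
    (void)base; (void)size; (void)newProtect;
    g_lastProcess = process;
    *oldProtect   = 0x04;    /* PAGE_READWRITE */
    return 0;                /* STATUS_SUCCESS */
}
```

In an injected module the function pointer would be the address of NtProtectVirtualMemory() in the 64-bit NTDLL (typically resolved through the DLL's single import descriptor) rather than the stand-in shown here.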
Figure 1 – a minimalistic DLL with only a single import descriptor (NTDLL)
After we create a 64-bit DLL that adheres to these limitations, we can go on to review a few possible injection methods.
As previously discovered by Walied Assar, upon initialization, the WoW64 environment attempts to load a 64-bit DLL named wow64log.dll directly from the system32 directory. If this DLL is found, it will be loaded into every WoW64 process in the system, provided that it exports a specific, well-defined set of functions. Since wow64log.dll is not currently shipped with retail versions of Windows, this mechanism can be abused as an injection method by simply hijacking this DLL and placing our own version of it in system32.
Figure 2 – ProcMon capture showing a WoW64 process attempting to load wow64log.dll
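As a rough illustration of what such a DLL could look like, the skeleton below stubs out four exports. The export names and signatures are not given in this post; they are taken from public reverse-engineering write-ups on wow64log.dll and should be treated as assumptions to verify against the target Windows build. The WOW64LOG_EXPORT macro and the stand-in NTSTATUS type are introduced here only so that the sketch compiles outside a Windows toolchain.

```c
#include <stdarg.h>
#include <stdint.h>

typedef int32_t NTSTATUS;                 /* stand-in for the Windows type */
#define STATUS_SUCCESS ((NTSTATUS)0)

#ifdef _WIN32
#define WOW64LOG_EXPORT __declspec(dllexport)
#else
#define WOW64LOG_EXPORT                   /* lets the sketch compile elsewhere */
#endif

/* Assumed to be called once while the WoW64 environment initializes.
 * An injector would start its own work (e.g. installing hooks) here. */
WOW64LOG_EXPORT NTSTATUS Wow64LogInitialize(void)
{
    return STATUS_SUCCESS;
}

/* Assumed to be called around system services; the parameter layout is
 * undocumented, so it is treated as an opaque pointer and ignored. */
WOW64LOG_EXPORT NTSTATUS Wow64LogSystemService(void *serviceParameters)
{
    (void)serviceParameters;
    return STATUS_SUCCESS;
}

/* Assumed to receive printf-style log messages from the WoW64 layer. */
WOW64LOG_EXPORT NTSTATUS Wow64LogMessageArgList(uint8_t level, char *format,
                                                va_list args)
{
    (void)level; (void)format; (void)args;
    return STATUS_SUCCESS;
}

/* Assumed to be called when the WoW64 environment shuts down. */
WOW64LOG_EXPORT NTSTATUS Wow64LogTerminate(void)
{
    return STATUS_SUCCESS;
}
```

A real build would be compiled as a 64-bit DLL that imports only from NTDLL, in line with the limitations described earlier.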
The main advantage of this method lies in its sheer simplicity: all it takes to inject the module is to deploy it to the aforementioned location and let the system loader do the rest. The second advantage is that loading this DLL is a legitimate part of the WoW64 initialization phase, so it is supported on all currently available 64-bit Windows platforms.
However, there are a few possible downsides to this method. First, a DLL named wow64log.dll may already exist in the system32 directory, even though (as mentioned above) it’s not there by default. Second, this method provides little to no control over the injection process, as the underlying call to LdrLoadDll() is ultimately issued by system code. This limits our ability to exclude certain processes from injection or to control when the module is loaded.
More control over the injection process can be achieved by simply issuing the call to LdrLoadDll() ourselves rather than letting a built-in system mechanism call it on our behalf. In reality, this is not as straightforward as it may seem. As one can correctly assume, the 32-bit image loader will refuse any attempt to load a 64-bit image, stopping this course of action dead in its tracks. Therefore, if we wish to load a native module into a WoW64 process we must somehow go through the native loader. We can do this in two stages:
- Gain the ability to execute arbitrary 32-bit code inside the target process.
- Craft a call to the 64-bit version of LdrLoadDll(), passing the name of the target DLL as one of its arguments.
Given the ability to execute 32-bit code in the context of the target process (for which a plethora of ways exist), we still need a method by which we can call 64-bit APIs freely. One way to do this is by utilizing the so-called “Heaven’s Gate”.
“Heaven’s Gate” is the commonly used name for a technique which allows 32-bit binaries to execute 64-bit instructions without going through the standard flow enforced by the WoW64 environment. This is usually done via a user-initiated control transfer to code segment 0x33, which switches the processor’s execution mode from 32-bit compatibility mode to 64-bit long mode.
Figure 3 – a thread executing x86 code, just prior to its transition to the x64 realm.
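The control transfer itself can be as small as a single far jump. The sketch below is one possible encoding, not necessarily the one used by any particular implementation: it emits the bytes of a 32-bit `JMP ptr16:32` instruction (opcode 0xEA) whose selector is 0x33. The helper name EmitFarJumpToX64 and its buffer-based interface are hypothetical, and whatever lives at the target address must already be valid 64-bit code.

```c
#include <stddef.h>
#include <stdint.h>

#define CS_SELECTOR_X64 0x33u   /* the 64-bit code segment selector named above */
#define FAR_JUMP_SIZE   7u      /* opcode + 32-bit offset + 16-bit selector     */

/* Writes "jmp far 0x33:target" into out (which must hold FAR_JUMP_SIZE bytes)
 * and returns the number of bytes written. The encoding is:
 *   EA  <offset, 4 bytes little-endian>  <selector, 2 bytes little-endian>   */
size_t EmitFarJumpToX64(uint8_t *out, uint32_t target)
{
    out[0] = 0xEA;                               /* JMP ptr16:32 */
    out[1] = (uint8_t)(target);
    out[2] = (uint8_t)(target >> 8);
    out[3] = (uint8_t)(target >> 16);
    out[4] = (uint8_t)(target >> 24);
    out[5] = (uint8_t)(CS_SELECTOR_X64 & 0xFF);
    out[6] = (uint8_t)(CS_SELECTOR_X64 >> 8);
    return FAR_JUMP_SIZE;
}
```

Because the offset field is only 32 bits wide, the 64-bit code reached this way has to sit in the lower 4 GB of the address space; from there it is free to call anywhere, including into the native NTDLL.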
After the jump to the x64 realm is made, the option of directly calling into the 64-bit NTDLL becomes readily available. In the case of exploits and other potentially malicious programs, this allows them to avoid hitting hooks placed on 32-bit APIs. In the case of DLL injectors, though, this solves the problem at hand as it opens up the possibility of calling the 64-bit version of LdrLoadDll(), capable of loading 64-bit modules.
Figure 4 – for demonstration purposes, we used the Blackbone library to successfully inject a 64-bit module into a WoW64 process using Heaven’s Gate.
We will not go into any more detail about specific implementations of “Heaven’s Gate”; the inquisitive reader can find the technique covered at length in other publicly available write-ups.
Injection via APC
With the ability to load a kernel-mode driver into the system, the arsenal of injection methods at our disposal grows significantly. Among these methods, the most popular is probably injection via APC: it is used extensively by some AV vendors, malware developers and presumably even by the CIA.
In a nutshell, an APC (Asynchronous Procedure Call) is a kernel mechanism that provides a way to execute a custom routine in the context of a particular thread. Once dispatched, the APC asynchronously diverts the execution flow of the target thread to invoke the selected routine.
APCs can be classified as one of two major types:
- Kernel-mode APCs: The APC routine will eventually execute kernel-mode code. These are further divided into special kernel-mode APCs and normal kernel-mode APCs, but we will not go into detail about the nuances separating them.
- User-mode APCs: The APC routine will eventually execute user-mode code. User-mode APCs are dispatched only when the thread owning them becomes alertable. This is the type of APC we’ll be dealing with in the rest of this section.
APCs are mostly used by system-level components to perform various tasks (e.g. facilitate I/O completion), but can also be harnessed for DLL injection purposes. From the perspective of a security product, APC injection from kernel-space provides a convenient and reliable method of ensuring that a particular module will be loaded into (almost) every desired process across the system.
In the case of the 64-bit NT kernel, the function responsible for the initial dispatch of user-mode APCs (for native 64-bit processes as well as WoW64 processes) is the 64-bit version of KiUserApcDispatcher(), exported from the native NTDLL. Unless explicitly requested otherwise by the APC issuer (via PsWrapApcWow64Thread()), the APC routine itself will also execute 64-bit code, and thus will be able to load 64-bit modules.
The classic way of implementing DLL injection via APC revolves around the use of a so-called “adapter thunk”. The adapter thunk is a short snippet of position-independent code written to the address space of the target process. Its main purpose is to load a DLL from the context of a user-mode APC, and as such it will receive its arguments according to the KNORMAL_ROUTINE specification:
Figure 5 – the prototype of a user-mode APC procedure, taken from wdm.h
As can be seen in the figure above, functions of type KNORMAL_ROUTINE receive three arguments, the first of which is NormalContext. Like many other “context” parameters in the WDM model, this argument is actually a pointer to a user-defined structure. In our case, we can use this structure to pass the following information into the APC procedure:
- The address of an API function used to load a DLL. In WoW64 processes this has to be the native LdrLoadDll(): the 64-bit version of kernel32.dll is not loaded into the process, so using LoadLibrary() and its variants is not possible.
- The path to the DLL we wish to load into the process.
Once the adapter thunk is called by KiUserApcDispatcher(), it unpacks NormalContext and issues a call to the supplied loader function with the given DLL path and some other, hardcoded arguments:
Figure 6 – A typical “adapter thunk” set as the target of a user-mode APC
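Expressed in C rather than assembly, the logic of such a thunk might look like the sketch below. It is only an illustration under several assumptions: INJECTION_CONTEXT is a hypothetical layout for the structure passed through NormalContext, the LdrLoadDll() prototype is the commonly published (undocumented) one, the Windows types are replaced by plain C stand-ins, and FakeLdrLoadDll exists solely so the thunk can be exercised outside Windows. A real thunk would additionally have to be position-independent machine code copied into the target process.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Windows types (assumption: real code uses the DDK headers). */
typedef int32_t NTSTATUS;

typedef struct {
    uint16_t  Length;         /* size in bytes, excluding the terminator */
    uint16_t  MaximumLength;  /* capacity of Buffer in bytes             */
    uint16_t *Buffer;         /* UTF-16 characters                       */
} UNICODE_STRING;

/* Commonly published shape of the native LdrLoadDll(). */
typedef NTSTATUS (*LDR_LOAD_DLL)(uint16_t       *SearchPath,
                                 uint32_t       *DllCharacteristics,
                                 UNICODE_STRING *DllName,
                                 void          **DllHandle);

/* Hypothetical structure handed to the thunk through NormalContext. */
typedef struct {
    LDR_LOAD_DLL   LoadRoutine;  /* address of the 64-bit LdrLoadDll() */
    UNICODE_STRING DllPath;      /* path of the DLL to inject          */
} INJECTION_CONTEXT;

/* Has the three-pointer shape of a KNORMAL_ROUTINE. It touches nothing but
 * its own arguments, which is what a position-independent thunk requires. */
void AdapterThunk(void *NormalContext, void *SystemArgument1,
                  void *SystemArgument2)
{
    INJECTION_CONTEXT *ctx = (INJECTION_CONTEXT *)NormalContext;
    void *moduleHandle = NULL;

    (void)SystemArgument1;
    (void)SystemArgument2;

    if (ctx == NULL || ctx->LoadRoutine == NULL)
        return;

    /* Hardcoded arguments: default search path, no characteristics. */
    ctx->LoadRoutine(NULL, NULL, &ctx->DllPath, &moduleHandle);
}

/* Stand-in loader that records which name it was asked to load. */
static UNICODE_STRING *g_requestedName;

NTSTATUS FakeLdrLoadDll(uint16_t *SearchPath, uint32_t *DllCharacteristics,
                        UNICODE_STRING *DllName, void **DllHandle)
{
    (void)SearchPath; (void)DllCharacteristics;
    g_requestedName = DllName;
    *DllHandle = (void *)DllName;   /* any non-NULL value will do here */
    return 0;                       /* STATUS_SUCCESS */
}
```

In the kernel-side injector, the context structure and the thunk would both be written into the target process, and the thunk's address would be supplied as the APC's normal routine with the context's address as NormalContext.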
To put this technique to use, we wrote a standard kernel-level APC injector and modified it to support injection of 64-bit DLLs into WoW64 processes (shown in Appendix A). The approach looked promising, but whenever we attempted to inject our DLL into a CFG-aware WoW64 process, the process crashed with a CFG validation error.
Figure 7 – A CFG validation error caused by the attempt to call the adapter thunk
In the next post we will delve into some of the implementation details of CFG to help grasp why this injection method fails, and present several possible solutions to overcome this obstacle.
Appendix A – complete source code for APC injection with adapter thunk