Win32API Interceptor
Final Report
A Microsoft Windows API Function Call Interception Application
Table of contents
    Abstract
    Introduction
    The Goal – Project scope
    Technologies used in the solution’s architecture
        Microsoft Research “Detours” Technology
            Loading a DLL into a process’s context
            Detouring a function
        COM
            DllInjectionAppLoader
            InterceptLogger
        Microsoft Access Data-Base
    The Solution
        Architecture
            Win32API Interceptor
            InterceptLogger
            DllInjectionAppLoader
            \\.\pipe\Win32APIInterceptor
            TraceAPI.DLL
            Spawned Process
        Code snippets
            The DETOUR_TRAMPOLINE macro
            DetourGenMovEax function
            DetourFunctionWithTrampolineEx function
            detour_insert_detour function
            Creating a trampoline function for the "CreateProcessW" function
            Instrumenting the "CreateProcessW" function
            The "CreateProcessW" Detour function
            InjectLibrary function
        The Application Guide
            Win32APIInterceptor installation
            GUI guide
            Hands-on example
        Known Issues
    Appendices
        Microsoft Research “Detours”
Abstract
This project introduces a novel approach to intercepting Win32API function calls. It is based on a Microsoft Research technology called Detours. The final product of this project is an MS Access based application that logs all the Win32API function calls issued by an application of the user's choice.

This project was created by:
    Dr. Ilana David: Instructor and EE Software Lab chief engineer
    Ben Bernstein: Instructor and a developer in the Microsoft R&D Haifa center
    Kfir Karmon: Student of the Computer Science department
    Polina Shafran: Student of the Computer Science department
Introduction
Innovative systems research hinges on the ability to easily instrument and extend existing operating system and application functionality. With access to appropriate source code, it is often trivial to insert new instrumentation or extensions by rebuilding the OS or application. However, in today’s world of commercial software, researchers seldom have access to all relevant source code.

In this project we use "Detours", a library for instrumenting arbitrary Win32 functions on x86 machines. Detours intercepts Win32 functions by rewriting target function images. While prior researchers have used binary rewriting to insert debugging and profiling instrumentation, to our knowledge Detours is the first package on any platform to logically preserve the un-instrumented target function (callable through a trampoline) as a subroutine for use by the instrumentation. This unique trampoline design is crucial for extending existing binary software.

Since the project’s scope is bounded by an academic course, we did not implement a complete solution. We concentrated mainly on understanding the technologies involved and on producing a working prototype of an infrastructure that intercepts Windows API functions (for NT-family OSs).
The Goal – Project scope
The goal we set was to build a non-intrusive framework that intercepts Windows API function calls issued by a basic Win32 application. Our main interest in this project was to become familiar with the Detours technology and to develop a working prototype of a real-world, usable application. Therefore some shortcuts were taken: several easy though inefficient methods were used so that we could focus on the main goal of the project.
Technologies used in the solution’s architecture

Microsoft Research “Detours” Technology
The Detours technology was conceived in the Microsoft Research labs. It is yet another method for intercepting function calls. Many techniques exist for this task, but this mechanism is non-intrusive: the executable file is not altered; only its memory image is changed. This way you do not need to compile the code again (no sources are needed). Furthermore, you can run several instances of the executable and intercept only the instances that interest you. In this section we’ll explain how this is achieved. It should be noted that Detours can in fact intercept any function call, not just Windows API function calls. Below we describe the way Detours works and include code snippets from the Detours code for further explanation.
Loading a DLL into a process’s context
Let’s assume we magically have a DLL that contains all the interception logic, so that once it is loaded into a process’s memory space it intercepts all Win32API function calls. (In the next section we shall describe how to build that magical DLL.) What we need to do is force the process we want to intercept to issue a call to LoadLibrary() with the given DLL. The following steps and code snippets describe how to achieve this. (A diagram following this explanation may help clarify the process.)

1) First, we create a new process for the application we want to intercept. Since we want it created in ‘suspended’ mode, so that we can inject our DLL, we pass the “CREATE_SUSPENDED” creation flag to the CreateProcess() function like so:

    DWORD dwMyCreationFlags = (dwCreationFlags | CREATE_SUSPENDED);
    if (!CreateProcess(lpApplicationName,
                       lpCommandLine,
                       lpProcessAttributes,
                       lpThreadAttributes,
                       bInheritHandles,
                       dwMyCreationFlags,
                       lpEnvironment,
                       lpCurrentDirectory,
                       lpStartupInfo,
                       &pi)) {
        return FALSE;
    }
2) Now we hold an instance of the PROCESS_INFORMATION structure, named pi, that contains the created process’s handle (and the main thread’s handle).

3) The next step is to acquire the main thread’s context (a CONTEXT structure; our instance will be named cxt). We’ll need it to place the assembly code that calls LoadLibrary() on the stack and to update the EIP register. Note that the ContextFlags member of cxt must be set (for example to CONTEXT_FULL) before the call. This is done like so:

    GetThreadContext(hThread, &cxt);

(hThread is a handle to the main thread of the suspended process.)
4) Now we’ll create a structure that will hold all the generated assembly code and its parameters. The structure looks like this:

    struct Code {
        BYTE rbCode[128];
        CHAR szLibFile[512];
    } code;

rbCode will hold the assembly code and szLibFile will hold a copy of the injected DLL name.

5) Let’s calculate the base address of the code structure in the target process, so we can generate the assembly code for that location:

    nCodeBase = (cxt.Esp - sizeof(code));

Actually, we write the assembly code into the code structure that resides in our own process’s memory space, and only when we are done do we copy the structure (containing the code) into the memory of the newly spawned, suspended process. The target address will be nCodeBase. All relative addresses and values are calculated using the target location (nCodeBase).

6) Now let’s copy the name of the injected DLL to code.szLibFile, like so:

    CopyMemory(code.szLibFile, pDLLName, strlen(pDLLName)+1);

where pDLLName is a pointer to the injected DLL’s name. Note that the string is stored in the code structure because the structure will be copied to the target process’s memory space; this way the DLL’s name is available inside the target process.

7) The next step is to write a “push” instruction into the code structure that pushes a pointer to the DLL’s name; this will serve as the parameter of the LoadLibrary() call. This is done using an inner function called DetourGenPush(). We will not dig into this function, but basically it writes the opcode of the assembly “push” instruction followed by the address to be pushed:

    pbCode = DetourGenPush(code.rbCode, nCodeBase + offsetof(Code, szLibFile));

where offsetof() is a macro that retrieves the offset of a member of a structure from the beginning of the structure.

8) Generating the assembly “call” instruction is the next step. This too is done by an inner function, called DetourGenCall(). Again, we will not inline this function; in short, it writes the opcode of the assembly “call” instruction followed by the relative address to call:

    pbCode = DetourGenCall(pbCode, pfLoadLibrary,
                           (PBYTE)nCodeBase + (pbCode - code.rbCode));

where pfLoadLibrary is the address of the LoadLibrary() function.
9) Now we’ll add the last generated instruction. It is an unconditional jump to the address that is currently in the EIP register (the address of the next instruction, right where the process was suspended). This is necessary because after we resume the thread we want it to load our DLL and then continue its regular code path as it was before we interrupted it. As you probably guessed, it is done by an inner function called DetourGenJmp(), which writes the JMP opcode and the relative address to jump to:

    DetourGenJmp(pbCode, (PBYTE)cxt.Eip,
                 (PBYTE)nCodeBase + (pbCode - code.rbCode));

10) Now we need to change the thread’s context so that EIP points to our first generated instruction and ESP points just below our generated code (at a lower address, so that LoadLibrary() will not overwrite it). This is done by editing the members of cxt and calling the SetThreadContext() function, like so:

    cxt.Esp = nCodeBase - 4;
    cxt.Eip = nCodeBase;
    SetThreadContext(hThread, &cxt);

Note: code is executed from low addresses to high addresses, whereas the stack grows the other way around. Since we wrote the generated code through the pbCode pointer (advancing it every time), the code was laid out starting from its initial value (code.rbCode) and going toward higher addresses. Consult the illustration below for more details.

11) We’re nearly there. Now let’s unprotect the target process’s memory and copy the code structure to the base address we calculated (nCodeBase):

    VirtualProtectEx(hProcess, (PBYTE)nCodeBase, sizeof(Code),
                     PAGE_EXECUTE_READWRITE, &nProtect);
    WriteProcessMemory(hProcess, (PBYTE)nCodeBase, &code,
                       sizeof(Code), &nWritten);

12) One last thing: after writing code to memory during execution, one must call the FlushInstructionCache() function so that the CPU invalidates its internal cache of instructions (pipeline):

    FlushInstructionCache(hProcess, (PBYTE)nCodeBase, sizeof(Code));

13) That’s it! Let’s resume the thread and let it load our DLL; thereafter it will continue on its original course:

    ResumeThread(hThread);
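To make steps 7 to 9 more concrete, the following is a minimal, portable sketch of how such a stub could be assembled. It is not the Detours code: GenPush, GenCall, GenJmp, Put32 and BuildStub are hypothetical names introduced for illustration, and addresses are modelled as plain 32-bit integers so the sketch runs on any machine. The only facts it relies on are the x86 encodings PUSH imm32 (0x68), CALL rel32 (0xE8) and JMP rel32 (0xE9), where the relative operand is measured from the end of the 5-byte instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for DetourGenPush/DetourGenCall/DetourGenJmp.
// Each writes one x86 instruction at p and returns the pointer just past it.

// Stores a 32-bit value in little-endian byte order, as x86 expects.
inline uint8_t* Put32(uint8_t* p, uint32_t v) {
    for (int i = 0; i < 4; ++i) *p++ = static_cast<uint8_t>(v >> (8 * i));
    return p;
}

// PUSH imm32: opcode 0x68 followed by the value to push.
inline uint8_t* GenPush(uint8_t* p, uint32_t value) {
    *p++ = 0x68;
    return Put32(p, value);
}

// CALL rel32 (0xE8) and JMP rel32 (0xE9) encode the distance from the END of
// the 5-byte instruction to the destination. `at` is the address the
// instruction will occupy in the target process, not in our own buffer.
inline uint8_t* GenRel32(uint8_t* p, uint8_t opcode, uint32_t dest, uint32_t at) {
    *p++ = opcode;
    return Put32(p, dest - (at + 5));  // wraps modulo 2^32, like 32-bit x86
}
inline uint8_t* GenCall(uint8_t* p, uint32_t dest, uint32_t at) {
    return GenRel32(p, 0xE8, dest, at);
}
inline uint8_t* GenJmp(uint8_t* p, uint32_t dest, uint32_t at) {
    return GenRel32(p, 0xE9, dest, at);
}

struct Code {
    uint8_t rbCode[128];
    char szLibFile[512];
};

// Builds "PUSH &szLibFile; CALL LoadLibrary; JMP oldEip" as it must look when
// the structure sits at nCodeBase in the target. Returns the stub length.
inline std::size_t BuildStub(Code& code, uint32_t nCodeBase, uint32_t pfLoadLibrary,
                             uint32_t oldEip, const char* dllName) {
    assert(std::strlen(dllName) < sizeof code.szLibFile);
    std::memset(&code, 0, sizeof code);
    std::memcpy(code.szLibFile, dllName, std::strlen(dllName) + 1);
    uint8_t* p = code.rbCode;
    p = GenPush(p, nCodeBase + static_cast<uint32_t>(offsetof(Code, szLibFile)));
    p = GenCall(p, pfLoadLibrary, nCodeBase + static_cast<uint32_t>(p - code.rbCode));
    p = GenJmp(p, oldEip, nCodeBase + static_cast<uint32_t>(p - code.rbCode));
    return static_cast<std::size_t>(p - code.rbCode);
}
```

The stub is 15 bytes long (three 5-byte instructions). Every generator receives the address the instruction will have in the target process, which is why the relative operands stay valid after the structure is copied there.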
This illustration displays how the memory looks right before we resume the thread (step 13):

    Our Process
        struct Code { .... } code;
        UINT32 nCodeBase
        CreateProcess() (Suspended) spawns the new process

    New Spawned (Soon to be intercepted) Process, from high to low addresses
        0xFF....F
        Old ESP
        Copy of the Code structure:
            DLL-name (copy)
            JMP Old(EIP)
            CALL LoadLibrary
            PUSH DLL-name          <- New EIP = Old(ESP) – sizeof(Code)
        New ESP
        0x00....0
Detouring a function
Now for the real thing. In this section we'll describe how the Detours mechanism works and how it was incorporated into our project. As described above, Detours is a Microsoft Research technology that was created to allow "hooking" binary function calls at run time. The application itself is not changed, nor do you need to recompile it (as opposed to code coverage tools, for example).

In general, Detours accomplishes this task by changing the application's assembly code after it has been loaded into memory, so that instead of entering the real function's code, execution jumps to the detouring code. We say "jumps" because this is precisely how Detours does the trick: it takes the first 5 bytes of the function you want to detour (rounded up to whole instructions, and assuming the function has at least 5 bytes; this is the biggest restriction of the method) and copies them into a "Trampoline function". In place of those 5 bytes it writes an unconditional jump to the "Detour function" (see below).

The "Detour function" is a new code block that contains the user's interception code (anything the user wants to do before the real function runs). Following this code is an assembly call to the "Trampoline function". As you might recall, the "Trampoline function" holds the 5 bytes that were taken from the original function. At the end of the "Trampoline function", Detours appends an unconditional jump to the rest of the original function's code.
And now for the unwinding: when the original function hits the end it "Returns" to the calling function and that would be… the "Detour function", since the "Trampoline function" used an unconditional jump the return address on the stack is of the "Detour Function". When the "Detour Function" completes then it "Returns" to … the function that called the original function to begin with (and not to the original function itself since, as you recall, we added an unconditional jump to the "Detour function"). That’s it!
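The patching itself can be illustrated on plain byte buffers. The sketch below is a simulation under stated assumptions, not the Detours implementation: SimulateDetour, WriteJmp and the Patched structure are hypothetical names, the three addresses are made-up 32-bit values, and the caller is assumed to already know how many prologue bytes (cb, at least 5 and ending on an instruction boundary) must be moved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Writes "JMP rel32" (opcode 0xE9) at dst. `at` is the address dst will be
// executed from; the operand is relative to the end of the 5-byte jump.
void WriteJmp(uint8_t* dst, uint32_t at, uint32_t dest) {
    uint32_t rel = dest - (at + 5);  // wraps modulo 2^32, like 32-bit x86
    dst[0] = 0xE9;
    for (int i = 0; i < 4; ++i) dst[1 + i] = static_cast<uint8_t>(rel >> (8 * i));
}

struct Patched {
    std::vector<uint8_t> target;      // function image after patching
    std::vector<uint8_t> trampoline;  // saved prologue + jump back
};

// Simulates a detour on a copy of the function's bytes.
//   target         : original bytes of the function (taken by value)
//   targetAddr     : pretend address of the function
//   trampolineAddr : pretend address of the trampoline
//   detourAddr     : pretend address of the detour function
//   cb             : prologue bytes to move (assumed valid by the caller)
Patched SimulateDetour(std::vector<uint8_t> target, uint32_t targetAddr,
                       uint32_t trampolineAddr, uint32_t detourAddr,
                       std::size_t cb) {
    assert(cb >= 5 && cb <= target.size());
    Patched out;
    // 1. Save the first cb bytes of the original function in the trampoline.
    out.trampoline.assign(target.begin(), target.begin() + cb);
    // 2. Append a jump from the trampoline to the rest of the original code.
    out.trampoline.resize(cb + 5);
    WriteJmp(out.trampoline.data() + cb,
             trampolineAddr + static_cast<uint32_t>(cb),
             targetAddr + static_cast<uint32_t>(cb));
    // 3. Overwrite the start of the original function with a jump to the detour.
    WriteJmp(target.data(), targetAddr, detourAddr);
    out.target = std::move(target);
    return out;
}
```

For the prologue 55 8B EC 53 56 57 (push ebp; mov ebp,esp; push ebx; push esi; push edi) and cb = 5, the trampoline receives the first four instructions followed by a jump to the function's address plus 5, while the function itself now starts with a jump to the detour.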
Easy, huh? Well, we're well aware that the explanation above is a "bit" obscure. To remedy this we now add a diagram that expresses the idea. The diagram below is based on a diagram from a PowerPoint presentation that is included in the Detours archive file, which can be downloaded from the web.

Before Detours:

    1. Call:   Calling function -> Called function ("Original Function")
    2. Return: Called function ("Original Function") -> Calling function

After Detours:

    1. Call:   Calling function -> Called function ("Original Function")
    2. Jump:   Called function ("Original Function") -> Detour Function
    3. Call:   Detour Function -> Trampoline Function
    4. Jump:   Trampoline Function -> Original Function (the rest of its code)
    5. Return: Original Function -> Detour Function
    6. Return: Detour Function -> Calling function

Diagram: How Detours changes the original function's calling sequence
Talk about "a picture is worth a thousand words…"
Now we'll display the code behind this magic. The following diagram illustrates the change in the assembly code that occurs after you apply the Detours mechanism to a function. This diagram, too, is based on a slide from a PowerPoint presentation that accompanies the Detours archive.

Before Detours:

    Target:
        push ebp            [1 byte]
        mov  ebp,esp        [2 bytes]
        push ebx            [1 byte]
        push esi            [1 byte]
        push edi
        ....

After Detours:

    Target:
        jmp  Detour         [5 bytes]
        push edi
        ....

    Detour:
        ...Your code...
        Call Trampoline
        ...More of your code...

    Trampoline:
        push ebp
        mov  ebp,esp
        push ebx
        push esi
        jmp  Target+5
Diagram: The code beneath the Detours mechanism

Now that we understand how the mechanism works, we need to understand how to create the "Detour functions" and how to connect them to the "Original functions" we want to detour. The Detours library comes with code that creates this connection: given a function you want to detour and a "Detour Function" that contains the code you want to inject, it instruments the "original function" as described above. The instrumentation of a function, performed inside the DLL described in the previous section, is divided into two steps:

1. Create a "Trampoline Function" and store in it the address of the original function (done at compile time). For this task Detours provides a C macro, "DETOUR_TRAMPOLINE(<trampoline function signature>, <original function name>)". This macro generates the code of a Trampoline Function and stores the address of the Original function in it. The macro can be found in the "Detours.h" file.

2. Connect the Trampoline with the Detour function and the Original Function, using the address of the Original Function stored in the Trampoline (done at run time). This is done with the function "DetourFunctionWithTrampoline(<trampoline function name>, <detour function name>)". The Trampoline function name is the same name that was declared in the macro in step 1, and the Detour function name is the name of the function that you want to be called instead of the original function (see step 1 for the "original function name").
To sum up: for every "Original function" that you want to detour, you invoke the "DETOUR_TRAMPOLINE" macro with the signature of the trampoline function (which should be the same as the signature of the Original Function) and the original function. Then you add a call to "DetourFunctionWithTrampoline", which binds the Trampoline to your function, the Detour function, in which you can put the code that you want to run before the original function.

One important point: in the Detour function you write, you would normally call the Trampoline function (which, as you may recall, holds the first few instructions of the original function and a jump to the rest of it). Calling the Trampoline is not mandatory, though. If you don't call it, there will be no run-time error; the Detour function will simply return to the calling function without running any of the original function's code. This, in fact, is a way to replace the original function with your own implementation. Furthermore, you can add code after the call to the Trampoline function (in the Detour function), and that code will run after the original implementation.

Since we wrap all this code in a DLL, we want the instrumentation to happen in the DllMain() function, when it is called with the "reason" parameter set to "DLL_PROCESS_ATTACH". (To be specific, we place the calls to "DetourFunctionWithTrampoline" in DllMain().) This ensures that when the injected LoadLibrary() call runs in the instrumented process (as we saw in the first section), the calls to "DetourFunctionWithTrampoline" run as soon as the process resumes execution.
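The three options open to a Detour function (run code before the original, run code after it, or skip it entirely) can be tried out with an ordinary function-pointer analogy. The sketch below does not rewrite any machine code and does not use the Detours API; Original, Real_Original, Detour, Replacement and Target are hypothetical names, where Real_Original plays the role of the trampoline and rebinding Target plays the role of DetourFunctionWithTrampoline.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the pieces run.
std::vector<std::string> g_log;

// Stands in for the original Win32 function.
int Original(int x) {
    g_log.push_back("original");
    return x * 2;
}

// Stands in for the trampoline: a way to still reach the original code
// after the function has been detoured.
int (*Real_Original)(int) = Original;

// A detour that wraps the original: code before, the trampoline call, code after.
int Detour(int x) {
    g_log.push_back("before");
    int result = Real_Original(x);   // the "call Trampoline" step
    g_log.push_back("after");
    return result;
}

// A detour that never calls the trampoline: it replaces the original.
int Replacement(int) {
    g_log.push_back("replaced");
    return -1;
}

// Callers reach the function through this pointer. Rebinding it is the
// analogue of patching the function's first bytes with a jump.
int (*Target)(int) = Original;
```

Setting Target = Detour and calling Target(21) returns 42 and logs "before", "original", "after"; setting Target = Replacement returns -1 without running the original at all, which mirrors the replacement scenario described above.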
COM
Describing the COM technology is well beyond the scope of this document. Furthermore, COM is used only as a supporting technology; it is not the main technology of the project. Nevertheless, we will describe in general terms how it helped us. COM is a way to share objects created in one language with code written in another language. It wraps the object in a binary capsule that can be consumed from several languages. COM is more complicated than this description suggests, and it offers more than what we state here. The main reason we used COM objects is that the code implementing the Detours functionality was written in C/C++, whereas the application we wrote was based on Visual Basic for Applications. Both C++ and VB handle COM objects, so COM was a good way to run the needed functionality inside the VB application's memory space rather than as separate processes. In the following sections we'll describe the two COM objects we created.
DllInjectionAppLoader
This COM object spawns a new process running the requested application and injects into it a DLL that contains the function-instrumentation code (as described in the Microsoft Research "Detours" technology section).
This object implements the IDllInjectionAppLoader interface. The IDllInjectionAppLoader declares the following methods and properties:
HRESULT LoadApplication( [in] BSTR pszExePath, [in] BSTR pszDllPath);
This method loads the application that resides in the pszExePath location and injects the Detours DLL located at pszDllPath into it.
HRESULT KillApplication();
This method terminates the loaded application (if one is loaded). The application should have been loaded using the LoadApplication() function.
HRESULT IsAppLoaded([out, retval] VARIANT_BOOL* pVal);
This property returns true if and only if an application was loaded using the LoadApplication() function and the application did not terminate (either by itself or by using the KillApplication() method)
InterceptLogger
This COM object logs the Win32API functions that are called while the application that was spawned with the injected DLL is running. This object implements the IInterceptLogger interface. The IInterceptLogger interface declares the following methods and properties:
HRESULT StartLogging( [in] VARIANT_BOOL bBlocking, [in] BSTR ODBC_DSN);
This method opens the OS-pipe and logs every message from it to the DB. bBlocking should be true if and only if you wish the function to block. This is useful if you write a VB-script that plays the role of the listener. ODBC_DSN is the name of the ODBC DSN that the logger should write to.
HRESULT Shutdown();
This method should be called when you want to close the connection to the database. After calling this method, no logging messages will be inserted into the database until StartLogging() is called again.
HRESULT IsConnected([out, retval] VARIANT_BOOL* pVal);
This property returns true if and only if both of the following conditions hold:
    o StartLogging() was called previously and the connection was made successfully
    o Shutdown() was not called after StartLogging() succeeded
HRESULT AddFunctionToFilter([in] BSTR FunctionName);
This method adds the function name that is passed in the parameter FunctionName, to a list of filtered functions
HRESULT SetFilterType([in] long FilterType);
This method sets the way the InterceptLogger object filters messages according to the function names that were added by the AddFunctionToFilter() method. FilterType can be any one of the following values:
    o 0 – No filtering is done; the functions in the filter list are ignored
    o 1 – Log if and only if the log message belongs to a function that exists in the filter list
    o 2 – Log if and only if the log message belongs to a function that doesn't exist in the filter list
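The three filter types amount to a small decision rule. The following is a minimal sketch of that rule only, assuming the semantics listed above; the FunctionFilter class and its ShouldLog() method are hypothetical and are not part of the IInterceptLogger interface.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper mirroring AddFunctionToFilter()/SetFilterType().
class FunctionFilter {
public:
    void AddFunctionToFilter(const std::string& name) { names_.insert(name); }

    void SetFilterType(long type) {
        assert(type >= 0 && type <= 2);  // only 0, 1 and 2 are defined
        type_ = type;
    }

    // Decides whether a log message for the given function should be stored.
    bool ShouldLog(const std::string& name) const {
        const bool listed = names_.count(name) != 0;
        switch (type_) {
            case 1:  return listed;    // log only functions in the list
            case 2:  return !listed;   // log only functions NOT in the list
            default: return true;      // 0: no filtering, the list is ignored
        }
    }

private:
    std::set<std::string> names_;
    long type_ = 0;
};
```

With "CreateFileW" in the list, filter type 1 logs CreateFileW and drops ReadFile, filter type 2 does the opposite, and filter type 0 logs both.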
Microsoft Access Data-Base
We used Microsoft Access as a repository for the function calls that were issued during the instrumented application's lifetime. Furthermore, we used MS Access to create an application that allows the user to easily start an instrumented application and display the results. Since an MS Access application can contain VB code, and since VB can use COM objects, we were able to use the COM objects described in the section above to achieve this goal. The database was not the main priority of this project, so we used a simple database structure consisting of a single table. One clear limitation imposed by this design is that we could not save all the arguments passed to every function; therefore we limited the logged arguments to the first five. Besides these fields, the table includes fields for the function's name, its return address and a timestamp of when the function was called.
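As an illustration of the five-argument limit, here is a rough sketch of how one logged call could be represented before it is written to the single table. The CallRecord structure, its field names and MakeRecord() are assumptions made for this example; the actual column names and types of the project's table are not given in this report.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Only the first five arguments of a call are kept.
constexpr std::size_t kMaxLoggedArgs = 5;

// Hypothetical in-memory form of one table row.
struct CallRecord {
    std::string functionName;
    std::array<std::string, kMaxLoggedArgs> args;  // unused slots stay empty
    std::size_t loggedArgCount = 0;
    std::uint32_t returnAddress = 0;
    std::string timestamp;
};

// Builds a record, silently dropping any argument beyond the fifth.
CallRecord MakeRecord(const std::string& functionName,
                      const std::vector<std::string>& allArgs,
                      std::uint32_t returnAddress,
                      const std::string& timestamp) {
    assert(!functionName.empty());
    CallRecord r;
    r.functionName = functionName;
    r.loggedArgCount = std::min(allArgs.size(), kMaxLoggedArgs);
    std::copy_n(allArgs.begin(), r.loggedArgCount, r.args.begin());
    r.returnAddress = returnAddress;
    r.timestamp = timestamp;
    return r;
}
```

A call with seven arguments therefore produces a record whose loggedArgCount is 5; the sixth and seventh arguments are lost, which is exactly the limitation described above.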
The Solution

Architecture
In this section we'll describe the pieces that make up our solution, which technology we used for each of them and how we put all the pieces together. The following diagram displays the big picture.
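Figure: solution architecture (message flow)
    1. Create COM Object: Win32API Interceptor (MS Access Data Base) creates InterceptLogger (COM Object)
    2. Create COM Object: Win32API Interceptor (MS Access Data Base) creates DllInjectionAppLoader (COM Object)
    3a. Spawn: DllInjectionAppLoader starts the Spawned process
    3b. Inject: TraceAPI.DLL (binary DLL with the Detours functions) is injected into the Spawned process
    4. Post function call logging data: Spawned process (with the TraceAPI.DLL injected) writes to \\.\pipe\Win32APIInterceptor (OS Pipe)
    5. Extract logging data: InterceptLogger reads from \\.\pipe\Win32APIInterceptor
    6. Log function call data: InterceptLogger writes to Win32API Interceptor (MS Access Data Base)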
Actually all the elements that are displayed in the diagram above were described throughout the document.
Win32API Interceptor
This is the MS Access based application that runs the show. As soon as the database is started a form opens. Using this form the user can choose which application to intercept and can change the filter that controls which functions are logged. All the logged data is displayed on the screen, and a graph displays the most frequently called functions. (See the MS Access sub-section in the Technologies section.)

InterceptLogger
This is a COM object that contains the code that extracts the logging data from the \\.\pipe\Win32APIInterceptor OS pipe and inserts it into the database. (See the COM sub-section in the Technologies section.)

DllInjectionAppLoader
This, too, is a COM object. It spawns the user-selected application, which will be instrumented with the TraceAPI.DLL detour functions. (See the COM sub-section in the Technologies section.)

\\.\pipe\Win32APIInterceptor
This is the name of the Windows pipe to which the TraceAPI.DLL functions send logging data and from which the InterceptLogger extracts it. This is how the spawned process communicates with the Win32API Interceptor application.

TraceAPI.DLL
This DLL file includes the detour functions and the code that patches the Win32 API functions so that they first detour through our code, which sends logging data via the pipe. (See the Microsoft Research "Detours" technology section.)

Spawned Process
This is the process whose Win32API function calls the user wants to log. It is spawned by the DllInjectionAppLoader COM object, which injects TraceAPI.DLL into its memory space. Every call to a Win32API function issued by this process is posted to \\.\pipe\Win32APIInterceptor. Thereafter the message is extracted from the pipe by the InterceptLogger, which stores it in the Win32API Interceptor database.
Code snippets

The DETOUR_TRAMPOLINE macro
This macro was used to declare the trampoline function and to store the original function's address. (See the Detouring a function section.) The following code was excerpted from the Detours.h file.

    #define DETOUR_TRAMPOLINE(trampoline,target) \
    static PVOID __fastcall _Detours_GetVA_##target(VOID) \
    { \
        return &target; \
    } \
    \
    __declspec(naked) trampoline \
    { \
        __asm { nop };\
        __asm { nop };\
        __asm { call _Detours_GetVA_##target };\
        __asm { jmp eax };\
        __asm { ret };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
        __asm { nop };\
    }
DetourGenMovEax function
This function writes, at the given pointer, the machine code of a "MOV EAX, nValue" assembly instruction and returns the pointer just past it. There are many other functions of this sort, so we show only this one as a teaser. The following code was excerpted from the Detours.h file.

    inline PBYTE DetourGenMovEax(PBYTE pbCode, UINT32 nValue)
    {
        *pbCode++ = 0xB8;                  // opcode of MOV EAX, imm32
        *((UINT32*&)pbCode)++ = nValue;    // the 32-bit immediate; advances pbCode by 4
        return pbCode;
    }
DetourFunctionWithTrampolineEx function
This function was used to change the original function so that it jumps to the detour function, and to fill the trampoline. (See the Detouring a function section.) The following code was excerpted from the Detours.cpp file.

    BOOL WINAPI DetourFunctionWithTrampolineEx(PBYTE pbTrampoline,
                                               PBYTE pbDetour,
                                               PBYTE *ppbRealTrampoline,
                                               PBYTE *ppbRealTarget)
    {
        PBYTE pbTarget = NULL;

        // Kfir: Gets the address of the first line of code
        //       (??? maybe there are some
        //       headers inside a function that we should ignore ???)
        pbTrampoline = DetourGetFinalCode(pbTrampoline, TRUE);
        pbDetour = DetourGetFinalCode(pbDetour, FALSE);

        // Kfir: Set Return values
        if (ppbRealTrampoline)
            *ppbRealTrampoline = pbTrampoline;
        if (ppbRealTarget)
            *ppbRealTarget = NULL;

        if (pbTrampoline == NULL || pbDetour == NULL)
            return FALSE;

        // Kfir: check the trampoline that was passed has the
        //       structure we expect
        //       this trampoline should have been constructed using
        //       the "DETOUR_TRAMPOLINE" macro
        //       (located in the Detours.h file)
        if (pbTrampoline[0] != OP_NOP ||
            pbTrampoline[1] != OP_NOP ||
            pbTrampoline[2] != OP_CALL ||
            pbTrampoline[7] != OP_PREFIX ||
            pbTrampoline[8] != OP_JMP_EAX) {
            return FALSE;
        }

        PVOID (__fastcall * pfAddr)(VOID);

        // Kfir: calculate MAGICALLY the location of the
        //       "original function"....
        //       pbTrampoline is a pointer to a Trampoline function code
        //       (see DETOUR_TRAMPOLINE macro)
        //       pbTrampoline[3] would be the address of the
        //       precompiler-generated function named
        //       "_Detours_GetVA_##target" (see DETOUR_TRAMPOLINE macro)
        pfAddr = (PVOID (__fastcall *)(VOID))(pbTrampoline +
                                              SIZE_OF_NOP + SIZE_OF_NOP + SIZE_OF_JMP +
                                              *(LONG *)&pbTrampoline[3]);

        pbTarget = DetourGetFinalCode((PBYTE)(*pfAddr)(), FALSE);
        if (ppbRealTarget)
            *ppbRealTarget = pbTarget;

        // Kfir: this function will copy the code from the original
        //       code to the trampoline and add
        //       the needed jmp opcodes in the trampoline and in the
        //       original function
        //       pbTarget is the pointer to the ORIGINAL FUNCTION
        //       pbDetour is the pointer to the DETOUR FUNCTION (the
        //       code you want to run before the original one)
        //       pbTrampoline is the pointer to where we will dump the
        //       first 5 bytes of the original code.
        return detour_insert_detour(pbTarget, pbTrampoline, pbDetour);
    }
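The "magical" address calculation in the middle of this function is ordinary decoding of a CALL rel32 instruction: the four bytes at pbTrampoline[3] are a displacement measured from the end of the call, which sits 7 bytes into the trampoline (two 1-byte NOPs plus the 5-byte call; the excerpt uses SIZE_OF_JMP for the latter, since a near jump has the same length). The sketch below reproduces that arithmetic on a byte array with made-up addresses. HelperAddressFromTrampoline is a hypothetical name, not a Detours function.

```cpp
#include <cassert>
#include <cstdint>

// Given the first bytes of a trampoline laid out as
//   nop; nop; call rel32; ...
// and the address the trampoline is located at, returns the address that
// the call instruction targets.
std::uint32_t HelperAddressFromTrampoline(const std::uint8_t* tramp,
                                          std::uint32_t trampAddr) {
    // 0x90 = NOP, 0xE8 = CALL rel32: the layout the excerpt checks for.
    assert(tramp[0] == 0x90 && tramp[1] == 0x90 && tramp[2] == 0xE8);

    // Read the little-endian 32-bit displacement that follows the opcode.
    std::uint32_t rel = 0;
    for (int i = 0; i < 4; ++i)
        rel |= static_cast<std::uint32_t>(tramp[3 + i]) << (8 * i);

    // Destination = address of the next instruction + displacement.
    // Unsigned wrap-around handles negative displacements.
    return trampAddr + 1 + 1 + 5 + rel;
}
```

For a trampoline at 0x2000 whose call displacement is 0xFFFFFEF9 (that is, -0x107), the result is 0x2007 - 0x107 = 0x1F00, the address of the generated helper that returns the original function's address.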
detour_insert_detour function
This is an inner function that is used by the DetourFunctionWithTrampolineEx function (see the call at the end of that function). The following code was excerpted from the Detours.cpp file.
// Kfir: this function will copy the code from the original code to
// the trampoline and add the needed jmp opcodes in the trampoline
// and in the original function.
// pbTarget is the pointer to the ORIGINAL FUNCTION
// pbDetour is the pointer to the DETOUR FUNCTION (the code you
// want to run before the original one)
// pbTrampoline is the pointer to where we will dump the first
// 5 bytes of the original code.
static BOOL detour_insert_detour(PBYTE pbTarget, PBYTE pbTrampoline, PBYTE pbDetour)
{
    PBYTE pbCont = pbTarget;
    // Kfir: First we want to *check* what kind of commands exist at
    // the beginning of the function.
    // Generally we want to remove at least 5 bytes (SIZE_OF_TRP_OPS),
    // but we don't want to break a command in the middle (I don't
    // know how they drew the line exactly, but some opcodes need to
    // be glued together and some don't, so if the first one moves,
    // so will the rest).
    for (LONG cbTarget = 0; cbTarget < SIZE_OF_TRP_OPS;) {
        PBYTE pbOp = pbCont;
        BYTE bOp = *pbOp;
        pbCont = DetourCopyInstruction(NULL, pbCont, NULL);
        cbTarget = pbCont - pbTarget;
        if (bOp == OP_JMP ||
            bOp == OP_JMP_EAX ||
            bOp == OP_RET_POP ||
            bOp == OP_RET) {
            break;
        }
        if (bOp == OP_PREFIX && pbOp[1] == OP_JMP_SEG) {
            break;
        }
        if ((bOp == OP_PRE_ES ||
             bOp == OP_PRE_CS ||
             bOp == OP_PRE_SS ||
             bOp == OP_PRE_DS ||
             bOp == OP_PRE_FS ||
             bOp == OP_PRE_GS) &&
            pbOp[1] == OP_PREFIX &&
            pbOp[2] == OP_JMP_SEG) {
            break;
        }
    } // Kfir: End of FOR!!!
    if (cbTarget < SIZE_OF_TRP_OPS) {
        // Too few instructions.
        return FALSE;
    }
    if (cbTarget > (DETOUR_TRAMPOLINE_SIZE - SIZE_OF_JMP - 1)) {
        // Too many instructions.
        return FALSE;
    }
    CDetourEnableWriteOnCodePage ewTrampoline(pbTrampoline, DETOUR_TRAMPOLINE_SIZE);
    CDetourEnableWriteOnCodePage ewTarget(pbTarget, cbTarget);
    if (!ewTrampoline.SetPermission(PAGE_EXECUTE_READWRITE))
        return FALSE;
    if (!ewTarget.IsValid())
        return FALSE;
    PBYTE pbSrc = pbTarget;
    PBYTE pbDst = pbTrampoline;
    // Kfir: Now really *move* the code we discovered in the
    // for-loop before.
    for (LONG cbCopy = 0; cbCopy < cbTarget;) {
        pbSrc = DetourCopyInstruction(pbDst, pbSrc, NULL);
        cbCopy = pbSrc - pbTarget;
        pbDst = pbTrampoline + cbCopy;
    }
    if (cbCopy != cbTarget) // Count came out different!
        return FALSE;
    // Kfir: add a jump in the Trampoline to the rest of the
    // original code
    // (after copying the first 5 bytes we now add the jump)
    if (!detour_insert_jump(pbDst, pbTarget + cbTarget, SIZE_OF_JMP))
        return FALSE;
    pbTrampoline[DETOUR_TRAMPOLINE_SIZE-1] = (BYTE)cbTarget;
    // Kfir: add a jump in the Original Function code to the DETOUR
    // FUNCTION
    // (after backing up the first 5 bytes of the original
    // code in the trampoline we can now
    // override it with a jump to the "injected code")
    if (!detour_insert_jump(pbTarget, pbDetour, cbTarget))
        return FALSE;
    return TRUE;
}
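The three steps that the excerpt performs (measure whole instructions until at least 5 bytes are covered, back them up into the trampoline followed by a jump to the rest of the target, then overwrite the start of the target with a jump to the detour) can be tried out on plain byte buffers. The following is only a simulation under stated assumptions: the names WriteJmp, BytesToMove and InsertDetourSim are hypothetical, the addresses are pretend 32-bit values, the instruction lengths are supplied by the caller instead of a disassembler, and no page permissions or instruction relocation are handled. It is not the Detours implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants for a simulation on byte buffers (assumed names).
const std::size_t kSizeOfJmp = 5;        // opcode E9 + 32-bit displacement
const std::uint8_t kOpJmpRel32 = 0xE9;   // x86 "jmp rel32"

// Write "jmp rel32" at buf[at]. fromAddr is the pretend address of the jump
// instruction itself, toAddr is the pretend address it should reach.
void WriteJmp(std::vector<std::uint8_t>& buf, std::size_t at,
              std::uint32_t fromAddr, std::uint32_t toAddr) {
    assert(at + kSizeOfJmp <= buf.size());
    // The displacement is measured from the end of the jump instruction.
    std::uint32_t rel = toAddr - (fromAddr + static_cast<std::uint32_t>(kSizeOfJmp));
    buf[at] = kOpJmpRel32;
    for (std::size_t i = 0; i < 4; ++i)   // store little-endian, as x86 does
        buf[at + 1 + i] = static_cast<std::uint8_t>((rel >> (8 * i)) & 0xFF);
}

// Sum whole instruction lengths until at least 5 bytes are covered.
// Returns 0 when the function is too short to hold the jump.
std::size_t BytesToMove(const std::vector<std::size_t>& instrLengths) {
    std::size_t cb = 0;
    for (std::size_t len : instrLengths) {
        if (cb >= kSizeOfJmp) break;
        cb += len;
    }
    return cb >= kSizeOfJmp ? cb : 0;
}

// Simulated detour insertion on two byte buffers.
bool InsertDetourSim(std::vector<std::uint8_t>& target, std::uint32_t targetAddr,
                     std::vector<std::uint8_t>& trampoline, std::uint32_t trampAddr,
                     std::uint32_t detourAddr,
                     const std::vector<std::size_t>& instrLengths) {
    const std::size_t cb = BytesToMove(instrLengths);
    if (cb == 0 || cb > target.size() || cb + kSizeOfJmp > trampoline.size())
        return false;
    // 1. Back up the first cb bytes of the target in the trampoline.
    for (std::size_t i = 0; i < cb; ++i) trampoline[i] = target[i];
    // 2. Trampoline continues with a jump to the rest of the target.
    WriteJmp(trampoline, cb, trampAddr + static_cast<std::uint32_t>(cb),
             targetAddr + static_cast<std::uint32_t>(cb));
    // 3. The target now starts with a jump to the detour function.
    WriteJmp(target, 0, targetAddr, detourAddr);
    return true;
}
```

With a target that begins with push ebp, mov ebp,esp, push ebx, push esi (lengths 1, 2, 1, 1), exactly 5 bytes are moved, so the trampoline ends with a jump to target+5 and the target begins with a jump to the detour.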
Creating a trampoline function for the "CreateProcessW" function
This is a call to the DETOUR_TRAMPOLINE C macro that creates a trampoline function for the CreateProcessW Win32API function. Note how the signature of the trampoline function is the same as that of the original CreateProcessW function. The following code was excerpted from the _win32.cpp file.
DETOUR_TRAMPOLINE(BOOL __stdcall Real_CreateProcessW(LPCWSTR a0,
                                                     LPWSTR a1,
                                                     LPSECURITY_ATTRIBUTES a2,
                                                     LPSECURITY_ATTRIBUTES a3,
                                                     BOOL a4,
                                                     DWORD a5,
                                                     LPVOID a6,
                                                     LPCWSTR a7,
                                                     struct _STARTUPINFOW* a8,
                                                     LPPROCESS_INFORMATION a9),
                  CreateProcessW);
Instrumenting the "CreateProcessW" function
This call to the DetourFunctionWithTrampoline function was taken from a function that is run in the DllMain of TraceAPI.dll. The call instruments the original CreateProcessW using the detour function (Mine_CreateProcessW) and the trampoline function (Real_CreateProcessW). The following code was excerpted from the _win32.cpp file.
DetourFunctionWithTrampoline((PBYTE)Real_CreateProcessW,
                             (PBYTE)Mine_CreateProcessW);
The "CreateProcessW" Detour function
This is the detour function that will be run instead of the original "CreateProcessW" function. Notice that it calls the trampoline function. The following code was excerpted from the _win32.cpp file.
BOOL __stdcall Mine_CreateProcessW(LPCWSTR a0,
                                   LPWSTR a1,
                                   LPSECURITY_ATTRIBUTES a2,
                                   LPSECURITY_ATTRIBUTES a3,
                                   BOOL a4,
                                   DWORD a5,
                                   LPVOID a6,
                                   LPCWSTR a7,
                                   struct _STARTUPINFOW* a8,
                                   LPPROCESS_INFORMATION a9)
{
    // Kfir: log the fact that this function was called and its
    // parameters' values
    DWORD nMsgID = _PrintEnter(-1,
        "CreateProcessW(%ls,%ls,%lx,%lx,%lx,%lx,%lx,%ls,%lx,%lx)\n",
        a0, a1, a2, a3, a4, a5, a6, a7, a8, a9);
    BOOL rv = 0;
    __try {
        // Kfir: Here we call the trampoline function that will
        // eventually jump to the rest of the original code
        rv = Real_CreateProcessW(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9);
    } __finally {
        // Kfir: Log the return value
        _PrintExit(nMsgID, "CreateProcessW(,,,,,,,,,)>%lx\n", rv);
    };
    return rv;
}
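The shape of this detour (log on entry, call the original through the trampoline, log the result on every exit path) does not depend on Win32. Below is a portable sketch of the same pattern, assuming hypothetical stand-ins throughout: RealCreate plays the target, g_trampoline plays Real_CreateProcessW, g_log replaces the logger, and an RAII guard takes the place of the MSVC-specific __try/__finally block. None of these names come from the project sources.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
using CreateFn = bool (*)(const std::string& commandLine);

std::vector<std::string> g_log;                 // stands in for the logger

bool RealCreate(const std::string& cmd) {       // stands in for the target
    return !cmd.empty();
}

CreateFn g_trampoline = RealCreate;             // stands in for Real_CreateProcessW

// The destructor runs on every exit path, including exceptions, which is
// the role the __finally block plays in the excerpt.
class ExitLogger {
public:
    ExitLogger(const std::string& name, const bool& rv) : name_(name), rv_(rv) {}
    ~ExitLogger() { g_log.push_back("exit " + name_ + " -> " + (rv_ ? "1" : "0")); }
private:
    std::string name_;
    const bool& rv_;   // refers to the caller's return-value variable
};

// Detour: same signature as the target, calls it through the trampoline.
bool MineCreate(const std::string& cmd) {
    g_log.push_back("enter Create(" + cmd + ")");
    bool rv = false;
    ExitLogger guard("Create", rv);   // logs rv when the scope is left
    rv = g_trampoline(cmd);
    return rv;
}
```

Calling MineCreate("cmd.exe") appends one entry record and one exit record to g_log and returns whatever the stand-in target returned.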
InjectLibrary function
This function injects a LoadLibrary() function call into a process. It is used to inject the TraceAPI.dll file into the process whose Win32API calls the user asked to intercept. (See the Loading a DLL into a process’s context section for more details.) The following code was excerpted from the creatwth.cpp file.
static BOOL InjectLibrary(HANDLE hProcess,
                          HANDLE hThread,
                          PBYTE pfLoadLibrary,
                          PBYTE pbData,
                          DWORD cbData)
{
    BOOL fSucceeded = FALSE;
    DWORD nProtect = 0;
    DWORD nWritten = 0;
    CONTEXT cxt;
    UINT32 nCodeBase;
    PBYTE pbCode;
    struct Code {
        BYTE rbCode[128];
        union {
            WCHAR wzLibFile[512];
            CHAR szLibFile[512];
        };
    } code;
    //Kfir: suspend the thread so we can change its context & stack
    //Kfir: hThread is the main running thread
    SuspendThread(hThread);
    ZeroMemory(&cxt, sizeof(cxt));
    cxt.ContextFlags = CONTEXT_FULL;
    //Kfir: Get Thread's context
    if (!GetThreadContext(hThread, &cxt)) {
        goto finish;
    }
    //Kfir: calculate where the new code should be inserted from
    nCodeBase = (cxt.Esp - sizeof(code)) & ~0x1fu; // Cache-line align.
    pbCode = code.rbCode;
    if (pbData) {
        CopyMemory(code.szLibFile, pbData, cbData);
        //Kfir: probably "DetourGenPush" adds a "push" opcode on
        //      the stack and the
        //Kfir: address of the Dll Name afterwards
        pbCode = DetourGenPush(pbCode, nCodeBase + offsetof(Code, szLibFile));
        //Kfir: probably adds a "call" opcode to the
        //      'LoadLibrary' function located at
        //Kfir: the kernel dll (uses internal GetLoadLibraryA())
        pbCode = DetourGenCall(pbCode, pfLoadLibrary,
                               (PBYTE)nCodeBase + (pbCode - code.rbCode));
    }
    //Kfir: probably adds opcodes that restore the current
    //      registers' values
    pbCode = DetourGenMovEax(pbCode, cxt.Eax);
    pbCode = DetourGenMovEbx(pbCode, cxt.Ebx);
    pbCode = DetourGenMovEcx(pbCode, cxt.Ecx);
    pbCode = DetourGenMovEdx(pbCode, cxt.Edx);
    pbCode = DetourGenMovEsi(pbCode, cxt.Esi);
    pbCode = DetourGenMovEdi(pbCode, cxt.Edi);
    pbCode = DetourGenMovEbp(pbCode, cxt.Ebp);
    pbCode = DetourGenMovEsp(pbCode, cxt.Esp);
    //Kfir: continue running what we suspended
    pbCode = DetourGenJmp(pbCode, (PBYTE)cxt.Eip,
                          (PBYTE)nCodeBase + (pbCode - code.rbCode));
    //Kfir: update the next command and SP
    cxt.Esp = nCodeBase - 4;
    cxt.Eip = nCodeBase;
    //Kfir: add writing permissions to the process's page where
    //      the code should be injected (nCodeBase)
    if (!VirtualProtectEx(hProcess, (PBYTE)nCodeBase, sizeof(Code),
                          PAGE_EXECUTE_READWRITE, &nProtect)) {
        goto finish;
    }
    //Kfir: add the injected code formed above
    if (!WriteProcessMemory(hProcess, (PBYTE)nCodeBase, &code, sizeof(Code),
                            &nWritten)) {
        goto finish;
    }
    //Kfir: when writing new commands to the stack one must call this
    //      function (see msdn)
    if (!FlushInstructionCache(hProcess, (PBYTE)nCodeBase, sizeof(Code))) {
        goto finish;
    }
    //Kfir: update thread's context
    if (!SetThreadContext(hThread, &cxt)) {
        goto finish;
    }
    fSucceeded = TRUE;
finish:
    //Kfir: Resume the thread
    //Kfir: This will cause the LoadLibrary function to run and
    //      load the wanted dll
    //Kfir: and to restore the registers
    //Kfir: and finally to jump back to where we suspended the thread
    //      in the first place
    ResumeThread(hThread);
    return fSucceeded;
}
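The address arithmetic in the excerpt, nCodeBase = (cxt.Esp - sizeof(code)) & ~0x1fu, reserves room for the Code block below the current stack pointer and rounds the result down to a 32-byte boundary. A small worked sketch follows, assuming a layout that mirrors the excerpt's Code struct with fixed-width types (so its size matches a 32-bit Windows build, where WCHAR is 2 bytes) and a made-up stack pointer value; the function name CodeBase is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the Code struct from the excerpt with fixed-width types.
struct Code {
    std::uint8_t rbCode[128];        // generated machine code
    union {
        char16_t wzLibFile[512];     // 2-byte characters, like WCHAR
        char     szLibFile[512];
    };
};
static_assert(sizeof(Code) == 128 + 1024, "layout assumed by this sketch");

// Reserve sizeof(Code) bytes below esp, then clear the low 5 bits so the
// block starts on a 32-byte boundary.
std::uint32_t CodeBase(std::uint32_t esp) {
    assert(esp >= sizeof(Code));
    return (esp - static_cast<std::uint32_t>(sizeof(Code))) & ~0x1fu;
}
```

For an assumed stack pointer of 0x0012FFC4, the subtraction gives 0x0012FB44 and the mask rounds it down to 0x0012FB40. The library name sits 128 bytes above that base, which is the offsetof(Code, szLibFile) term pushed as the LoadLibrary argument in the excerpt.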
The Application Guide
Win32APIInterceptor installation
System Requirements
1. x86 version of Windows NT/Windows 2000/Windows XP
2. Microsoft Visual C++ .Net 2003
3. Microsoft Office Access 2003
4. Administrator permissions
Adding an ODBC DSN for the project
An ODBC data source must be defined prior to the installation. To do that, go to the Start button and select Control Panel (as shown in illustration 4.1).
illustration 4.1
illustration 4.2
Now open the ODBC Data Source Administrator and select the User DSN section (illustration 4.2). Then press the Add button, select Driver do Microsoft Access (*.mdb) and press the Finish button. The ODBC Microsoft Access Setup screen appears. Type Win32APIInterceptor in the Data Source Name field, press the Select button and select the Win32APIInterceptor.mdb file from the Win32APIInterceptor directory. Press OK. Once the ODBC definition is complete, you must compile the source files. These files can be found on the project’s website.
Compilation
In order to install the application, open the Win32APIInterceptor -> Win32APIIntercept directory. Open the Win32APIIntercept.sln file with Microsoft Visual C++ .NET.
illustration 4.3
The next step is to build the solution. Select the Rebuild Solution option from the Build menu, as shown in illustration 4.3, and wait until the build process completes. Win32APIInterceptor is now ready for use.
GUI guide
Main Screen
To start Win32APIInterceptor, open the Win32APIInterceptor.mdb file. The main window appears, as follows:
illustration 4.4
Intercepted Executable section
In the Executable location field, the user specifies the application he wishes to intercept. One can simply type in the path of the desired executable, or press the … button to browse the disk. Pressing the Start Interception button starts the interception process; pressing the Stop Interception button halts it.
Illustration 4.5
Log section
All intercepted API calls for a certain application are listed in this section. There is a separate record for each API function, which consists of a function name, call time, return value and the first five arguments (at most). While intercepting, all these details are saved in a database. In order to display them on the screen, one must press the Refresh button. There is a record counter at the bottom of the section.
illustration 4.6
Tools section
The user can change some interception properties:
To clear the database, use the Clear Database button. After the database is cleared, the Log list is refreshed and all the data is removed from it.
There is an option to refresh the Log list automatically. In order to do that, check the AutoRefresh checkbox and then enter the timeout in seconds.
In the Filters section the user can manage the Log list by selecting the functions he wishes to display (or those he wishes to screen out). The default condition is Unfiltered, which means that all API functions are displayed. In order to display only certain functions, select the Wanted option and add all of the desired functions to the list below it. To block certain functions from being displayed, select the Unwanted option and fill the list in the same way. The functions list can be managed according to the user’s needs (add or remove functions, clear the list). All changes in the Tools section can be made dynamically, without stopping the interception process.
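The three filter conditions can be summarized as a single predicate over a function name. The following is a minimal sketch of that logic only; the names FilterMode, LogFilter and ShouldDisplay are hypothetical and are not taken from the project, and nothing here claims to reflect how the application itself stores or applies its filters.

```cpp
#include <set>
#include <string>

// Hypothetical names; they illustrate the three conditions described above.
enum class FilterMode { Unfiltered, Wanted, Unwanted };

struct LogFilter {
    FilterMode mode = FilterMode::Unfiltered;
    std::set<std::string> functions;   // the user-managed functions list

    // Decide whether a logged call to `name` should appear in the Log list.
    bool ShouldDisplay(const std::string& name) const {
        const bool listed = functions.count(name) != 0;
        switch (mode) {
            case FilterMode::Wanted:     return listed;    // only listed ones
            case FilterMode::Unwanted:   return !listed;   // all but listed
            case FilterMode::Unfiltered: break;
        }
        return true;                                       // show everything
    }
};
```

With GetLocaleInfoW in the list, Unwanted hides that function and shows every other one, Wanted shows only that function, and Unfiltered ignores the list.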
illustration 4.7
Top Functions section
This chart allows the user to see the most frequently called functions for a certain application.
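The ranking behind such a chart amounts to counting calls per function name and sorting by count. A minimal sketch of that computation follows, assuming the log is available as a plain list of function names; the function TopFunctions is a hypothetical helper, not part of the application.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> NameCount;

// Orders entries by call count, highest first.
bool MoreCalls(const NameCount& a, const NameCount& b) {
    return a.second > b.second;
}

// Count how often each function name occurs and return the n most frequent.
std::vector<NameCount> TopFunctions(const std::vector<std::string>& calls,
                                    std::size_t n) {
    std::map<std::string, int> counts;
    for (const std::string& name : calls) ++counts[name];

    std::vector<NameCount> ranked(counts.begin(), counts.end());
    // stable_sort keeps alphabetical order among functions with equal counts.
    std::stable_sort(ranked.begin(), ranked.end(), MoreCalls);
    if (ranked.size() > n) ranked.resize(n);
    return ranked;
}
```

Given a log with three GetLocaleInfoW calls, two CreateFileW calls and one CloseHandle call, TopFunctions(log, 2) returns GetLocaleInfoW followed by CreateFileW.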
illustration 4.8
Hands-on example
Let’s examine the API calls that occur while running the command line interpreter (cmd.exe, in our case).
1. Open Win32APIInterceptor and press the Clear Database button for your own convenience. (This will delete the saved information from the database and remove all the records from the Log section.)
2. Type cmd in the Executable location field, and then press the Start Interception button.
3. As you can see, nothing happens and the Log section remains clear. The reason is that we didn’t press the Refresh button, i.e. the information about the API calls is stored in the database but is not shown on the screen. Press the Refresh button and the functions’ details will appear. Wait a few seconds until all API calls occur. (You can press the Refresh button at the end of the interception to ensure that all functions are displayed.)
4. Press the Stop Interception button.
5. The counter at the bottom of the Log section shows that 282 API calls have occurred. We can see that the most frequently called function was GetLocaleInfoW (138 times, according to the chart).
6. Clear the function list by pressing the Clear button in the Filters section. Add the GetLocaleInfoW function to the list and select the Unwanted option.
7. This will cause the interceptor not to display the GetLocaleInfoW function in the Log section.
8. Repeat steps 1-3.
9. The GetLocaleInfoW function doesn’t appear in the Log section, and the function counter equals 144. As you can see, 144 + 138 = 282.
10. Note that instead of using the Refresh button, you can use the AutoRefresh option.
Known Issues
• The compilation of the project depends on the MS Visual Studio 2003 (.Net) IDE; there is no full makefile that builds the whole project. There is a makefile that builds everything except the COM objects.
Appendices
Microsoft Research “Detours”
Abstract
Innovative systems research hinges on the ability to easily instrument and extend existing operating system and application functionality. With access to appropriate source code, it is often trivial to insert new instrumentation or extensions by rebuilding the OS or application. However, in today’s world of commercial software, researchers seldom have access to all relevant source code. We present Detours, a library for instrumenting arbitrary Win32 functions on x86 machines. Detours intercepts Win32 functions by re-writing target function images. The Detours package also contains utilities to attach arbitrary DLLs and data segments (called payloads) to any Win32 binary. While prior researchers have used binary rewriting to insert debugging and profiling instrumentation, to our knowledge, Detours is the first package on any platform to logically preserve the un-instrumented target function (callable through a trampoline) as a subroutine for use by the instrumentation. Our unique trampoline design is crucial for extending existing binary software. We describe our experiences using Detours to create an automatic distributed partitioning system, to instrument and analyze the DCOM protocol stack, and to create a thunking layer for a COM-based OS API. Micro-benchmarks demonstrate the efficiency of the Detours library.
Introduction
Innovative systems research hinges on the ability to easily instrument and extend existing operating system and application functionality whether in an application, a library, or the operating system DLLs. Typical reasons to intercept functions are to add functionality, modify returned results, or insert instrumentation for debugging or profiling. With access to appropriate source code, it is often trivial to insert new instrumentation or extensions by rebuilding the OS or application. However, in today’s world of commercial development and binary-only releases, researchers seldom have access to all relevant source code.
Detours is a library for intercepting arbitrary Win32 binary functions on x86 machines. Interception code is applied dynamically at runtime. Detours replaces the first few instructions of the target function with an unconditional jump to the user-provided detour function. Instructions from the target function are preserved in a trampoline function. The trampoline function consists of the instructions removed from the target function and an unconditional branch to the remainder of the target function. The detour function can either replace the target function or extend its semantics by invoking the target function as a subroutine through the trampoline.
Detours are inserted at execution time. The code of the target function is modified in memory, not on disk, thus facilitating interception of binary functions at a very fine granularity. For example, the procedures in a DLL can be detoured in one execution of an application, while the original procedures are not detoured in another execution running at the same time. Unlike DLL re-linking or static redirection, the interception techniques used in the Detours library are guaranteed to work regardless of the method used by application or system code to locate the target function.
While others have used binary rewriting for debugging and to inline instrumentation, Detours is a general-purpose package. To our knowledge, Detours is the first package on any platform to logically preserve the uninstrumented target function as a subroutine callable through the trampoline. Prior systems logically prepended the instrumentation to the target, but did not make the original target’s functionality available as a general subroutine. Our unique trampoline design is crucial for extending existing binary software.
In addition to basic detour functionality, Detours also includes functions to edit the DLL import table of any binary, to attach arbitrary data segments to existing binaries, and to inject a DLL into either a new or an existing process. Once injected into a process, the instrumentation DLL can detour any Win32 function, whether in the application or the system libraries.
The following section describes how Detours works. The Using Detours section outlines the usage of the Detours library. The Evaluation section describes alternative function-interception techniques and presents a micro-benchmark evaluation of Detours. The Experience section details the usage of Detours to produce distributed applications from local applications, to quantify DCOM overheads, to create a thunking layer for a new COM-based Win32 API, and to implement first chance exception handling. We then compare Detours with related work and summarize our contributions.
Implementation
Detours provides three important sets of functionality: the ability to intercept arbitrary Win32 binary functions on x86 machines, the ability to edit the import tables of binary files, and the ability to attach arbitrary data segments to binary files. We will describe the implementation of each of these functionalities.
Interception of Binary Functions
The Detours library facilitates the interception of function calls. Interception code is applied dynamically at runtime. Detours replaces the first few instructions of the target function with an unconditional jump to the user-provided detour function. Instructions from the target function are preserved in a trampoline function. The trampoline consists of the instructions removed from the target function and an unconditional branch to the remainder of the target function.
When execution reaches the target function, control jumps directly to the user-supplied detour function. The detour function performs whatever interception preprocessing is appropriate. The detour function can return control to the source function or it can call the trampoline function, which invokes the target function without interception. When the target function completes, it returns control to the detour function. The detour function performs appropriate postprocessing and returns control to the source function. Figure 1 shows the logical flow of control for function invocation with and without interception.
Invocation without interception: Source Function -(1)-> Target Function, which returns (2) to the Source Function.
Invocation with interception: Source Function -(1)-> Detour Function -(2)-> Trampoline Function -(3)-> Target Function; the Target Function returns (4) to the Detour Function, which returns (5) to the Source Function.
Figure 1. Invocation with and without interception.
The Detours library intercepts target functions by rewriting their in-process binary image. For each target function, Detours actually rewrites two functions: the target function and the matching trampoline function. The trampoline function can be allocated either dynamically or statically. A statically allocated trampoline always invokes the target function without the detour. Prior to insertion of a detour, the static trampoline contains a single jump to the target. After insertion, the trampoline contains the initial instructions from the target function and a jump to the remainder of the target function.
Statically allocated trampolines are extremely useful for instrumentation programmers. For example, in Coign [7], invoking the Coign_CoCreateInstance trampoline is equivalent to invoking the original CoCreateInstance function without instrumentation. Coign internal functions can call Coign_CoCreateInstance at any time to create a new component instance without concern for whether or not the original function has been rerouted with a detour.
Before insertion of the detour:
;; Target Function
…
TargetFunction:
    push ebp
    mov ebp,esp
    push ebx
    push esi
    push edi
    …
;; Trampoline
…
TrampolineFunction:
    jmp TargetFunction
    …
After insertion of the detour:
;; Target Function
…
TargetFunction:
    jmp DetourFunction
TargetFunction+5:
    push edi
    …
;; Trampoline
…
TrampolineFunction:
    push ebp
    mov ebp,esp
    push ebx
    push esi
    jmp TargetFunction+5
    …
Figure 2. Trampoline and target functions, before and after insertion of the detour.
Figure 2 shows the insertion of a detour. To detour a target function, Detours first allocates memory for the dynamic trampoline function (if no static trampoline is provided) and then enables write access to both the target and the trampoline. Starting with the first instruction, Detours copies instructions from the target to the trampoline until at least 5 bytes have been copied (enough for an unconditional jump instruction). If the target function is fewer than 5 bytes, Detours aborts and returns an error code. To copy instructions, Detours uses a simple table-driven disassembler. Detours adds a jump instruction from the end of the trampoline to the first non-copied instruction of the target function. Detours writes an unconditional jump instruction to the detour function as the first instruction of the target function. To finish, Detours restores the original page permissions on both the target and trampoline functions and flushes the CPU instruction cache with a call to FlushInstructionCache.
Payloads and DLL Import Editing
While a number of tools exist for editing binary files [10, 12, 13, 17], most systems research doesn’t require such heavy-handed access to binary files. Instead, it is often sufficient to add an extra DLL or data segment to an application or system binary file. In addition to detour functions, the Detours library also contains fully reversible support for attaching arbitrary data segments, called payloads, to Win32 binary files and for editing DLL import tables.
Figure 3 shows the basic structure of a Win32 Portable Executable (PE) binary file. The PE format for Win32 binaries is an extension of COFF (the Common Object File Format). A Win32 binary consists of a DOS compatible header, a PE header, a text section containing program code, a data section containing initialized data, an import table listing any imported DLLs and functions, an export table listing functions exported by the code, and debug symbols. With the exception of the two headers, each of the other sections of the file is optional and may not exist in a given binary.
Start of File
    DOS Header
    PE (w/COFF) Header
    .text Section: Program Code
    .data Section: Initialized Data
    .idata Section: Import Table
    .edata Section: Export Table
    Debug Symbols
End of File
Figure 3. Format of a Win32 PE binary file.
To modify a Win32 binary, Detours creates a new .detours section between the export table and the debug symbols. Note that debug symbols must always reside last in a Win32 binary. The new section contains a detours header record and a copy of the original PE header. If modifying the import table, Detours creates the new import table, appends it to the copied PE header, then modifies the original PE header to point to the new import table. Finally, Detours writes any user payloads at the end of the .detours section and appends the debug symbols to finish the file. Detours can reverse modifications to the Win32 binary by restoring the original PE header from the .detours section and removing the .detours section. Figure 4 shows the format of a Detours-modified Win32 binary. Creating a new import table serves two purposes. First, it preserves the original import table in case the programmer needs to reverse all modifications to the Win32 file.
Second, the new import table can contain renamed import DLLs and functions or entirely new DLLs and functions. For example, Coign [7] uses Detours to insert an initial entry for coignrte.dll into each instrumented application. As the first entry in the application’s import table, coignrte.dll is always the first DLL to run in the application’s address space.
Start of File
    DOS Header
    PE (w/COFF) Header
    .text Section: Program Code
    .data Section: Initialized Data
    .idata Section: unused Import Table
    .edata Section: Export Table
    .detours Section: detour header, original PE header, new import table, user payloads
    Debug Symbols
End of File
Figure 4. Format of a Detours-modified binary file.
Detours provides functions for editing import tables, adding payloads, enumerating payloads, removing payloads, and rebinding binary files. Detours also provides routines for enumerating the binary files mapped into an address space and locating payloads within those mapped binaries. Each payload is identified by a 128-bit globally unique identifier (GUID). Coign uses Detours to attach per-application configuration data to application binaries.
In cases where instrumentation need be inserted into an application without modifying binary files, Detours provides functions to inject a DLL into either a new or an existing process. To inject a DLL, Detours writes a LoadLibrary call into the target process with the VirtualAllocEx and WriteProcessMemory APIs then invokes the call with the CreateRemoteThread API.
Using Detours
The code fragment in Figure 5 illustrates the usage of the Detours library. User code must include the detours.h header file and link with the detours.lib library.
#include <windows.h>
#include <detours.h>

VOID (*DynamicTrampoline)(VOID) = NULL;

DETOUR_TRAMPOLINE(
    VOID WINAPI SleepTrampoline(DWORD),
    Sleep
);

VOID WINAPI SleepDetour(DWORD dw)
{
    return SleepTrampoline(dw);
}

VOID DynamicDetour(VOID)
{
    return DynamicTrampoline();
}

void main(void)
{
    VOID (*DynamicTarget)(VOID) = SomeFunction;

    DynamicTrampoline = (FUNCPTR)DetourFunction(
        (PBYTE)DynamicTarget,
        (PBYTE)DynamicDetour);

    DetourFunctionWithTrampoline(
        (PBYTE)SleepTrampoline,
        (PBYTE)SleepDetour);

    // Execute the remainder of program.

    DetourRemoveTrampoline(SleepTrampoline);
    DetourRemoveTrampoline(DynamicTrampoline);
}
Figure 5. Sample Instrumentation Program.
Trampolines may be created either statically or dynamically. To intercept a target function with a static trampoline, the application must create the trampoline with the DETOUR_TRAMPOLINE macro. DETOUR_TRAMPOLINE takes two arguments: the prototype for the static trampoline and the name of the target function. Note that for proper interception the prototype, target, trampoline, and detour functions must all have exactly the same call signature including number of arguments and calling convention. It is the responsibility of the detour function to copy arguments when invoking the target function through the trampoline. This is intuitive as the target function is just a subroutine callable by the detour function. Using the same calling convention ensures that registers will be properly preserved and that the stack will be properly aligned between detour and target functions.
Interception of the target function is enabled by invoking the DetourFunctionWithTrampoline function with two arguments: the trampoline and the pointer to the detour function. The target function is not given as an argument because it is already encoded in the trampoline. A dynamic trampoline is created by calling DetourFunction with two arguments: a pointer to the target function and a pointer to the detour function. DetourFunction allocates a new trampoline and inserts the appropriate interception code in the target function. Static trampolines are extremely easy to use when the target function is available as a link symbol. When the target function is not available for linking, a dynamic trampoline can be used. Often a function pointer to the target function can be acquired from a second function. For those times, when a pointer to the target function is not readily available, DetourFindFunction can find the pointer to a function when it is either exported from a known DLL, or if debugging symbols are available for the target function’s binary1. DetourFindFunction accepts two arguments, the name of the binary and the name of the function. DetourFindFunction returns either a valid pointer to the function or NULL if the symbol for the function could not be found. DetourFindFunction first attempts to locate the function using the Win32 LoadLibrary and GetProcAddress APIs. If the function is not found in the export table of the DLL, DetourFindFunction uses the ImageHlp library to search available
debugging symbols. The function pointer returned by DetourFindFunction can be given to DetourFunction to create a dynamic trampoline. Interception of a target function can be removed by invoking the DetourRemoveTrampoline function. Note that because the functions in the Detours library modify code in the application address space, it is the programmer’s responsibility to ensure that no other threads are executing in the address space while a detour is inserted or removed. An easy way to insure single-threaded execution is to call functions in the Detours library from a DllMain routine.
Evaluation Several alternative techniques exist for intercepting function calls. Alternative interception techniques include: Call replacement in application source code. Calls to the target function are replaced with calls to the detour function by modifying application source code. The major drawback of this technique is that it requires access to source code. Call replacement in application binary code. Calls to the target function are replaced with calls to the detour function by modifying application binaries. While this technique does not require source code, replacement in the application binary does require the ability to identify all applicable call sites. This requires substantial symbolic information that is not generally available for binary software. DLL redirection. If the target function resides in a DLL, the DLL import entries in the binary can be modified to point to a detour DLL. Redirection to the detour DLL can be achieved by either replacing the name of the original DLL in the import table before load time or replacing the function addresses in the indirect import jump table after load [2]. Unfortunately, redirecting to the detour DLL through the import table fails to intercept DLL internal calls and calls on pointers obtained from the LoadLibrary and GetProcAddress APIs early in an applications execution. Breakpoint trapping. Rather than replace the DLL, the target function can be intercepted by inserting a debugging breakpoint into the target function. The debugging exception
1 Microsoft ships debugging symbols for the entire Windows NT operating system as part of the retail release. These symbols can be found in the \support\symbols directory on the OS distribution media.
36
Win32API Interceptor
Final Report be partitioned across a network. During distributed executions, new Coign detour functions intercept calls to COM instantiation functions and re-route those calls to distributed machines. In essence, Coign extends the COM library to support intelligent remote invocation. Whereas DCOM supports remote invocation of a few COM instantiation functions, Coign supports remote invocation for approximately 50 COM functions through detour extensions. Coign uses Detours’ DLL redirection functions to attach a runtime loader and the payload functions to attach profiling data to application binaries. Our colleagues have used Detours to instrument the user-mode portion of the DCOM protocol stack including marshaling proxies, DCOM runtime, RPC runtime, WinSock runtime, and marshaling stubs [11]. The resultant detailed analysis was then used to drive a re-architecture of DCOM for fast user-mode networks. While they could have used source code modifications to produce a special profiling version of DCOM, the source-based instrumentation would have been version dependent and shared by all DCOM applications on the profiling machine. With binary instrumentation based on Detours, the profiling tool can be attached to any Windows NT 4 build of DCOM and only effects the process being profiled. In another extension exercise, Detours was used to create a thunking layer for COP (the Component-based Operating System Proxy) [14]. COP is a COM-based version of the Win32 API. COP aware applications access operating system functionality through COM interfaces, such as IWin32FileHandle. Because the COP interfaces are distributable with DCOM, a COP application can use OS resources, including file systems, keyboards, mice, displays, registries, etc., from any machine in a network. To provide support for legacy applications, COP uses detour functions to intercept all application calls to the Win32 APIs. Native application API calls are converted to calls on COP interfaces. 
At the bottom, the COP implementation communicates with the underlying operating system through trampoline functions. COP requires no modifications to application binaries. At load time, the COP DLL is injected into the application’s address space with Detours’ injection functions. Through its
handler can then invoke the detour function. The major drawback to breakpoint trapping is that debugging exceptions suspend all application threads. In addition, the debug exception must be caught in a second operating-system process. Interception via break-point trapping has a high performance penalty. Table 1 lists times for intercepting either an empty function or the CoCreateInstance API. Times are on a 200 MHz Pentium Pro. Rows list the time to invoke the functions without interception, with interception through call replacement, with interception through DLL redirection, with interception using the Detours library, or with interception through breakpoint trapping. As can be seen, function interception with Detours library has only minimal overhead (less than 400 ns in either case).

                          Intercepted Function
Interception Technique    Empty Function    CoCreateInstance
Direct                    0.113µs           14.836µs
Call Replacement          0.143µs           15.193µs
DLL Redirection           0.143µs           15.193µs
Detours Library           0.145µs           15.194µs
Breakpoint Trap           229.564µs         265.851µs

Table 1. Comparison of Interception Techniques.
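The overhead figures quoted for Table 1 follow from simple subtraction against the direct-call row. The sketch below reproduces that arithmetic; the timings are copied from Table 1, while the names Timing, kTable1, OverheadNs and OverheadPercent are illustrative and do not come from the Detours package.

```cpp
#include <cassert>

// Timings copied from Table 1, in microseconds (200 MHz Pentium Pro).
// The struct and array names are hypothetical, chosen for this sketch only.
struct Timing {
    const char* technique;
    double empty_function_us;
    double co_create_instance_us;
};

const Timing kTable1[] = {
    {"Direct",           0.113,   14.836},
    {"Call Replacement", 0.143,   15.193},
    {"DLL Redirection",  0.143,   15.193},
    {"Detours Library",  0.145,   15.194},
    {"Breakpoint Trap",  229.564, 265.851},
};

// Added cost of an intercepted call over a direct call, in nanoseconds.
double OverheadNs(double intercepted_us, double direct_us) {
    return (intercepted_us - direct_us) * 1000.0;
}

// The same added cost expressed as a percentage of the direct call time.
double OverheadPercent(double intercepted_us, double direct_us) {
    assert(direct_us > 0.0);  // guard against dividing by zero
    return (intercepted_us - direct_us) / direct_us * 100.0;
}
```

Applied to the Detours Library row, OverheadNs gives roughly 358 ns for CoCreateInstance and roughly 32 ns for the empty function, and OverheadPercent gives roughly 2.4% for CoCreateInstance. The Breakpoint Trap row comes out several hundred times more expensive on the same measure.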
Experience
The Detours package has been used extensively in Microsoft Research over the last two years to instrument and extend Win32 applications and the Windows NT operating system. Detours was originally developed for the Coign Automatic Distributed Partitioning System [7]. Coign converts local desktop applications built from COM components into distributed client-server applications. During profiling, Coign uses Detours to intercept calls to COM instantiation functions such as CoCreateInstance. The detour functions invoke the original library functions through trampolines, then wrap output interface pointers in an additional instrumentation layer (for more details see [8]). The instrumentation layer measures inter-component communication to determine how application components should
15]. Code patching has been applied to insert debugging or profiling code. In the distant past, code patching was generally considered to be a much more practical update method than re-compiling the entire application. In addition to debugging and profiling, Detours has also been used to resourcefully extend the functionality of existing systems [7, 14]. While recent systems have extended code patching to parallel applications [1] and system kernels [16], Detours is to our knowledge the only code patching system that preserves the semantics of the target function as a callable subroutine. The detour function replaces the target function, but can invoke its functionality at any point through the trampoline. Our unique trampoline design makes it trivial to extend the functionality of existing binary functions.
Recent research has produced a class of detailed binary rewriting tools including Atom [13], Etch [12], EEL [10], and Morph [17]. In general, these tools take as input an application binary and an instrumentation script. The instrumentation script passes over the binary inserting code between instructions, basic blocks, or functions. The output of the script is a new, instrumented binary. In a departure from earlier systems, DyninstAPI [6] can modify applications dynamically.
Detours’ primary advantage over detailed binary rewriters is its size. Detours adds less than 18KB to an instrumentation package whereas detailed binary rewriters add at least a few hundred KB. The cost of Detours’ small size is an inability to insert code between instructions or basic blocks. Detailed binary rewriters can insert instrumentation around any instruction through sophisticated features such as free register discovery. Detours relies on adherence to calling conventions in order to preserve register values.
While detailed binary rewriters support insertion of code before or after any basic instruction unit, they do not preserve the semantics of the uninstrumented target function as a callable subroutine.
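The property stressed above, that a detour function can invoke the original target as a callable subroutine through the trampoline, can be sketched in portable C++. This is a model under stated assumptions and not the Detours implementation: a function-pointer slot stands in for the patched entry point of the target, and every name below (Interception, g_hook, Attach, Detach, CallTarget) is hypothetical.

```cpp
#include <cassert>

using TargetFn = int (*)(int);

// The original, uninstrumented target function.
int Target(int x) { return x * 2; }

// Hypothetical model of one interception. In Detours the entry of the
// target is overwritten with a jump; here a pointer slot plays that role.
struct Interception {
    TargetFn entry;       // where callers of the target end up
    TargetFn trampoline;  // always reaches the original target body
};

Interception g_hook = {&Target, &Target};
int g_calls_seen = 0;

// Detour function: runs instead of the target, and calls the original
// through the trampoline at a point of its own choosing.
int Detour(int x) {
    ++g_calls_seen;                          // instrumentation before
    int result = g_hook.trampoline(x + 1);   // original, with a changed argument
    return result + 100;                     // post-processing after
}

// Interception can be switched on and off at execution time.
void Attach(TargetFn detour) {
    assert(detour != nullptr);
    g_hook.entry = detour;
}
void Detach() { g_hook.entry = g_hook.trampoline; }

// Stand-in for any call site that invokes the target.
int CallTarget(int x) { return g_hook.entry(x); }
```

With the detour attached, CallTarget(3) runs Detour, which calls the original with 4 and adds 100 to its result. After Detach, the same call reaches Target directly. A code patch or binary rewriter, by contrast, offers no equivalent of the trampoline pointer through which the uninstrumented function can still be called.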
simple interception, Detours has facilitated this massive extension of the Win32 API. Finally, to support Software Distributed Shared Memory (SDSM) systems, we have implemented a first chance exception filter for Win32 structured exception handling. The Win32 API contains an API, SetUnhandledExceptionFilter, through which an application can specify an exception filter to execute should no other filter handle an application exception. For applications such as SDSM systems, the programmer would like to insert a first-chance exception filter to remove page faults caused by the SDSM’s manipulation of VM page permissions. Windows NT does not provide such a first-chance exception filter mechanism. A simple detour intercepts the exception entry point from kernel mode to user mode (KiUserExceptionDispatcher). With only a few lines of code, the detour function calls a user-provided first-chance exception filter and then forwards the exception, if unhandled, to the default exception mechanism through a trampoline.
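The control flow of the first-chance filter described above can be sketched as follows. This is a portable model only: it does not touch KiUserExceptionDispatcher, whose real entry conventions are not covered here, and the types and names (ExceptionInfo, SetFirstChanceFilter, DispatcherDetour, SdsmFilter) are assumptions made for illustration. The only real-world value used is the STATUS_ACCESS_VIOLATION code.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an exception record.
struct ExceptionInfo {
    unsigned long code;
    bool in_sdsm_region;  // true if the fault hit a page the SDSM manages
};

const unsigned long kAccessViolation = 0xC0000005UL;  // STATUS_ACCESS_VIOLATION

using Dispatcher = void (*)(const ExceptionInfo&);
using FirstChanceFilter = bool (*)(const ExceptionInfo&);  // true = handled

int g_default_dispatches = 0;

// Stand-in for the default exception mechanism reached via the trampoline.
void DefaultDispatcher(const ExceptionInfo&) { ++g_default_dispatches; }

Dispatcher g_trampoline = &DefaultDispatcher;
FirstChanceFilter g_filter = nullptr;

// Registers the user-provided first-chance filter.
void SetFirstChanceFilter(FirstChanceFilter filter) { g_filter = filter; }

// Detour for the dispatcher: give the filter first chance, then forward
// anything it did not handle to the default mechanism.
void DispatcherDetour(const ExceptionInfo& info) {
    if (g_filter != nullptr && g_filter(info)) {
        return;  // handled; the default mechanism never sees it
    }
    assert(g_trampoline != nullptr);
    g_trampoline(info);
}

// Example filter: swallow page faults caused by the SDSM's own changes
// to page permissions. A real filter would also repair the page.
bool SdsmFilter(const ExceptionInfo& info) {
    return info.code == kAccessViolation && info.in_sdsm_region;
}
```

After SetFirstChanceFilter(&SdsmFilter), an access violation inside the managed region is absorbed by the filter, while any other exception increments g_default_dispatches, which models forwarding to the default exception mechanism.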
Related Work
Detours are an extension of the general technique of code patching. To intercept execution, an unconditional branch or jump is inserted into the desired point of interception in the target function. Code overwritten by the unconditional branch is moved to a code patch. The code patch consists of either the instrumentation code or a call to the instrumentation code followed by the instructions moved to insert the unconditional branch and a jump to the first instruction in the target function after the unconditional branch. Logically, a code patch can be prepended to the beginning of a function, inserted at some arbitrary point in a function, or appended to the end of a function. Whereas a code patch invokes instrumentation then continues the target function, our technique transfers control completely to the detour function which can invoke the original target function through the trampoline at its leisure. The trampoline gives instrumentation complete freedom to invoke the semantics of the original function as a callable subroutine at any time. Techniques for code patching have existed since the dawn of digital computing [3-5, 9,
Conclusions
The Detours library provides an important set of tools to the arsenal of the systems researcher. Detour functions are fast, flexible, and friendly. A detour of the CoCreateInstance function has less than a 3% overhead, which is an order of magnitude smaller than the penalty for breakpoint trapping. The Detours library is very small. The runtime consists of less than 40KB of compiled code although typically less than 18KB of code is added to the user’s instrumentation. Unlike DLL redirection, the Detours library intercepts both statically and dynamically bound invocations. Finally, the Detours library is much more flexible than DLL redirection or application code modification. Interception of any function can be selectively enabled or disabled for each process individually at execution time. Our unique trampoline preserves the semantics of the original, uninstrumented target function for use as a subroutine of the detour function. Using detour functions and trampolines, it is trivial to produce compelling system extensions without access to system source code and without recompiling the underlying binary files. Detours makes possible a whole new generation of innovative systems research on the Windows NT platform.
Bibliography
[1] Aral, Ziya, Illya Gertner, and Greg Schaffer. Efficient Debugging Primitives for Multiprocessors. Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 87-95. Boston, MA, April 1989.
[2] Balzer, Robert and Neil Goldman. Mediating Connectors. Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop, pp. 73-77. Austin, TX, June 1999.
[3] Digital Equipment Corporation. DDT Reference Manual, 1972.
[4] Evans, Thomas G. and D. Lucille Darley. DEBUG - An Extension to Current Online Debugging Techniques. Communications of the ACM, 8(5), pp. 321-326, May 1965.
[5] Gill, S. The Diagnosis of Mistakes in Programmes on the EDSAC. Proceedings of the Royal Society, Series A, 206, pp. 538-554, May 1951.
[6] Hollingsworth, Jeffrey K. and Bryan Buck. DyninstAPI Programmer's Guide, Release 1.2. Computer Science Department, University of Maryland, College Park, MD, September 1998.
[7] Hunt, Galen C. and Michael L. Scott. The Coign Automatic Distributed Partitioning System. Proceedings of the Third Symposium on Operating System Design and Implementation (OSDI '99), pp. 187-200. New Orleans, LA, February 1999. USENIX.
[8] Hunt, Galen C. and Michael L. Scott. Intercepting and Instrumenting COM Applications. Proceedings of the Fifth Conference on Object-Oriented Technologies and Systems (COOTS'99), pp. 45-56. San Diego, CA, May 1999. USENIX.
[9] Kessler, Peter. Fast Breakpoints: Design and Implementation. Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, pp. 78-84. White Plains, NY, June 1990.
[10] Larus, James R. and Eric Schnarr. EEL: Machine-Independent Executable Editing. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 291-300. La Jolla, CA, June 1995.
[11] Li, Li, Alessandro Forin, Galen Hunt, and Yi-Min Wang. High-Performance Distributed Objects over a System Area Network. Proceedings of the Third USENIX NT Symposium. Seattle, WA, July 1999.
[12] Romer, Ted, Geoff Voelker, Dennis Lee, Alec Wolman, Wayne Wong, Hank Levy, Brian Bershad, and J. Bradley Chen. Instrumentation and Optimization of Win32/Intel Executables Using Etch. Proceedings of the USENIX Windows NT Workshop 1997, pp. 1-7. Seattle, WA, August 1997. USENIX.
[13] Srivastava, Amitabh and Alan Eustace. ATOM: A System for Building Customized Program Analysis Tools. Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pp. 196-205. Orlando, FL, June 1994.
[14] Stets, Robert J., Galen C. Hunt, and Michael L. Scott. Component-based Operating System APIs: A Versioning and Distributed Resource Solution. IEEE Computer, 32(7), July 1999.
[15] Stockham, T.G. and J.B. Dennis. FLIT - Flexowriter Interrogation Tape: A Symbolic Utility Program for the TX-0. Department of Electrical Engineering, MIT, Cambridge, MA, Memo 5001-23, July 1960.
[16] Tamches, Ariel and Barton P. Miller. Fine-Grained Dynamic Instrumentation of Commodity Operating System Kernels. Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI '99), pp. 117-130. New Orleans, LA, February 1999. USENIX.
[17] Zhang, Xiaolan, Zheng Wang, Nicholas Gloy, J. Bradley Chen, and Michael D. Smith. System Support for Automated Profiling and Optimization. Proceedings of the Sixteenth ACM Symposium on Operating System Principles. Saint-Malo, France, October 1997.