|
2 | 2 |
|
3 | 3 | [](https://github.com/psf/black) [](https://github.com/0vercl0k/udmp-parser/blob/master/LICENSE)
|
4 | 4 |
|
| 5 | +`udmp-parser` is a cross-platform C++ parser library for Windows [user minidumps](https://docs.microsoft.com/en-us/windows/win32/debug/minidump-files) written by [0vercl0k](https://github.com/0vercl0k). The Python bindings were added by [hugsy](https://github.com/hugsy). Refer to the [project page on GitHub](https://github.com/0vercl0k/udmp-parser) for documentation, issues and pull requests. |
5 | 6 |
|
6 |
| -This package holds the Python bindings for the [`udmp-parser`](https://github.com/0vercl0k/udmp-parser) project. |
| 7 | + |
7 | 8 |
|
8 |
| -`udmp-parser` is a cross-platform C++ parser library for Windows [user minidumps](https://docs.microsoft.com/en-us/windows/win32/debug/minidump-files) written by [0vercl0k](https://github.com/0vercl0k). The Python bindings were added by [hugsy](https://github.com/hugsy). |
| 9 | +The library supports Intel 32-bit / 64-bit dumps and provides read access to things like: |
9 | 10 |
|
10 |
| -Refer to the [project page on Github](https://github.com/0vercl0k/udmp-parser) for documentation, issues and pull requests. |
| 11 | +- The thread list and their context records, |
| 12 | +- The virtual memory, |
| 13 | +- The loaded modules. |
| 14 | + |
| 15 | +## Installing from PyPI |
| 16 | + |
| 17 | +The easiest way is simply to: |
| 18 | + |
| 19 | +``` |
| 20 | +pip install udmp_parser |
| 21 | +``` |
| 22 | + |
| 23 | +## Usage |
| 24 | + |
| 25 | +The Python API was built around the C++ code so the names were preserved. Everything lives within the module `udmp_parser`. |
| 26 | +Note: For convenience, a simple [pure Python script](src/python/tests/utils.py) was added to generate minidumps ready to use: |
| 27 | + |
| 28 | +```python |
| 29 | +$ python -i src/python/tests/utils.py |
| 30 | +>>> pid, dmppath = generate_minidump_from_process_name("winver.exe") |
| 31 | +Minidump generated successfully: PID=3232 -> minidump-winver.exe-1687024880.dmp |
| 32 | +>>> pid |
| 33 | +3232 |
| 34 | +>>> dmppath |
| 35 | +WindowsPath('minidump-winver.exe-1687024880.dmp')
| 36 | +``` |
| 37 | + |
| 38 | +Parsing a minidump object is as simple as: |
| 39 | + |
| 40 | +```python |
| 41 | +>>> import pathlib, udmp_parser
| 42 | +>>> udmp_parser.version.major, udmp_parser.version.minor, udmp_parser.version.release |
| 43 | +(0, 4, '') |
| 44 | +>>> dmp = udmp_parser.UserDumpParser() |
| 45 | +>>> dmp.Parse(pathlib.Path("C:/temp/rundll32.dmp")) |
| 46 | +True |
| 47 | +``` |
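
Since `Parse()` reports success as a boolean, scripts may prefer to fail loudly. The following is a minimal sketch, not part of the library: `load_dump` is a hypothetical helper name, and it only assumes that the parser object exposes `Parse(path)` returning `True` on success, as in the session above.

```python
from pathlib import Path


def load_dump(parser, path):
    """Parse `path` with `parser`; raise instead of returning False."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"no such dump file: {path}")
    # Parse() returns True on success, as in the session above.
    if not parser.Parse(path):
        raise ValueError(f"could not parse minidump: {path}")
    return parser
```

It would be called as `dmp = load_dump(udmp_parser.UserDumpParser(), "C:/temp/rundll32.dmp")`.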
| 48 | + |
| 49 | +Feature-wise, here are some examples of usage: |
| 50 | + |
| 51 | +### Threads |
| 52 | + |
| 53 | +Get a hashmap of threads (as `{TID: ThreadObject}`), access their information: |
| 54 | + |
| 55 | +```python |
| 56 | +>>> threads = dmp.Threads() |
| 57 | +>>> len(threads) |
| 58 | +14 |
| 59 | +>>> threads |
| 60 | +{5292: Thread(Id=0x14ac, SuspendCount=0x1, Teb=0x2e8000), |
| 61 | + 5300: Thread(Id=0x14b4, SuspendCount=0x1, Teb=0x2e5000), |
| 62 | + 5316: Thread(Id=0x14c4, SuspendCount=0x1, Teb=0x2df000), |
| 63 | + 3136: Thread(Id=0xc40, SuspendCount=0x1, Teb=0x2ee000), |
| 64 | + 4204: Thread(Id=0x106c, SuspendCount=0x1, Teb=0x309000), |
| 65 | + 5328: Thread(Id=0x14d0, SuspendCount=0x1, Teb=0x2e2000), |
| 66 | + 1952: Thread(Id=0x7a0, SuspendCount=0x1, Teb=0x2f7000), |
| 67 | + 3888: Thread(Id=0xf30, SuspendCount=0x1, Teb=0x2eb000), |
| 68 | + 1760: Thread(Id=0x6e0, SuspendCount=0x1, Teb=0x2f4000), |
| 69 | + 792: Thread(Id=0x318, SuspendCount=0x1, Teb=0x300000), |
| 70 | + 1972: Thread(Id=0x7b4, SuspendCount=0x1, Teb=0x2fa000), |
| 71 | + 1228: Thread(Id=0x4cc, SuspendCount=0x1, Teb=0x2fd000), |
| 72 | + 516: Thread(Id=0x204, SuspendCount=0x1, Teb=0x303000), |
| 73 | + 2416: Thread(Id=0x970, SuspendCount=0x1, Teb=0x306000)} |
| 74 | +``` |
| 75 | + |
| 76 | +Access an individual thread, including its register context:
| 77 | + |
| 78 | +```python |
| 79 | +>>> thread = threads[5292] |
| 80 | +>>> print(f"RIP={thread.Context.Rip:#x} RBP={thread.Context.Rbp:#x} RSP={thread.Context.Rsp:#x}") |
| 81 | +RIP=0x7ffc264b0ad4 RBP=0x404fecc RSP=0x7de628 |
| 82 | +``` |
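
The same fields can be collected for every thread at once. This is a small sketch under the assumption that the dump holds 64-bit context records, so that `Context.Rip`, `Context.Rbp` and `Context.Rsp` exist as in the example above; `instruction_pointers` and `format_thread` are hypothetical helper names, not library functions.

```python
def instruction_pointers(threads):
    """Return {TID: RIP} for a {TID: Thread} mapping with 64-bit contexts."""
    return {tid: thread.Context.Rip for tid, thread in sorted(threads.items())}


def format_thread(tid, thread):
    """Render one thread's id and main registers on a single line."""
    ctx = thread.Context
    return f"TID={tid:#x} RIP={ctx.Rip:#x} RBP={ctx.Rbp:#x} RSP={ctx.Rsp:#x}"
```

For example, `for tid, t in dmp.Threads().items(): print(format_thread(tid, t))` prints one line per thread.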
| 83 | + |
| 84 | + |
| 85 | +### Modules |
| 86 | + |
| 87 | +Get a hashmap of modules (as `{address: ModuleObject}`), access their information: |
| 88 | + |
| 89 | +```python |
| 90 | +>>> modules = dmp.Modules() |
| 91 | +>>> modules |
| 92 | +{1572864: Module_t(BaseOfImage=0x180000, SizeOfImage=0x3000, ModuleName=C:\Windows\SysWOW64\sfc.dll), |
| 93 | + 10813440: Module_t(BaseOfImage=0xa50000, SizeOfImage=0x14000, ModuleName=C:\Windows\SysWOW64\rundll32.exe), |
| 94 | + 1929052160: Module_t(BaseOfImage=0x72fb0000, SizeOfImage=0x11000, ModuleName=C:\Windows\SysWOW64\wkscli.dll), |
| 95 | + 1929183232: Module_t(BaseOfImage=0x72fd0000, SizeOfImage=0x52000, ModuleName=C:\Windows\SysWOW64\mswsock.dll), |
| 96 | + 1929576448: Module_t(BaseOfImage=0x73030000, SizeOfImage=0xf000, ModuleName=C:\Windows\SysWOW64\browcli.dll), |
| 97 | + 1929641984: Module_t(BaseOfImage=0x73040000, SizeOfImage=0xa000, ModuleName=C:\Windows\SysWOW64\davhlpr.dll), |
| 98 | + 1929707520: Module_t(BaseOfImage=0x73050000, SizeOfImage=0x19000, ModuleName=C:\Windows\SysWOW64\davclnt.dll), |
| 99 | + 1929838592: Module_t(BaseOfImage=0x73070000, SizeOfImage=0x18000, ModuleName=C:\Windows\SysWOW64\ntlanman.dll), |
| 100 | + [...] |
| 101 | + 140720922427392: Module_t(BaseOfImage=0x7ffc24980000, SizeOfImage=0x83000, ModuleName=C:\Windows\System32\wow64win.dll), |
| 102 | + 140720923017216: Module_t(BaseOfImage=0x7ffc24a10000, SizeOfImage=0x59000, ModuleName=C:\Windows\System32\wow64.dll), |
| 103 | + 140720950280192: Module_t(BaseOfImage=0x7ffc26410000, SizeOfImage=0x1f8000, ModuleName=C:\Windows\System32\ntdll.dll)} |
| 104 | +``` |
| 105 | + |
| 106 | +Access module information directly:
| 107 | + |
| 108 | +```python |
| 109 | +>>> ntdll_modules = [mod for addr, mod in dmp.Modules().items() if mod.ModuleName.lower().endswith("ntdll.dll")] |
| 110 | +>>> len(ntdll_modules) |
| 111 | +2 |
| 112 | +>>> for ntdll in ntdll_modules:
| 113 | +...     print(f"{ntdll.ModuleName=} {ntdll.BaseOfImage=:#x} {ntdll.SizeOfImage=:#x}")
| 114 | +...
| 115 | +ntdll.ModuleName='C:\\Windows\\SysWOW64\\ntdll.dll' ntdll.BaseOfImage=0x77430000 ntdll.SizeOfImage=0x1a4000 |
| 116 | +ntdll.ModuleName='C:\\Windows\\System32\\ntdll.dll' ntdll.BaseOfImage=0x7ffc26410000 ntdll.SizeOfImage=0x1f8000 |
| 117 | +``` |
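
A common follow-up is mapping an address, such as a thread's `RIP`, back to the module that contains it. The sketch below is not a library function: `module_at` is a hypothetical name, and it relies only on the `BaseOfImage` and `SizeOfImage` attributes used above.

```python
def module_at(modules, address):
    """Return the module whose image range contains `address`, or None."""
    for module in modules.values():
        start = module.BaseOfImage
        # The image occupies the half-open range [start, start + SizeOfImage).
        if start <= address < start + module.SizeOfImage:
            return module
    return None
```

With the dump used in this document, `module_at(dmp.Modules(), thread.Context.Rip)` should resolve `0x7ffc264b0ad4` to the 64-bit `ntdll.dll`, whose image spans `0x7ffc26410000` to `0x7ffc26608000`.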
| 118 | + |
| 119 | +The `udmp_parser.UserDumpParser.ReadMemory()` convenience method reads memory directly from the dump. Its signature is `def ReadMemory(Address: int, Size: int) -> list[int]`. For instance, to dump the `wow64` module:
| 120 | + |
| 121 | +```python |
| 122 | +>>> wow64 = [mod for addr, mod in dmp.Modules().items() if mod.ModuleName.lower() == r"c:\windows\system32\wow64.dll"][0] |
| 123 | +>>> print(str(wow64)) |
| 124 | +Module_t(BaseOfImage=0x7ffc24a10000, SizeOfImage=0x59000, ModuleName=C:\Windows\System32\wow64.dll) |
| 125 | +>>> wow64_module = bytearray(dmp.ReadMemory(wow64.BaseOfImage, wow64.SizeOfImage)) |
| 126 | +>>> assert wow64_module[:2] == b'MZ' |
| 127 | +>>> import hexdump |
| 128 | +>>> hexdump.hexdump(wow64_module[:128]) |
| 129 | +00000000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 MZ.............. |
| 130 | +00000010: B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@....... |
| 131 | +00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ |
| 132 | +00000030: 00 00 00 00 00 00 00 00 00 00 00 00 E8 00 00 00 ................ |
| 133 | +00000040: 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 ........!..L.!Th |
| 134 | +00000050: 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F is program canno |
| 135 | +00000060: 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 t be run in DOS |
| 136 | +00000070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 mode....$....... |
| 137 | +``` |
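
Once the bytes are in hand, standard-library tools are enough to inspect them. As an illustration, this sketch reads the `e_lfanew` field of the DOS header (a little-endian 32-bit value at offset `0x3c`) to locate the PE header; `pe_header_offset` is a hypothetical helper name and accepts either `bytes`, a `bytearray`, or the `list[int]` returned by `ReadMemory()`.

```python
import struct


def pe_header_offset(image):
    """Return e_lfanew, the offset of the PE header, from a DOS header."""
    data = bytes(image)
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image: missing MZ signature")
    # e_lfanew is a little-endian 32-bit value at offset 0x3c.
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    return offset
```

For the `wow64.dll` bytes shown above, the four bytes at offset `0x3c` are `E8 00 00 00`, so `pe_header_offset(wow64_module)` would return `0xe8`.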
| 138 | + |
| 139 | + |
| 140 | +### Memory |
| 141 | + |
| 142 | +The memory blocks can also be enumerated in a hashmap `{address: MemoryBlock}`. |
| 143 | + |
| 144 | +```python |
| 145 | +>>> memory = dmp.Memory() |
| 146 | +>>> len(memory) |
| 147 | +0x260 |
| 148 | +>>> memory |
| 149 | +[...] |
| 150 | + 0x7ffc26410000: [MemBlock_t(BaseAddress=0x7ffc26410000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x1000)], |
| 151 | + 0x7ffc26411000: [MemBlock_t(BaseAddress=0x7ffc26411000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x11c000)], |
| 152 | + 0x7ffc2652d000: [MemBlock_t(BaseAddress=0x7ffc2652d000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x49000)], |
| 153 | + 0x7ffc26576000: [MemBlock_t(BaseAddress=0x7ffc26576000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x1000)], |
| 154 | + 0x7ffc26577000: [MemBlock_t(BaseAddress=0x7ffc26577000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x2000)], |
| 155 | + 0x7ffc26579000: [MemBlock_t(BaseAddress=0x7ffc26579000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x9000)], |
| 156 | + 0x7ffc26582000: [MemBlock_t(BaseAddress=0x7ffc26582000, AllocationBase=0x7ffc26410000, AllocationProtect=0x80, RegionSize=0x86000)], |
| 157 | + 0x7ffc26608000: [MemBlock_t(BaseAddress=0x7ffc26608000, AllocationBase=0x0, AllocationProtect=0x0, RegionSize=0x3d99e8000)]} |
| 158 | +``` |
| 159 | + |
| 160 | +To facilitate the parsing in a human-friendly manner, some helper functions are provided: |
| 161 | + * `udmp_parser.utils.TypeToString`: convert the region type to its meaning (from MSDN) |
| 162 | + * `udmp_parser.utils.StateToString`: convert the region state to its meaning (from MSDN) |
| 163 | + * `udmp_parser.utils.ProtectionToString`: convert the region protection to its meaning (from MSDN) |
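
To give an idea of what such a conversion involves, here is a standalone sketch that decodes a protection value using the Windows memory protection constants. It is not the library's implementation, `describe_protection` is a hypothetical name, and the exact string returned by `udmp_parser.utils.ProtectionToString` may be formatted differently.

```python
# Base protection values (low byte) from the Windows memory protection constants.
_BASE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}

# Modifier flags that can be combined with a base protection.
_MODIFIERS = {
    0x100: "PAGE_GUARD",
    0x200: "PAGE_NOCACHE",
    0x400: "PAGE_WRITECOMBINE",
}


def describe_protection(protect):
    """Return a readable name for a memory protection value."""
    names = []
    base = protect & 0xFF
    if base in _BASE_PROTECTIONS:
        names.append(_BASE_PROTECTIONS[base])
    elif base:
        # Unknown base value: keep it visible rather than dropping it.
        names.append(f"{base:#x}")
    for bit, name in _MODIFIERS.items():
        if protect & bit:
            names.append(name)
    return " | ".join(names) if names else "0x0"
```

For instance, the `AllocationProtect=0x80` value seen in the listing above decodes to `PAGE_EXECUTE_WRITECOPY`.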
| 164 | + |
| 165 | +This makes it possible to search and filter in a more readable way:
| 166 | + |
| 167 | + |
| 168 | +```python |
| 169 | +# Collect only executable memory regions |
| 170 | +>>> exec_regions = [region for _, region in dmp.Memory().items() if "PAGE_EXECUTE_READ" in udmp_parser.utils.ProtectionToString(region.Protect)] |
| 171 | + |
| 172 | +# Pick any, disassemble code using capstone; assumes cs = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
| 173 | +>>> exec_region = exec_regions[-1]
| 174 | +>>> mem = dmp.ReadMemory(exec_region.BaseAddress, 0x100)
| 175 | +>>> for insn in cs.disasm(bytearray(mem), exec_region.BaseAddress):
| 176 | +...     print(f"{insn=}")
| 177 | + |
| 178 | +insn=<CsInsn 0x7ffc26582000 [cc]: int3 > |
| 179 | +insn=<CsInsn 0x7ffc26582001 [cc]: int3 > |
| 180 | +insn=<CsInsn 0x7ffc26582002 [cc]: int3 > |
| 181 | +insn=<CsInsn 0x7ffc26582003 [cc]: int3 > |
| 182 | +insn=<CsInsn 0x7ffc26582004 [cc]: int3 > |
| 183 | +insn=<CsInsn 0x7ffc26582005 [cc]: int3 > |
| 184 | +insn=<CsInsn 0x7ffc26582006 [cc]: int3 > |
| 185 | +insn=<CsInsn 0x7ffc26582007 [cc]: int3 > |
| 186 | +insn=<CsInsn 0x7ffc26582008 [cc]: int3 > |
| 187 | +insn=<CsInsn 0x7ffc26582009 [cc]: int3 > |
| 188 | +insn=<CsInsn 0x7ffc2658200a [cc]: int3 > |
| 189 | +insn=<CsInsn 0x7ffc2658200b [cc]: int3 > |
| 190 | +insn=<CsInsn 0x7ffc2658200c [cc]: int3 > |
| 191 | +insn=<CsInsn 0x7ffc2658200d [cc]: int3 > |
| 192 | +insn=<CsInsn 0x7ffc2658200e [cc]: int3 > |
| 193 | +insn=<CsInsn 0x7ffc2658200f [cc]: int3 > |
| 194 | +insn=<CsInsn 0x7ffc26582010 [48895c2410]: mov qword ptr [rsp + 0x10], rbx> |
| 195 | +insn=<CsInsn 0x7ffc26582015 [4889742418]: mov qword ptr [rsp + 0x18], rsi> |
| 196 | +insn=<CsInsn 0x7ffc2658201a [57]: push rdi> |
| 197 | +insn=<CsInsn 0x7ffc2658201b [4156]: push r14> |
| 198 | +insn=<CsInsn 0x7ffc2658201d [4157]: push r15> |
| 199 | +[...] |
| 200 | +``` |
| 201 | + |
| 202 | +# Authors |
| 203 | + |
| 204 | +* Axel '[@0vercl0k](https://twitter.com/0vercl0k)' Souchet |
| 205 | + |
| 206 | +# Contributors |
| 207 | + |
| 208 | +[  ](https://github.com/0vercl0k/udmp-parser/graphs/contributors) |