Shellcode Tester

tbowan & aryliin
October 18th 2023

Spoiler: For some time now, shellcode examples have started to segfault. Why is this happening, and how can we avoid it?

While we were quietly preparing our security courses at UBS, we came across a regression in some shellcode execution examples…

Usually, after translating our assembly into opcodes (in the form of a string full of hexadecimal escapes), we use the classic C code which places the shellcode in a string and then executes it as if it were a function.

For our 64-bit Linux shellcode it looks like this:

#include<stdio.h>
#include<string.h>

unsigned char code[] =
 "\x48\xc7\xc0\x3b\x00\x00\x00\x48"
 "\xc7\xc2\x00\x00\x00\x00\x49\xb8"
 "\x2f\x62\x69\x6e\x2f\x73\x68\x00"
 "\x41\x50\x48\x89\xe7\x52\x57\x48"
 "\x89\xe6\x0f\x05\x48\xc7\xc0\x3c"
 "\x00\x00\x00\x48\xc7\xc7\x00\x00"
 "\x00\x00\x0f\x05";

int main(int argc, char **argv) {
    
    int (*ret)() = (int(*)())code;

    ret();
}

By compiling this code and then running it, we can check that the shellcode is correct, without forgetting the -fno-stack-protector -z execstack options, of course. Here is what we got:

tbowan@testlinux:~$ gcc shellcode.c -o shellcode -fno-stack-protector -z execstack
tbowan@testlinux:~$ ./shellcode
Segmentation fault (core dumped)

SEGMENTATION FAULT. But why? How? Is it talking to me? It worked very well the last time we tested! What could have happened?

What is the problem?

Intuition suggests a problem of access rights for execution to the memory where the shellcode resides, but how can we be sure?

Understanding the segmentation fault

Disclaimer: the following uses gdb extensively and should therefore only be read by an informed audience. Not for the faint of heart.

Rather than recovering a core dump, we will directly launch the binary in gdb. Here is what it looks like for us (unnecessary portions are replaced by […], to make it more digestible). As expected, the segmentation fault occurs again.

tbowan@testlinux:~$ gdb shellcode
[...]
(gdb) r
[...]
Program received signal SIGSEGV, Segmentation fault.
0x0000555555558020 in code ()

Except that with gdb, we can know which instruction is causing the problem. Since it provides the address of the instruction (0x0000555555558020), we can ask gdb to disassemble the corresponding memory:

(gdb) disass 0x0000555555558020
Dump of assembler code for function code:
=> 0x0000555555558020 <+0>:     mov    $0x3b,%rax
   0x0000555555558027 <+7>:     mov    $0x0,%rdx
   0x000055555555802e <+14>:    movabs $0x68732f6e69622f,%r8
[...]

For the curious, we can write disass (like tbowan) or disas (like aryliin). Everyone has their own pronunciation.

Putting 59 in the rax register should not cause any segmentation problems because it does not perform any memory access...

Since the instruction itself has nothing to do with the segfault, maybe the problem lies in where it is located? So let’s ask gdb for the list of memory areas with their permissions…

(gdb) info proc mappings
[...]
          Start Addr           End Addr       Size     Offset  Perms  objfile
[...]
      0x555555558000     0x555555559000     0x1000     0x3000  rw-p   /home/tbowan/shellcode
[...]

And indeed, the corresponding memory area (0x555555558000) does not have execution rights (it is in rw-p, the x is missing).
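
The same check can be done from inside the program, without gdb. The sketch below is a minimal one and assumes a Linux system where /proc/self/maps is readable; the helper name region_perms is ours, not part of any library. It looks for the mapping that contains a given address and copies its permission string.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: find the mapping that contains addr and copy its
 * permission string (for example "rw-p") into perms.
 * Returns 0 on success, -1 if the file cannot be read or nothing matches. */
int region_perms(const void *addr, char perms[5])
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        uint64_t start, end;
        char p[5];
        /* Each line starts with "start-end perms", addresses in hexadecimal. */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s", &start, &end, p) != 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            snprintf(perms, 5, "%s", p);
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Calling region_perms(code, perms) just before ret() and printing perms should show the same rw-p that gdb reported, at least on a kernel that behaves like ours.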

In case you don't remember that mov $0x3b, %rax is indeed the first instruction of the shellcode, we can ask gdb where the shellcode starts:

(gdb) p &code
$2 = (<data variable, no debug info> *) 0x555555558020 <code>

The instruction which generates the error is indeed the first one of the shellcode, which therefore was not executed at all: the fault is raised as soon as the processor tries to fetch an instruction from this page, which its permissions do not allow.

But why?

When we come across code for the first time and it segfaults, we may think that its author did not test it, or does not know the subject that well, and these days we might even wonder whether it was generated by AI.

Except that these shellcodes are ours. And we remember having seen this type of code and having tested it hundreds of times... So why doesn't it work anymore?

Good news: someone (f0rm2l1n) already wondered about this and was kind enough to document the findings in an article, What is happened to execstack?

To summarize: the Linux kernel was modified in March 2020 so that the execstack option no longer makes every readable area executable but only affects the stack (which makes more sense).

So all the C code that tested shellcodes by storing them in a global variable only worked because execstack did more than it was supposed to. On a fixed kernel, these programs no longer work.
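
As far as we understand it, the mechanism behind this is the READ_IMPLIES_EXEC personality flag, which a binary with an executable stack used to receive on x86. This is our reading of the change, not a quote from the article. Assuming that, a process can check whether the flag is set for itself with personality(); the helper name read_implies_exec is hypothetical.

```c
#include <sys/personality.h>

/* Hypothetical helper: returns 1 if READ_IMPLIES_EXEC is set for the
 * current process, 0 if it is not, -1 if the query failed. */
int read_implies_exec(void)
{
    /* Passing 0xffffffff queries the current persona without changing it. */
    int persona = personality(0xffffffff);
    if (persona == -1)
        return -1;
    return (persona & READ_IMPLIES_EXEC) ? 1 : 0;
}
```

On a 64-bit x86 kernel that includes the fix, we would expect this to return 0 even in a binary linked with -z execstack, and 1 on an older kernel.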

It takes time for the change to reach released kernel versions, then for those versions to be integrated into distributions, and then for those distributions to be deployed. That time explains the delay between the fix and the regression.

What solutions?

So it’s nice to know why it doesn’t work anymore, but how do we fix our code now?

Move to the stack

The first variation is quite simple. Since we already have C code and a compilation chain using execstack, let's move the shellcode from the global variables to the stack...

#include<stdio.h>
#include<string.h>

int main(int argc, char **argv) {

    unsigned char code[] =
     "\x48\xc7\xc0\x3b\x00\x00\x00\x48"
     "\xc7\xc2\x00\x00\x00\x00\x49\xb8"
     "\x2f\x62\x69\x6e\x2f\x73\x68\x00"
     "\x41\x50\x48\x89\xe7\x52\x57\x48"
     "\x89\xe6\x0f\x05\x48\xc7\xc0\x3c"
     "\x00\x00\x00\x48\xc7\xc7\x00\x00"
     "\x00\x00\x0f\x05";
    
    int (*ret)() = (int(*)())code;

    ret();
}

It's not the most beautiful solution but it has at least two advantages:

It only modifies the example code a little. If someone comes here after having read similar documentation, they will find their footing more easily.
Since the goal is often to explain stack overflows, executing shellcode stored in the stack is educationally interesting.

But we don't always want to explain stack overflows and we don't always have a compilation chain with the right options...

Allow execution

To be a little cleaner you might want to do without execstack. When the goal is to show that a shellcode is indeed a piece of executable code, it is a big detour to have to talk about the stack, its protections, the why, the how, …

To execute our shellcode, we will therefore mark its memory area as executable. For this we will use two functions:

mprotect(), which changes memory permissions and therefore needs the address of the page to modify (declared in sys/mman.h),
sysconf(), which gives us, among other things, the page size, from which we can compute the address of the page to modify (declared in unistd.h).

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

unsigned char code[] =
 "\x48\xc7\xc0\x3b\x00\x00\x00\x48"
 "\xc7\xc2\x00\x00\x00\x00\x49\xb8"
 "\x2f\x62\x69\x6e\x2f\x73\x68\x00"
 "\x41\x50\x48\x89\xe7\x52\x57\x48"
 "\x89\xe6\x0f\x05\x48\xc7\xc0\x3c"
 "\x00\x00\x00\x48\xc7\xc7\x00\x00"
 "\x00\x00\x0f\x05";

int main(int argc, char **argv) {

    size_t pagesize = sysconf(_SC_PAGE_SIZE) ;
    mprotect(
        // Area's address (aligned)
        code - ((size_t) code % pagesize),
        // Area's size
        pagesize,
        // New permission
        PROT_READ | PROT_WRITE | PROT_EXEC
    ) ;

    int (*ret)() = (int(*)())code;

    ret();
}

This time the compilation is much simpler since we no longer need to remove memory protections. A simple make (which will use the implicit rules) is enough.

tbowan@testlinux:~$ make shellcode
cc     shellcode.c   -o shellcode
tbowan@testlinux:~$ ./shellcode
$
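
The version above ignores the return value of mprotect() and only changes one page, which is enough here because the shellcode is short and does not cross a page boundary. As a hedged variation, the sketch below covers every page touched by the buffer and reports failures; the helper name make_executable is ours, and it assumes the page size is a power of two, as it is on Linux.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make every page touched by [addr, addr + len)
 * readable, writable and executable. Returns 0 on success, -1 on error. */
int make_executable(void *addr, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0 || len == 0)
        return -1;

    /* Round the start address down to a page boundary. */
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)pagesize - 1);
    uintptr_t end = (uintptr_t)addr + len;

    /* mprotect() wants an aligned address; the length is rounded up
     * to whole pages by the kernel. */
    if (mprotect((void *)start, end - start,
                 PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return -1;
    }
    return 0;
}
```

In the program above, the mprotect() call would become make_executable(code, sizeof(code)), and the shellcode would only be called if it returns 0.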

Allocate a dedicated area

We're not going to lie, using mprotect() is a bit like killing a mosquito with napalm: there are collateral effects on everything around the target.

So rather than modifying the entire area containing the shellcode and potentially other stuff that we wouldn't want to touch, we can allocate an area just for it. This is what we did for our shellcode for Windows 10.

This similar version for GNU/Linux will therefore use two functions:

mmap() to allocate a memory area with the correct permissions (declared in sys/mman.h),
memcpy() to copy the shellcode to this cozy area (declared in string.h).

#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

unsigned char code[] =
 "\x48\xc7\xc0\x3b\x00\x00\x00\x48"
 "\xc7\xc2\x00\x00\x00\x00\x49\xb8"
 "\x2f\x62\x69\x6e\x2f\x73\x68\x00"
 "\x41\x50\x48\x89\xe7\x52\x57\x48"
 "\x89\xe6\x0f\x05\x48\xc7\xc0\x3c"
 "\x00\x00\x00\x48\xc7\xc7\x00\x00"
 "\x00\x00\x0f\x05";

int main(int argc, char **argv) {

    size_t pagesize = sysconf(_SC_PAGE_SIZE) ;

    void * buffer = mmap(
        // Page's address
        // NULL => let kernel find one
        NULL,
        // Area's size
        pagesize,
        // Permissions (all)
        PROT_READ | PROT_WRITE | PROT_EXEC,
        // Flags
        // * MAP_PRIVATE   : do not share
        // * MAP_ANONYMOUS : do not read a file
        MAP_PRIVATE | MAP_ANONYMOUS,
        // specific for MAP_ANONYMOUS
        -1, 0);
    
    memcpy(buffer, code, sizeof(code)) ;

    int (*ret)() = (int(*)())buffer;

    ret();
}

What next?

With these changes we can start testing our shellcodes again as before. While we are at it, couldn’t we improve the code a bit?

Personally, I never liked using a function pointer to execute shellcode. It works, that's not the point, but it's ugly and requires explaining function pointers.

Pedagogically, we just want to explain that we can pass the execution flow to the shellcode and we end up having to explain the syntax of calls on function pointers in C:

int (*ret)() = (int(*)())code;
ret();

And again, this is the “easy” version, which explicitly casts the buffer (an unsigned char *) to a function pointer (int(*)()) before calling it. We can go even deeper into the arcane by doing everything in a single line of code:

((int(*)()) code) ();

Besides, a shellcode never returns a value... It pops up a shell, a client or a server waiting for orders, that kind of thing. At worst, in the event of a problem, it exits, but it does not return.

So, rather than a function call (call in assembly), what if we jumped directly into the shellcode (with a jmp)? Good news, there is a C statement for that¹ (a GNU C extension rather than standard C):

goto *code;

Isn’t it much clearer that way?
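
For readers who have never met this form of goto: it comes from the GNU C "labels as values" extension, supported by gcc and clang, in which goto * jumps to the address held in a pointer. The sketch below shows it with ordinary labels rather than a shellcode, so it can be run safely; the function name jump_demo is hypothetical.

```c
/* Hypothetical demo of the GNU C "labels as values" extension:
 * &&label yields the address of a label, goto *ptr jumps to it. */
int jump_demo(int take_jump)
{
    void *target = take_jump ? &&jumped : &&skipped;

    /* No call, no return address pushed: just a jump. */
    goto *target;

skipped:
    return 0;
jumped:
    return 1;
}
```

Writing goto *code; uses the same statement, except that the pointer is the address of the shellcode buffer instead of a label.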