ELF Libraries & Concepts

Date: 6 January 2003
By: detach-@t-hackaholic.org
http://hackaholic.org/

1. - Introduction
2. - Operating system theory
 2.1 - Dynamic linking
 2.2 - Static linking
 2.3 - Accessing kernel resources
3. - Introduction to rootkit hacking
 3.1 - Targets
 3.2 - Methods

1. Introduction

This is a small tut that was meant to be larger, but I quit working on it for now. Since I stopped editing I revisited it a little, and I am posting it anyway because it may be useful. This is VERY INTRODUCTORY and is only meant as such.

2. Operating system theory

First a little theory on how an operating system works.

2.1 Dynamic linking

I think most of you already know what libraries are. Libraries are collections of shared code that can be used by other programs. A program can use library functions (routines) to do part of its job. To illustrate how this works I will write a simple program that lists the contents of a given directory. We will examine this program after compilation to illustrate some points. Even if you don't know how to program in C, just read the code to get a rough idea of how it works. If you are an experienced programmer you shouldn't be wasting your time.

---begin list.c---
/* Read a dir. Usage: list <directory> */

/* Include necessary header files */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>

int main(int argc, char *argv[])
{
        DIR *thedir;
        struct dirent *entry;

        if( argc!=2 ) {
                printf("Give me a dir\n");
                exit(0);
        }

        thedir = opendir(argv[1]);
        if( thedir==NULL ) {
                perror("opendir");
                exit(0);
        }

        /* readdir returns NULL both at the end of the directory and on
           error, so clear errno first to tell the two apart */
        errno = 0;
        while( (entry=readdir(thedir))!=NULL )
                printf("%s\n", entry->d_name);

        if( errno==EBADF )
                perror("readdir");

        exit(0);
}
---end list.c---

First you of course see the "#include" directives, which are important for using libraries. The header files they pull in declare the data structures and the library functions you may want to use. This is necessary because if you want to use, for example, the readdir function, that function requires arguments of specific data types, which are defined in the header files.
After these header files are included, the compiler can check whether the arguments you pass to a function are compatible with the data types declared as its parameters.

{ Provided the include files are the same as those used for building the library }

The include files you specify are all relative to the /usr/include base directory. You can see which include files you need in the manpage of each function. The manpage also shows the declaration of the function so that you know how to use it. So to use a library function, look up the manpage (usually in section 2 or 3 of the manpages), include the required header files and use the function as declared.

As an example I will use the dirent.h header file that is included. In dirent.h the readdir function is declared:

        extern struct dirent *readdir (DIR *__dirp) __THROW;

To use readdir we need to expect back, as a return value, a pointer to a dirent structure (struct dirent *). Therefore we first need to declare such a pointer, as I did:

        struct dirent *entry;

As an argument the readdir function requires a pointer to a 'DIR' variable.

        DIR *thedir;

We get this DIR pointer from the 'opendir' function. The opendir function requires a path as its argument. We use the path that was given to us on the command line:

        thedir = opendir(argv[1]);

And we check for errors. The manpage says that the opendir function returns 'NULL' on error and sets errno. So we can catch it with:

        if( thedir==NULL ) {
                perror("opendir");
                exit(0);
        }

dirent.h in its turn includes bits/dirent.h, which defines the dirent structure we use:

        struct dirent
          {
        #ifndef __USE_FILE_OFFSET64
            __ino_t d_ino;
            __off_t d_off;
        #else
            __ino64_t d_ino;
            __off64_t d_off;
        #endif
            unsigned short int d_reclen;
            unsigned char d_type;
            char d_name[256];           /* We must not include limits.h! */
          };

The "d_name" member holds the name of the current entry in the directory, so we can access it through entry->d_name and print it with printf.
This tutorial is not a programming guide, however you need to understand a few things:

 - The header files contain all the data types and structures needed to use library functions properly
 - Library functions are compiled with the same header files; this way the program and the library use the same data types and structures, otherwise there would be problems at runtime
 - The manpages of library functions should contain all necessary information about the usage of the library functions

The important information you find in the manpages is:

 - The header files you need to include in order to use the correct data types
 - The data type of the return value of the function you use
 - The method the function uses to indicate an error
 - The data types the function expects as parameters
 - The names and types of variables used inside structures
 - Functions relevant to this function
 - Error code definitions and their meaning

The program is compiled with GCC and then `dynamically* linked' to the library that contains the readdir and opendir functions.

{ *I will discuss `statically' linking later }

On recent Linux platforms ELF is the standard binary format. ELF stands for `Executable and Linking Format'. An ELF file is called an ELF object. You can have shared objects (libraries) and executable objects. Executable objects can be linked to shared objects. To see which shared objects our `list' program is linked with we can use the `ldd' program:

detach@debian:~/testcode$ ldd ./list
        libc.so.6 => /lib/libc.so.6 (0x4001f000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
detach@debian:~/testcode$

The opendir and readdir functions must be in one of these two shared libraries.
We can check this by reading the dynamic symbol table of the libraries:

detach@debian:/lib$ nm -D libc.so.6 |grep readdir
0009cde0 W readdir
0009dad0 T readdir64
0009db88 T readdir64
0009dd20 T readdir64_r
0009dc40 T readdir64_r
0009cea0 W readdir_r
detach@debian:/lib$ nm -D libc.so.6 |grep opendir
0009cbfc W opendir
detach@debian:/lib$ nm -D ld-linux.so.2 |grep opendir
detach@debian:/lib$ nm -D ld-linux.so.2 |grep readdir
detach@debian:/lib$

{ nm shows the symbols exported by a shared object
  The .so libraries are also called `shared objects' }

So both functions live in libc.so.6; ld-linux.so.2 exports neither of them.

The linker takes care of linking our program to the shared code. The linker searches /lib and /usr/lib and the directories specified in /etc/ld.so.conf.

{ The linker is the `ld' program }

If the linker cannot find the function it will give an error like:

/tmp/cct9gTQm.o(.text+0x79): In function `main':
: undefined reference to `readdir'
collect2: ld returned 1 exit status

So the program is dynamically linked to libc.so.6 and ld-linux.so.2. Now let's see how it is executed. This is where ld-linux.so or ld.so comes in.

{ ld.so on most modern linux is used for a.out binaries and
  ld-linux.so is used for ELF binaries. }

ld.so and ld-linux.so are so-called "runtime linkers".

{ Don't get confused here, we have a (compile-time) linker and a
  runtime linker. The linker initially specifies which shared objects
  the program needs, and prepares the executable for the runtime linker. }

When a program is dynamically linked it is also linked with ld-linux.so or ld.so. When you execute the program the runtime linker is executed first; it fetches the other library code and builds an executable image in memory, after which the program itself is executed.

{ The image is built using the code from the libraries and by resolving
  the required addresses (locations of the functions etc) }

2.2 Static linking

It is also possible to statically link your program to use the readdir function.
In that case the readdir code is put into the output binary, so that the program doesn't have to be linked to the library code at runtime. For example:

detach@stealth:~/testcode$ gcc -static -o list list.c
detach@stealth:~/testcode$ ldd list
        not a dynamic executable
detach@stealth:~/testcode$

Not (dynamically) linked. You can check whether a file is dynamically or statically linked like this:

detach@debian:~/testcode$ file list
list: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.0, statically linked, not stripped
detach@debian:~/testcode$

A static executable doesn't use the runtime linker. The disadvantage of static executables is that they take a huge amount of disk space compared to dynamic executables:

detach@debian:~/testcode$ gcc -o list list.c
detach@debian:~/testcode$ du -kh list
8k      list
detach@debian:~/testcode$ gcc -static -o list list.c
detach@debian:~/testcode$ du -kh list
424k    list
detach@debian:~/testcode$

A library is specific to the operating system. When a programmer uses a function like readdir in their C code, the code can be compiled on any UNIX-like system and it should work. So why is the readdir code on Linux, for example, different from the Solaris readdir? In their programming interface they are identical:

From the linux readdir manpage:
        struct dirent *readdir(DIR *dir);

From the solaris readdir manpage:
        struct dirent *readdir(DIR *dirp);

Both OSs also use compatible dirent structures (well, there are no differences that require rewriting source code). So, the C library increases portability of programs.

{ portable means being able to compile a program cleanly on
  different platforms }

The readdir function is only identical in its interface to the program, not in its interface with the kernel. And that's what will be discussed in the next chapter.

2.3 Accessing kernel resources

Many functions in the C library used on linux are simply wrappers around system calls in the Linux kernel. The set of system calls is also called the kernel's API (Application Program Interface).
A system call can be seen as a function in the kernel. To see how a system call works I will again write a program to illustrate this. This program doesn't use library functions; instead it uses linux system calls directly.

{ Note that this is why you should use library functions if you want
  to write portable programs or to avoid the hassle. }

How do we print something on screen? Using printf? No, we need to know which system call to use. There are different methods to find this out. This is one method:

detach@debian:~/testcode$ cat > hello.c
#include <stdio.h>

int main(void)
{
        printf("Hello World!\n");
}
detach@debian:~/testcode$ gcc -o hello hello.c

To determine which syscall to use I'll use my favorite method, strace:

detach@debian:~/testcode$ strace -o\!"grep 'Hello World'" ./hello
Hello World!
write(1, "Hello World!\n", 13) = 13
detach@debian:~/testcode$

{ The "-o !" pipes the output to the command }

strace runs the executable and uses the ptrace system call to intercept every system call the program makes. Now we know that eventually printf() uses the write(2) system call to write output to the screen.

We could also check if there is any syscall with 'print' in its name:

detach@debian:/usr/include$ grep print bits/syscall.h
detach@debian:/usr/include$

Or check section 2 (system calls) of the manpages:

detach@debian:~$ apropos print | grep \(2\)
detach@debian:~$ apropos write | grep \(2\)
_llseek (2)          - reposition read/write file offset
_sysctl (2)          - read/write system parameters
llseek (2)           - reposition read/write file offset
lseek (2)            - reposition read/write file offset
pread (2)            - read from or write to a file descriptor at a given offset
pwrite (2)           - read from or write to a file descriptor at a given offset
readv (2)            - read or write a vector
sysctl (2)           - read/write system parameters
write (2)            - write to a file descriptor
writev (2)           - read or write a vector
detach@debian:~$

Anyway, write is what we need.
The first argument it requires is the file descriptor number, which is 1 for standard output:

{ 0 is for input (stdin)
  1 is for output (stdout)
  2 is for error (stderr) }

The function definition from the write(2) manpage:

        ssize_t write(int fd, const void *buf, size_t count);

Knowing this I first make a simple C program using the write syscall wrapper function:

#include <unistd.h>

char string[]="Write me\n";

int main( void )
{
        /* I don't use strlen on `string' as I only want the stuff I need */
        write( 1, string, 9 );
}

Compile and test it:

detach@debian:~/testcode$ gcc -o write write.c && ./write
Write me
detach@debian:~/testcode$

Nice. Now let's see how the write syscall wrapper function works, to find out how to use the system call itself. Compile the above program statically so that the write library function (the wrapper) is included in the binary and we can use a deadlister like gdb or objdump to examine it:

detach@debian:~$ gcc -static -o write write.c

Now open the write program in gdb:

{ GDB is the debugger I use }

detach@debian:~/testcode$ gdb
(gdb) file write
Reading symbols from write...(no debugging symbols found)...done.
(gdb) disas write
Dump of assembler code for function write:
        push %ebx
        mov 0x10(%esp,1),%edx
        mov 0xc(%esp,1),%ecx
        mov 0x8(%esp,1),%ebx
        mov $0x4,%eax
        int $0x80
        ~~~
        ~~~
        ~~~
        nop
        nop
        ~~~
End of assembler dump.
(gdb)

Okay, I think most of you don't know assembler, so I will explain only what is important about this piece of the write code. The important stuff is:

        mov something, %edx
        mov something, %ecx
        mov something, %ebx
        mov $0x4, %eax
        int $0x80

You see edx, ecx, ebx there, right? These are registers, and they carry the arguments to the write function of the kernel. You can see that each mov instruction gets something from offset(%esp,1), which is an address relative to the stack pointer, so a location on the stack (a memory area).
Our `main' function first pushed these arguments on the stack and then called write (use 'disas main' to find this out):

        push $0xa
        push $0x809f010
        push $0x1
        call 0x804c130

So main calls the write library function at address 0x804c130, and the library function copies the arguments from the stack into the registers (it uses mov with an offset from %esp, so nothing is actually popped). The write function then mov-ed 0x4 into the EAX register. On linux EAX contains the system call number, and the write syscall is number 4. We can verify this:

detach@debian:/usr/include$ grep write bits/syscall.h
#define SYS_write __NR_write
~~~
~~~
detach@debian:/usr/include$ grep write asm/unistd.h
#define __NR_write 4
~~~
~~~
detach@debian:/usr/include$

So `write' is definitely syscall number 4. First we set the syscall number in EAX and then we do int $0x80. This interrupt gives control to the kernel. The kernel sees from EAX that write is requested and takes the arguments from EBX, ECX and EDX.

Now that we know how to treat this write system call we can write a program that bypasses the use of library functions.

{ I do this for education: you need it to understand how the
  program/kernel interface works, and you will need it later on
  in the tutorial. }

Written in assembler it becomes:

--begin write.S--
.data
string: .string "Write me\n"

.text
.globl main
main:
        pushl %ebp              # save basepointer
        movl %esp, %ebp         # old stackpointer is new basepointer

        movl $0xa, %edx         # 3rd argument to write: 10 bytes (incl. the trailing NUL)
        movl $string, %ecx      # put address of the string in ECX
        movl $0x1, %ebx         # set output to stdout (FD 1)
        movl $0x4, %eax         # set system call to write (4)
        int $0x80               # call on kernel

        movl $0x0, %ebx         # set exit code to 0 (success)
        movl $0x1, %eax         # set system call to "exit"
        int $0x80               # call on kernel

        leave                   # leave function (never reached, exit does not return)
        ret                     # return
--end write.S--

Run it:

detach@debian:~/testcode$ gcc write.S && ./a.out
Write me
detach@debian:~/testcode$

All syscalls in linux are used in a similar way.
{ Be careful: The assembler example (write.S) is only for the Intel 32 bit
  architecture (also called Linux/x86). Linux on alpha or sparc would be
  different. }

The important things you need to understand are:

 - What are libraries
 - Why use libraries
 - How do libraries work
 - How does a program basically use a library function
 - What is a system call wrapper function
 - How to use system calls directly
 - What is the difference between a dynamically linked program and a statically linked program

2.4 Dynamic linking `"reverse engineered"'

To be able to backdoor target programs you need to understand how stuff works. We will `backdoor' binary ELF objects, so this is interesting :). I think the best way to investigate this is through reverse engineering. It is highly recommended that you open a shell too and follow the session. While doing so you might want to try some things yourself. If you never used a debugger before, experiment with it. I think it's the best way to understand how these things really work, and it makes it far more interesting. I will explain every step in detail, so even if you don't know assembler you should be able to follow.

We'll investigate this simple program:

detach@stealth:~/research/elf$ cat test.c
#include <string.h>
#include <unistd.h>

int main(void)
{
        char msg[]="Hello!\n";
        write(1, msg, strlen(msg));
}
detach@stealth:~/research/elf$

Don't laugh at me, coz it's gonna be more complicated than this ;). Okay, compile:

detach@stealth:~/research/elf$ gcc -o test test.c
detach@stealth:~/research/elf$ file test
test: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.0, dynamically linked (uses shared libs), not stripped
detach@stealth:~/research/elf$

We now have a dynamically linked ELF executable. ELF specifies the format of the executable image file to make (dynamic) linking and execution possible. I think it's best to learn about the ELF format, execution and linking by reverse engineering it.
I will show you the output on my screen and I will explain what I see. For now you need to understand that ELF is a specification of how code is laid out in a file, how the ELF object tells the system to create the executable image in memory and, optionally, how it should access library functions. An ELF object has many headers; it starts with an ELF header. Let's examine the ELF header of our program 'test' using the 'readelf' program:

detach@stealth:~/research/elf$ readelf -h test
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048340
  Start of program headers:          52 (bytes into file)
  Start of section headers:          2040 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         6
  Size of section headers:           40 (bytes)
  Number of section headers:         27
  Section header string table index: 24
detach@stealth:~/research/elf$

Using readelf (part of GNU binutils; if you don't have it, find it) we can examine properties of an ELF object. The -h option shows the ELF file header. When reverse engineering I write down a lot of information for later use. In this example I write down the entry point address (0x8048340), the start of the program headers, the start of the section headers etc. The entry point address is absolute, and the start of the program headers is an offset into the file, so I will introduce my own convention: an absolute address is marked with 'A' and an offset with 'O'.

I wrote on my note:

 A EPT: 0x8048340      - Entrypoint
 O PHS: 52 (0x34)      - Program headers
 O SHS: 2040 (0x7f8)   - Section headers

An entry point is the absolute address at which the program starts. While reverse engineering we will find out more about this. As you see I translated every decimal number to hex.
Another noteworthy thing is that the program headers start at byte 52, while "Size of this header:" is also 52 bytes.. which means the program header table comes right after the ELF file header. We have found the most important information in this header.

As said, next to the ELF header there is a program header table. Let's see what it can tell us:

detach@stealth:~/research/elf$ readelf -l test

Elf file type is EXEC (Executable file)
Entry point 0x8048340
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
  INTERP         0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x004c0 0x004c0 R E 0x1000
  LOAD           0x0004c0 0x080494c0 0x080494c0 0x00110 0x00128 RW  0x1000
  DYNAMIC        0x0004d4 0x080494d4 0x080494d4 0x000c8 0x000c8 RW  0x4
  NOTE           0x000108 0x08048108 0x08048108 0x00020 0x00020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata
   03     .data .eh_frame .dynamic .ctors .dtors .got .bss
   04     .dynamic
   05     .note.ABI-tag
detach@stealth:~/research/elf$

We see two tables: one listing the program headers (including an entry for the program header table itself), and presumably a table telling which sections belong to each of them. What looks more interesting is the first table's "Flg" column, which tells us something about these entries: whether they are readable, writable or executable.

We can also conclude that the ELF object is composed of several sections. These sections will be mapped into memory, where they are called 'segments'. And you can also see that they have different access rights. For example, executable code is readable and executable, but not writable.
We see that there can be more sections in one segment, they are grouped. If you check the readelf help you see that it can show section headers, we may want to check these:

detach@stealth:~/research/elf$ readelf -S test
There are 27 section headers, starting at offset 0x7f8:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        080480f4 0000f4 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048108 000108 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048128 000128 000034 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          0804815c 00015c 000080 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481dc 0001dc 000080 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          0804825c 00025c 000010 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         0804826c 00026c 000020 00   A  5   1  4
  [ 8] .rel.dyn          REL             0804828c 00028c 000008 08   A  4   0  4
  [ 9] .rel.plt          REL             08048294 000294 000028 08   A  4   b  4
  [10] .init             PROGBITS        080482bc 0002bc 000017 00  AX  0   0  1
  [11] .plt              PROGBITS        080482d4 0002d4 000060 04  AX  0   0  4
  [12] .text             PROGBITS        08048340 000340 000150 00  AX  0   0 16
  [13] .fini             PROGBITS        08048490 000490 00001d 00  AX  0   0  1
  [14] .rodata           PROGBITS        080484b0 0004b0 000010 00   A  0   0  4
  [15] .data             PROGBITS        080494c0 0004c0 000010 00  WA  0   0  4
  [16] .eh_frame         PROGBITS        080494d0 0004d0 000004 00  WA  0   0  4
  [17] .dynamic          DYNAMIC         080494d4 0004d4 0000c8 08  WA  5   0  4
  [18] .ctors            PROGBITS        0804959c 00059c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        080495a4 0005a4 000008 00  WA  0   0  4
  [20] .got              PROGBITS        080495ac 0005ac 000024 04  WA  0   0  4
  [21] .bss              NOBITS          080495d0 0005d0 000018 00  WA  0   0  4
  [22] .comment          PROGBITS        00000000 0005d0 00011d 00      0   0  1
  [23] .note             NOTE            00000000 0006ed 00003c 00      0   0  1
  [24] .shstrtab         STRTAB          00000000 000729 0000cf 00      0   0  1
  [25] .symtab           SYMTAB          00000000 000c30 000470 10     26  34  4
  [26] .strtab           STRTAB          00000000 0010a0 00021c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
detach@stealth:~/research/elf$

Ahah, this is much more detailed. I think it's interesting to note entry 01 in the section header table. That entry specifies the .interp section. If you look back at the program header table you see "[Requesting program interpreter: /lib/ld-linux.so.2]".. the string "/lib/ld-linux.so.2" is 19 bytes (including the trailing null byte), and both the section header table and the program header table say .interp has size 0x13 (hex).. which is 19 decimal, so "/lib/ld-linux.so.2\x0" fits in there exactly. And in case you forgot, ld-linux.so.2 is the ELF dynamic linker.

Another interesting thing is that the .text section's address is exactly the same as the entry point we wrote down.

Again we see a "Flg" column, which defines whether the section is writable, executable etc. Just as interesting, I think, is the "A" flag, which tells whether the section will be loaded into memory or not (whether it will be part of the executable image in memory).

My notepad now looks like this:

 Header tables:
   EPT:     A 0x8048340 = .text+0
   PHS:     O 52 (0x34)
   SHS:     O 2040 (0x7f8)
   start:   A 08048000
   .interp: O 0xf4 = "/lib/ld-linux.so.2"
            A 080480f4

 All loaded sections:
   .interp         A 0x080480f4
   .note.ABI-tag   A 0x08048108
   .hash           A 0x08048128
   .dynsym         A 0x0804815c
   .dynstr         A 0x080481dc
   .gnu.version    A 0x0804825c
   .gnu.version_r  A 0x0804826c
   .rel.dyn        A 0x0804828c
   .rel.plt        A 0x08048294
   .init           A 0x080482bc X
   .plt            A 0x080482d4 X
   .text           A 0x08048340 X
   .fini           A 0x08048490 X
   .rodata         A 0x080484b0
   .data           A 0x080494c0 W
   .eh_frame       A 0x080494d0 W
   .dynamic        A 0x080494d4 W
   .ctors          A 0x0804959c W
   .dtors          A 0x080495a4 W
   .got            A 0x080495ac W
   .bss            A 0x080495d0 W

Let's take a look at the symbol table:

detach@stealth:~/research/elf$ readelf -s test

Symbol table '.dynsym' contains 8 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 080482e4    39 FUNC    WEAK   DEFAULT  UND __register_frame_info@GLIBC_2.0 (2)
     2: 080482f4    60 FUNC    GLOBAL DEFAULT  UND write@GLIBC_2.0 (2)
     3: 08048304    32 FUNC    WEAK   DEFAULT  UND __deregister_frame_info@GLIBC_2.0 (2)
     4: 08048314    32 FUNC    GLOBAL DEFAULT  UND strlen@GLIBC_2.0 (2)
     5: 08048324   229 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
     6: 080484b4     4 OBJECT  GLOBAL DEFAULT   14 _IO_stdin_used
     7: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__

Symbol table '.symtab' contains 71 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 080480f4     0 SECTION LOCAL  DEFAULT    1
     2: 08048108     0 SECTION LOCAL  DEFAULT    2
     3: 08048128     0 SECTION LOCAL  DEFAULT    3
     4: 0804815c     0 SECTION LOCAL  DEFAULT    4
     5: 080481dc     0 SECTION LOCAL  DEFAULT    5
     6: 0804825c     0 SECTION LOCAL  DEFAULT    6
     7: 0804826c     0 SECTION LOCAL  DEFAULT    7
     8: 0804828c     0 SECTION LOCAL  DEFAULT    8
     9: 08048294     0 SECTION LOCAL  DEFAULT    9
    10: 080482bc     0 SECTION LOCAL  DEFAULT   10
    11: 080482d4     0 SECTION LOCAL  DEFAULT   11
    12: 08048340     0 SECTION LOCAL  DEFAULT   12
    13: 08048490     0 SECTION LOCAL  DEFAULT   13
    14: 080484b0     0 SECTION LOCAL  DEFAULT   14
    15: 080494c0     0 SECTION LOCAL  DEFAULT   15
    16: 080494d0     0 SECTION LOCAL  DEFAULT   16
    17: 080494d4     0 SECTION LOCAL  DEFAULT   17
    18: 0804959c     0 SECTION LOCAL  DEFAULT   18
    19: 080495a4     0 SECTION LOCAL  DEFAULT   19
    20: 080495ac     0 SECTION LOCAL  DEFAULT   20
    21: 080495d0     0 SECTION LOCAL  DEFAULT   21
    22: 00000000     0 SECTION LOCAL  DEFAULT   22
    23: 00000000     0 SECTION LOCAL  DEFAULT   23
    24: 00000000     0 SECTION LOCAL  DEFAULT   24
    25: 00000000     0 SECTION LOCAL  DEFAULT   25
    26: 00000000     0 SECTION LOCAL  DEFAULT   26
    27: 08048364     0 FUNC    LOCAL  DEFAULT   12 call_gmon_start
    28: 00000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    29: 08048390     0 NOTYPE  LOCAL  DEFAULT   12 gcc2_compiled.
    30: 080494c8     0 OBJECT  LOCAL  DEFAULT   15 p.3
    31: 080495a4     0 OBJECT  LOCAL  DEFAULT   19 __DTOR_LIST__
    32: 080494cc     0 OBJECT  LOCAL  DEFAULT   15 completed.4
    33: 08048390     0 FUNC    LOCAL  DEFAULT   12 __do_global_dtors_aux
    34: 080494d0     0 OBJECT  LOCAL  DEFAULT   16 __EH_FRAME_BEGIN__
    35: 080483e0     0 FUNC    LOCAL  DEFAULT   12 fini_dummy
    36: 080495d0    24 OBJECT  LOCAL  DEFAULT   21 object.11
    37: 080483e8     0 FUNC    LOCAL  DEFAULT   12 frame_dummy
    38: 0804840c     0 FUNC    LOCAL  DEFAULT   12 init_dummy
    39: 080494d0     0 OBJECT  LOCAL  DEFAULT   15 force_to_data
    40: 0804959c     0 OBJECT  LOCAL  DEFAULT   18 __CTOR_LIST__
    41: 00000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    42: 08048460     0 NOTYPE  LOCAL  DEFAULT   12 gcc2_compiled.
    43: 08048460     0 FUNC    LOCAL  DEFAULT   12 __do_global_ctors_aux
    44: 080495a0     0 OBJECT  LOCAL  DEFAULT   18 __CTOR_END__
    45: 08048484     0 FUNC    LOCAL  DEFAULT   12 init_dummy
    46: 080494d0     0 OBJECT  LOCAL  DEFAULT   15 force_to_data
    47: 080495a8     0 OBJECT  LOCAL  DEFAULT   19 __DTOR_END__
    48: 080494d0     0 OBJECT  LOCAL  DEFAULT   16 __FRAME_END__
    49: 00000000     0 FILE    LOCAL  DEFAULT  ABS test.c
    50: 08048420     0 NOTYPE  LOCAL  DEFAULT   12 gcc2_compiled.
    51: 080494c4     0 OBJECT  LOCAL  HIDDEN    15 __dso_handle
    52: 080494d4     0 OBJECT  GLOBAL DEFAULT   17 _DYNAMIC
    53: 080482e4    39 FUNC    WEAK   DEFAULT  UND __register_frame_info@@GL
    54: 080482f4    60 FUNC    GLOBAL DEFAULT  UND write@@GLIBC_2.0
    55: 080484b0     4 OBJECT  GLOBAL DEFAULT   14 _fp_hw
    56: 080482bc     0 FUNC    GLOBAL DEFAULT   10 _init
    57: 08048304    32 FUNC    WEAK   DEFAULT  UND __deregister_frame_info@@
    58: 08048340     0 FUNC    GLOBAL DEFAULT   12 _start
    59: 08048314    32 FUNC    GLOBAL DEFAULT  UND strlen@@GLIBC_2.0
    60: 080495d0     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    61: 08048420    64 FUNC    GLOBAL DEFAULT   12 main
    62: 08048324   229 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    63: 080494c0     0 NOTYPE  WEAK   DEFAULT   15 data_start
    64: 08048490     0 FUNC    GLOBAL DEFAULT   13 _fini
    65: 080495d0     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    66: 080495ac     0 OBJECT  GLOBAL DEFAULT   20 _GLOBAL_OFFSET_TABLE_
    67: 080495e8     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    68: 080484b4     4 OBJECT  GLOBAL DEFAULT   14 _IO_stdin_used
    69: 080494c0     0 NOTYPE  GLOBAL DEFAULT   15 __data_start
    70: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
detach@stealth:~/research/elf$

Aha.. we see our strlen() entry in .dynsym (entry number 4). Write down that address. Furthermore .symtab is very interesting: we see a lot of important addresses together with their names. Note the address "0x08048340" associated with the "_start()" function; 0x08048340 is the entry point of this program, and also the start of the .text section.
I add to my notepad:

 Header tables:
   EPT:     A 0x8048340 = .text+0
   PHS:     O 52 (0x34)
   SHS:     O 2040 (0x7f8)
   start:   A 08048000
   .interp: O 0xf4 = "/lib/ld-linux.so.2"
            A 080480f4

 All loaded sections:
   .interp         A 0x080480f4
   .note.ABI-tag   A 0x08048108
   .hash           A 0x08048128
   .dynsym         A 0x0804815c
   .dynstr         A 0x080481dc
   .gnu.version    A 0x0804825c
   .gnu.version_r  A 0x0804826c
   .rel.dyn        A 0x0804828c
   .rel.plt        A 0x08048294
   .init           A 0x080482bc X
   .plt            A 0x080482d4 X
   .text           A 0x08048340 X
   .fini           A 0x08048490 X
   .rodata         A 0x080484b0
   .data           A 0x080494c0 W
   .eh_frame       A 0x080494d0 W
   .dynamic        A 0x080494d4 W
   .ctors          A 0x0804959c W
   .dtors          A 0x080495a4 W
   .got            A 0x080495ac W
   .bss            A 0x080495d0 W

 Functions:
   main     A 0x08048420
   strlen   A 0x08048314
   write    A 0x080482f4
   _start   A 0x08048340
   _init    A 0x080482bc
   _fini    A 0x08048490

Now we have some information to start with. Let's see which system calls get executed once we run the test program:

detach@stealth:~/research/elf$ strace -i './test'
[????????] execve("./test", ["./test"], [/* 19 vars */]) = 0
[4000d6ad] uname({sys="Linux", node="stealth", ...}) = 0
[4000c6de] brk(0) = 0x80495e8
[4000d0e4] open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
[4000d0e4] open("/etc/ld.so.cache", O_RDONLY) = 3
[4000cf4f] fstat64(3, {st_mode=S_IFREG|0644, st_size=50454, ...}) = 0
[4000d5ed] old_mmap(NULL, 50454, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40012000
[4000d11d] close(3) = 0
[4000d0e4] open("/lib/libc.so.6", O_RDONLY) = 3
[4000d164] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\275Z\1"..., 1024) = 1024
[4000cf4f] fstat64(3, {st_mode=S_IFREG|0755, st_size=1104040, ...}) = 0
[4000d5ed] old_mmap(NULL, 1113796, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4001f000
[4000d674] mprotect(0x40127000, 32452, PROT_NONE) = 0
[4000d5ed] old_mmap(0x40127000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x107000) = 0x40127000
[4000d5ed] old_mmap(0x4012d000, 7876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4012d000
[4000d11d] close(3) = 0
[4000d5ed] old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4012f000
[4000d631] munmap(0x40012000, 50454) = 0
[400d6a94] write(1, "Hello!\n", 7Hello!
) = 7
[400b8fbf] semget(7, 1074972704, 0) = -1 ENOSYS (Function not implemented)
[400b8fc6] _exit(7) = ?
detach@stealth:~/research/elf$

The -i option shows the instruction pointer's (EIP) value at each call (we may need that). Well, from this you can generally say:

 - execve is called to start the program
 - memory is allocated (probably by the runtime linker)
 - The runtime linker tries to find out if it needs to load some library before execution (/etc/ld.so.preload)
 - Then the RTLD (runtime link editor) reads its cache of libs
 - It allocates memory for the library
 - Code and data from the library are mapped into the address space
 - The code gets executed

Let's use ltrace on this to see which library calls take place:

detach@stealth:~/research/elf$ ltrace -i './test'
[08048361] __libc_start_main(0x08048420, 1, 0xbffffb24, 0x080482bc, 0x08048490
[08048409] __register_frame_info(0x080494d0, 0x080495d0, 0xbffffac8, 0x08048370, 0) = 0x080482e4
[0804844a] strlen("Hello!\n") = 7
[0804845b] write(1, "Hello!\n", 7Hello!
) = 7
[080483d3] __deregister_frame_info(0x080494d0, 0x4012c000, 0x400098bc, 0xbffffb24, 0xbffffac8) = 0
[ffffffff] +++ exited (status 7) +++
detach@stealth:~/research/elf$

This looks more interesting to me, but what does it all mean?

 __libc_start_main():
   _start() was the entry point at address 0x8048340.. I think _start() found out the address of main(), which is 0x08048420 (see your notepad).. 0x080482bc is the address of _init() (see notepad), 0x08048490 is the address of _fini().

 __register_frame_info():
   0x080494d0 = start of the .eh_frame section (__EH_FRAME_BEGIN__ in .symtab)
   0x080495d0 = start of the .bss section (_edata, object.11 in .symtab)
   0xbffffac8 = some stack address
   0x08048370 = somewhere in the .text section

 strlen():
   returns 7 (bytes)

 write():
   returns 7 (bytes written)

That's about all the interesting information we get here. When we debug the executable we will know more.
We have now formed a very basic idea about how an executable gets loaded,
executed and dynamically linked. This is very important during the
debugging stage, as we will be seeing a lot of cryptic code, addresses and
calculations, and we need a basic idea of what they are doing.

Here's a basic summary of the interpretation:

First there's the execve system call for the test program. The kernel
probably loads some or all of it in memory along with the program
interpreter specified in .interp (ld-linux.so.2). Memory is allocated, very
likely by the runtime link editor (the program interpreter). The _start
function at the beginning of .text is run, it fetches some info about
various functions (notably _init, _fini and main). main is run and uses
some functions in libc, which will probably involve the runtime linker to
access these functions.

With this idea in our head we proceed to runtime debugging the test
program:

detach@stealth:~/research/elf$ gdb test
(gdb) b main
Breakpoint 1 at 0x8048426
(gdb) run
Starting program: /home/detach/research/elf/test
(no debugging symbols found)...(no debugging symbols found)...
Breakpoint 1, 0x08048426 in main ()
(gdb)

Alright, we are now at a point where the test program is in memory. The
program has its own process id and at this point we can get some important
information.
Go to the process's entry in /proc:

detach@stealth:~$ cd /proc/`pidof test`
detach@stealth:/proc/3802$

The only thing interesting in this directory is the maps file:

detach@stealth:/proc/3802$ cat maps
08048000-08049000 r-xp 00000000 03:01 1531659 /home/detach/research/elf/test
08049000-0804a000 rw-p 00000000 03:01 1531659 /home/detach/research/elf/test
40000000-40011000 r-xp 00000000 03:01 163620 /lib/ld-2.3.1.so
40011000-40012000 rw-p 00011000 03:01 163620 /lib/ld-2.3.1.so
4001f000-40127000 r-xp 00000000 03:01 163623 /lib/libc-2.3.1.so
40127000-4012d000 rw-p 00107000 03:01 163623 /lib/libc-2.3.1.so
4012d000-40130000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0
detach@stealth:/proc/3802$

This is very important information. These are the absolute addresses as
mapped into memory. You see that for example the test program has two
entries, most important on these lines is "r-xp" and "rw-p". The first
instance is the text (executable) segment of the program, and the second is
the data (read/write) segment of the program. The same applies to the
instances of the dynamic linker and the libc library.

The space "bfffe000-c0000000" is the stack, which grows down from
0xc0000000 and is resized when needed. The space "4012d000-40130000" is
anonymous memory (there is no file behind it): if you look back at the
strace output you can see it was mmap()ed right behind libc's data, so it
is most likely libc's .bss plus one extra page for the runtime linker. The
library mappings themselves were mmap()ed too.

At this point we can already calculate the absolute address of a library
function like strlen. First look up the offset of strlen in libc.so.6:

detach@stealth:/proc/3802$ nm -D /lib/libc.so.6 |grep strlen
0006d4d8 T strlen
detach@stealth:/proc/3802$

So the calculation is the start of the executable segment in memory + the
offset:

0x4001f000 + 0x6d4d8 = 0x4008c4d8

Let's disassemble the main function:

(gdb) disas main
Dump of assembler code for function main:
0x8048420 <main>:       push   %ebp
0x8048421 <main+1>:     mov    %esp,%ebp
0x8048423 <main+3>:     sub    $0x18,%esp
0x8048426 <main+6>:     lea    0xfffffff8(%ebp),%eax
0x8048429 <main+9>:     mov    0x80484b8,%edx
0x804842f <main+15>:    mov    0x80484bc,%ecx
0x8048435 <main+21>:    mov    %edx,0xfffffff8(%ebp)
0x8048438 <main+24>:    mov    %ecx,0xfffffffc(%ebp)
0x804843b <main+27>:    add    $0xfffffffc,%esp
0x804843e <main+30>:    add    $0xfffffff4,%esp
0x8048441 <main+33>:    lea    0xfffffff8(%ebp),%eax
0x8048444 <main+36>:    push   %eax
0x8048445 <main+37>:    call   0x8048314 <strlen>
0x804844a <main+42>:    add    $0x10,%esp
0x804844d <main+45>:    mov    %eax,%eax
0x804844f <main+47>:    push   %eax
0x8048450 <main+48>:    lea    0xfffffff8(%ebp),%eax
0x8048453 <main+51>:    push   %eax
0x8048454 <main+52>:    push   $0x1
0x8048456 <main+54>:    call   0x80482f4 <write>
0x804845b <main+59>:    add    $0x10,%esp
0x804845e <main+62>:    leave
0x804845f <main+63>:    ret
End of assembler dump.
(gdb)

For now the strlen call is important.. let's see:

(gdb) x/i 0x8048314
0x8048314 <strlen>:     jmp    *0x80495c4
(gdb)

You see.. strlen is at address 0x8048314 (as we already know).. but that is
not the real address in memory (we calculated strlen to be at 0x4008c4d8).
Also 0x8048314 is not an address in the address space of libc. So what's
going on? Let's find out:

(gdb) b *0x8048314
Breakpoint 2 at 0x8048314
(gdb) c
Continuing.

Breakpoint 2, 0x08048314 in strlen ()
(gdb) x/w 0x80495c4
0x80495c4 <_GLOBAL_OFFSET_TABLE_+24>:   0x0804831a
(gdb) x/3i 0x8048314
0x8048314 <strlen>:     jmp    *0x80495c4
0x804831a <strlen+6>:   push   $0x18
0x804831f <strlen+11>:  jmp    0x80482d4 <_init+24>
(gdb)

Good.. so the strlen that main calls is at address 0x8048314, and at that
point it jumps to the address stored in the global offset table (GOT). The
address in GOT+24 is actually the next instruction of this strlen stub! So
we basically have a jump to the next instruction! It's called lazy
linking.. you see that the next instructions jump to something else.. in
these steps the address of the real strlen is calculated and yes, it's put
in GOT+24, so when we call strlen a second time GOT+24 will contain
0x4008c4d8.

The value pushed on the stack is 0x18, which is 24 in decimal.. it looks
like the GOT offset, but that is a coincidence here: on i386 this value is
the offset of strlen's relocation entry in .rel.plt, and that entry tells
the resolver which symbol to look up and in which GOT slot to store the
resolved address.
(gdb) stepi 3
0x080482d4 in _init ()
(gdb) x/3i 0x080482d4
0x80482d4 <_init+24>:   pushl  0x80495b0
0x80482da <_init+30>:   jmp    *0x80495b4
0x80482e0 <_init+36>:   add    %al,(%eax)
(gdb)

0x80495b0 looks like an entry in the GOT:

(gdb) x/w 0x80495b0
0x80495b0 <_GLOBAL_OFFSET_TABLE_+4>:    0x400116d8
(gdb)

The address 0x400116d8 is an address in the data segment of the runtime
linker (see /proc/`pidof test`/maps).

Then at _init+30 it jumps to the address stored at 0x80495b4.. which is
also a GOT entry:

(gdb) x/w 0x80495b4
0x80495b4 <_GLOBAL_OFFSET_TABLE_+8>:    0x400092d0
(gdb)

This is an address in the runtime linker's executable space:

(gdb) x/w 0x400092d0
0x400092d0 <_dl_runtime_resolve>:       0x8b525150
(gdb)

Ahah.. this is probably the main function for resolving the address of
strlen.. let's find out:

(gdb) stepi
0x080482da in _init ()
(gdb) step
Single stepping until exit from function _init,
which has no line number information.
0x0804844a in main ()
(gdb)

Yep, we skipped to the instruction after the strlen call in main.. let's
check out the GOT:

(gdb) x/w 0x80495c4
0x80495c4 <_GLOBAL_OFFSET_TABLE_+24>:   0x4008c4d8
(gdb)

Yup.. this is the strlen we were looking for:

(gdb) x/i 0x4008c4d8
0x4008c4d8 <strlen>:    push   %ebp
(gdb)

We can now tell some more about these ELF executables:

When a program is dynamically linked the absolute addresses of external
functions need to be resolved during runtime. For this each executable has
a PLT (Procedure Linkage Table) whose sole purpose is to either jump to the
absolute address of the real external function or otherwise to jump to the
address resolver. The PLT code hands the resolver function of the runtime
linker a relocation offset, from which the resolver finds the symbol to
look up and the GOT entry where the absolute address of the external
function is to be stored. When the program calls the library function a
second time the PLT code will jump to the external function right away
using the GOT entry.

We now have the basic idea of how it works. It's time for our first basic
executable backdooring.
What we will do: We will replace a library function with our own function.
This is the simplest example I can give. Follow me:

detach@stealth:~/testcode$ cat test.c
int hijack_printf(void)
{
    puts("Haha hijacked!");
    return 0;
}

int main(void)
{
    printf("Haha ELF is secure!\n");
    exit(0);
}
detach@stealth:~/testcode$ gcc -o test test.c
detach@stealth:~/testcode$ ./test
Haha ELF is secure!
detach@stealth:~/testcode$

Okay.. what we will do is change the binary 'test' so that the program does
not print "Haha ELF is secure!\n", but "Haha hijacked!\n" on the screen. It
is pretty easy because we already have the code for that in the executable
(the "hijack_printf()" function). If you think you can do it, do it now and
then come back to this tutorial, otherwise proceed.

First of all, we only need a hex editor and we only need to change one or
two bytes.

{ NOTE: There are of course many different methods to do this, virtually
  unlimited. For this example I may not be using the most effective method,
  it's only to demonstrate what we have just learned. }

Okay, I hope you have tried this yourself.. your solution may be quite
different from the one I will present, no problem.

Download + install the 'biew' hexeditor and follow:

detach@stealth:~/testcode$ gdb -q test
(no debugging symbols found)...(gdb) x/i printf
0x8048358 <printf>:     jmp    *0x8049630
(gdb) x/w 0x8049630
0x8049630 <_GLOBAL_OFFSET_TABLE_+28>:   0x0804835e
(gdb)

Okay.. I just found out where the address of printf will be stored.. namely
in GOT+28. I will make sure that not the address of printf is stored in
GOT+28, but the address of the hijack_printf() function.

{ Recall that GOT+28 initially contains the address of PLT+n+1, which
  causes the actual runtime link editor to resolve the absolute address of
  printf }

What I need to do next is find out at which offset in the binary 'test' the
GOT+28 can be found.
First let's find out some data that we will need:

detach@stealth:~/testcode$ readelf -s test |grep hijack
    74: 08048460    34 FUNC    GLOBAL DEFAULT   12 hijack_printf
detach@stealth:~/testcode$ readelf -S test|grep got
  [20] .got              PROGBITS        08049614 000614 000028 04  WA  0   0  4
detach@stealth:~/testcode$

GOT+28 (dec)
GOT+0x1c (hex)
0x08048460 hijack_printf
GOT 0x614
GOT+0x1c = 0x614+0x1c = 0x630 (hex)

Okay, I now found out at which offset in the binary we need to place the
address '0x08048460'.

{ The file offset of the *start* of the GOT table was 0x614, so we add the
  28 (0x1c hex) to 0x614 and get 0x630. At this offset we write the address
  of the hijack_printf function }

We've got all the information we need to hijack the printf function. Open
biew:

detach@stealth:~/testcode$ biew test

Now do exactly as described:

We need to change the translation mode to hexadecimal.. for this:

* Hit F2 and select "Hexadecimal mode"

Now we can go to the address we want to patch, which was the offset 630. We
will need to go to the absolute file offset 0x630 in test:

* Hit F5, select "Absolute" (default selected)
* Type "630" in the "Type new shift:" input field

In the upper left corner of the biew window you see 4 bytes.. mine says
'5E 83 04 08'. This is the little-endian representation of the data (least
significant byte first), the way x86 stores it.. so if you read the bytes
in reverse order you get: 0x0804835E. We know this is the value currently
in GOT+28, and we learned that this must be the next instruction in the PLT
where it will jump to.. But what if we simply change this address so that
it will always jump to the hijack_printf function? That's possible.

First we need to write our address of hijack_printf in the same
little-endian byte order:

60 84 04 08

Ahah.. if you compare this address to the one we had we see that we only
need to overwrite 2 bytes.. compare this:

60 84 04 08    hijack_printf
5E 83 04 08    PLT+n+1 (printf PLT entry)

Okay..
now we need to overwrite '5E' with '60' and '83' with '84':

* Hit F4 (Modify), move with the arrow keys to '5E' and type '60'
  The cursor will move to '83', overwrite this with '84'

Now we've got the address overwritten, save it:

* Hit F2
* Hit F10

Now try it out:

detach@stealth:~/testcode$ ./test
Haha hijacked!
detach@stealth:~/testcode$

Yes, it worked! Well, this one was quite easy.. however it shows that we
got to understand a little about ELF binaries. Also, you now understand the
basic idea of backdooring ELF binaries.

What we have done in a nutshell:

Earlier in this chapter we found out that when we call a library function
from the main function, we call a function within our program, not in the
library. This function was a PLT entry, its purpose was to figure out the
absolute address of the library function if necessary. If the address had
not yet been found, the PLT would not jump to the real library function,
but it would start a resolve routine in the runtime linker. Once the
address was resolved it would automatically jump to the right library
function.

But we have just changed this behavior. We have made sure that the PLT
entry never looks up the real address of the library function (printf), and
made it jump to hijack_printf always. We did this by changing the hardcoded
GOT address. This GOT entry is writable during runtime, but it will never
change because we completely bypassed the runtime resolver.

If you didn't understand this I greatly appreciate feedback to
detach AT hackaholic.org. However you should also try to re-read the part
on reverse engineering the dynamic linking routine. You need to experiment
with it.. use the debugger and find out exactly how things work.
But it gets a lot worse. The example above was probably the easiest I can
think of, for many reasons:

- The replacement function for printf was already present in the executable
- The printf function was only used once, the task therefore was easy, it
  didn't break the program
- We modified the executable, not the library

If we want to backdoor something we have many more problems:

- We need to put the replacement code into an existing ELF object
- Therefore we have to make sure we don't break the executable
- The executable must still function like it did before

And that stuff is pretty tricky. Say we want to make sure programs like
'ls' do not show certain files, it would be best to replace the 'readdir'
syscall wrapper in libc. However that would require us to find some space
in libc where we can write our replacement function. We will need to make
sure the addresses don't change, or otherwise we must fix up the addresses
(in case we change the size of the ELF object). We would need to write code
that filters certain keywords from the readdir result etc. etc.

3. Introduction to rootkit hacking

Well well, I hope you got to understand most of chapter 2. Experiment with
that. If you didn't get it please mail me and in the meanwhile just try to
carry on reading this, as things may become clear after all.

I think if you got this far in this tutorial, practiced and mastered the
dark side of I.T., you must have become skilled enough to spot the hacker
opportunities in the UNIX system when it comes to rootkits. To make sure of
this, this chapter will discuss the various possible targets and methods
for backdooring the UNIX system.

3.1 Targets

Before building a rootkit you must understand that a rootkit manipulates
the workings of a system in some way. The "way" we do this depends on what
part of the system we will target, which in its turn affects the
reliability and stealthiness of the rootkit. So the backdooring method
depends on many things.
According to what you have read about the operating system theory we could
divide the system up into the following relevant levels:

- application
- library
- kernel

Such a layered list is hierarchical:

             --kernel system calls--
              .     .     .     .
             /|\   /|\   /|\   /|\
       ----------library functions--------
       .  .  .  .  .  .  .  .  .  .  .  .
      /|\/|\/|\/|\/|\/|\/|\/|\/|\/|\/|\/|\
 -----------------applications------------------
 . . . . . . . . . . . . . . . . . . . . . . . .

[ Multiple applications use one lib-function call ]
[ Multiple lib-functions use one system call ]

Usually libraries should belong to the application layer, but from our
viewpoint there is a difference. Our viewpoint is the effect the backdoored
target has on the system. For example if you backdoor a library function,
this will affect more than one program because more programs probably use
it.

I hope you understand that if we want to hide something, the higher in the
pyramid we target, the more effective our backdoor will be. Let me explain
this.. If we want to hide a file for example, then we can use these ways:

- trojan the program that lists the file
- trojan the library call that the program uses
- trojan the kernel call used

Well, if we backdoor the program then only that program will not list the
file, others will. If we trojan the library call, there might be another
program that uses a different library call, like scandir instead of
readdir, to read the directory. Still, backdooring the library will
increase the effect of the backdoor. If we trojan the kernel call, it is
very likely that this will affect all the programs and library functions
that try to read a directory.

If you recall the program I wrote in assembler that bypassed the readdir
library call, this would bypass a backdoored library call, but not the
system call (kernel function).

3.2 Methods

Backdooring a program itself is fairly easy. If you have the source you can
modify it, and if you have the binary you can modify it too.
It gets a little more complicated when we go manipulate libraries or
kernels. There are probably endless possibilities as to what method to use
to trojan the target. I mean, in chapter 2 we have done a very easy
backdoor of the printf function, it can be done in much more difficult and
effective ways. This is also true for library and kernel backdooring.
Writing an LKM to replace a kernel system call is usually much easier than
patching a kernel image, or patching an existing LKM.

I will generally describe some known methods used to trojan executables,
libraries and the kernel, and some of them will be detailed in the next
chapters. It is very important that you follow these examples and try to
experiment with them in order to be able to do it yourself. Change code,
try something new, even if you at first don't fully understand how things
work. This is the way to learn. You don't learn by reading.

So we now have 3 targets, 3 ways to backdoor the system. But each way has a
lot of different methods. Basic steps to backdooring are:

* If necessary, place your code somewhere in the target
* If necessary, fix anything that got broken while placing code in the
  target
* If necessary, make sure the code gets executed instead of the original
  code
* If necessary, make sure the code only gets executed at a certain time

Well, most of the time your code will not be in the target. If you need to
add code to your target, first try to find out if you can find space in the
target to overwrite. Otherwise you will need to change the size of the
target, which is less stealthy.

If you have changed the size of the target, chances are that a lot of
position-dependent addressing is broken, which you will need to fix.

You will need to change some reference in the target to the original
feature to make it execute the backdoor version. However, it might be
possible that you directly changed the behavior of the original code, by
patching it.
Sometimes you do not want your backdoor to take over all the time, so you
will have to find some method to make your backdoor only work at certain
times.

For example, there are different ways we could have backdoored the printf
function in the example of chapter 2. We have now patched the GOT table. We
could just as well have changed the PLT entry. We could have changed the
address called in the main function. We could even have combined this: add
a new PLT entry pointing to our backdoor code and let main point to it. We
could have targeted the libc library in different ways. We could even have
modified the runtime linker itself! Think about it. The choice has to do
with the difficulty, the stealthiness and the scale of the effect.

~EOF Broken pipe