Whilst this isn’t a manifesto in true Jerry Maguire style, this blog post sets out a new BOF PE design that I hope C2 vendors will consider implementing. Beacon Object File (BOF) support has been a cornerstone capability of any modern C2 platform since Cobalt Strike introduced it in version 4.1 back in 2020. It was a major step towards a modular and extensible design that could still interact with the C2 platform itself via the Beacon API.

After five years of development using this approach, cracks in the design have begun to show. Complex BOFs become difficult to maintain, and the lack of higher-level language support (such as the C++ STL and exceptions) can make source code bloated. So why can we not use uber new C++20 features and execute natively from memory, whilst maintaining integration with the Beacon API?

This proposal aims to allow just that. In this article, I propose a reference design for a new BOF portable executable (PE) concept that should solve some of the current constraints and issues faced by BOF developers. Features include:

  • The ability to run the same linked EXE standalone or within a C2 environment 
  • Full support for C++ and exceptions
  • Symbol resolution issues disappear  
  • Code will be easier to maintain vs the traditional BOF design 

Isn’t This Just Another In-Memory PE Loader?

Not quite. In-memory PE executors are unaware that they are executing within the confines of a C2 agent. This increases the complexity of loaders, as they must capture program output and feed arguments to the executable run from memory. Solutions such as Fortra’s No-Consolation have worked towards resolving some of these issues, but at the cost of loader complexity. Some of this complexity was described in a blog published by Fortra.

Unlike in-memory PE execution modules, BOF PE files would have full use of the Beacon API, so no special output capture or argument processing would be needed. Developers would simply use the BeaconPrintf or BeaconOutput APIs to send output and data to the C2 server and leverage the C2 solution’s argument packing format as before.

How Would It Work?

BOF PE source will include the beacon.h header as normal, but there will now be an additional import library, beacon.lib, that developers must link against. This creates a dependency on beacon.dll for the linked BOF PE. For standalone execution, this DLL acts as the beacon compatibility layer, so both beacon.dll and the BOF PE executable must be in the same folder. Typically, the compatibility layer will write program output to stdout instead of sending the data over the C2 channel.

When executing under a C2 agent, this beacon compatibility layer is no longer required. While the loader processes the BOF PE’s imports, any function imported from beacon.dll is bound directly to that C2 provider’s own API implementation, as would happen under traditional BOF execution. The beacon.dll file is never resolved or loaded from disk. All other DLLs are processed as normal imports and resolved accordingly.
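The import binding step described above could look roughly like the following. This is a minimal, portable sketch rather than the reference loader’s code: the names resolve_import, fallback_resolver, agent_BeaconPrintf and agent_BeaconOutput are hypothetical, and the empty stub functions stand in for whatever Beacon API implementations a given agent already has.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer; a real loader would store the typed API pointers. */
typedef void (*api_fn)(void);

/* Hypothetical stand-ins for the agent's own Beacon API implementations. */
static void agent_BeaconPrintf(void) {}
static void agent_BeaconOutput(void) {}

struct beacon_export { const char *name; api_fn fn; };

static const struct beacon_export k_beacon_exports[] = {
    { "BeaconPrintf", agent_BeaconPrintf },
    { "BeaconOutput", agent_BeaconOutput },
};

/* DLL names in an import table may differ in case, so compare without it. */
static int name_equals_nocase(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++; b++;
    }
    return *a == *b;
}

/* Resolver for ordinary DLLs (on Windows, typically LoadLibrary + GetProcAddress). */
typedef api_fn (*fallback_resolver)(const char *dll, const char *func);

/* Returns the address to write into the import address table, or NULL on failure. */
api_fn resolve_import(const char *dll, const char *func, fallback_resolver fallback) {
    assert(dll && func);
    if (name_equals_nocase(dll, "beacon.dll")) {
        size_t count = sizeof k_beacon_exports / sizeof k_beacon_exports[0];
        for (size_t i = 0; i < count; i++) {
            if (strcmp(k_beacon_exports[i].name, func) == 0)
                return k_beacon_exports[i].fn;
        }
        return NULL; /* unknown Beacon API: fail rather than touch disk */
    }
    return fallback ? fallback(dll, func) : NULL;
}
```

The key design point is that beacon.dll is treated as a virtual module: its name is matched, but it is never passed to the normal DLL resolver.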

Standalone Execution

Traditional BOFs are not easily executable as standalone programs. This often leads to duplicated effort, with a standalone tool and a BOF both written to perform the same task.

The BOF PE design will allow execution of the fully linked PE using a beacon compatibility layer. This is useful for PEs which support execution over a SOCKS proxy.  Whilst standalone execution would also be possible from within the target environment, dropping BOF PE files to disk would not be recommended for opsec purposes.

c:\bofs\mybof.exe "String arg" 12345 c:\files\binary.bin

The same EXE file could be used for execution within the C2 environment.

bof-pe c:\bofs\mybof.exe "String arg" 12345 c:\files\binary.bin

This functionality would be implemented by a new beacon API, which I have named BeaconInvokeStandalone, that could be called from the program’s main function.

int BeaconInvokeStandalone(int argc, const char* argv[], const char* bof_args_def, BeaconEntryPtr entry);

The bof_args_def argument describes the arguments the BOF expects, using the specifiers of the BOF argument packing format. For example, a BOF that requires two arguments, a string and a short, would be defined using zs. This allows the beacon compatibility layer to convert the command-line arguments to beacon’s internal packed format before calling the BOF entry point function supplied via the entry argument.
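To make the conversion step concrete, here is a rough sketch of how a compatibility layer might turn argv into a packed buffer before calling the entry point. Everything in it is an assumption for illustration: the function names, the BeaconEntryPtr shape (char*, int), and above all the byte layout (little-endian values, strings prefixed with a 4-byte length that includes the terminator). A real implementation must match the packed format of the C2 it targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape of the BOF entry point; not taken from the reference design. */
typedef void (*BeaconEntryPtr)(char *args, int len);

/* Write the low nbytes of v in little-endian order (assumed layout). */
static size_t put_le(uint8_t *out, size_t pos, uint32_t v, size_t nbytes) {
    for (size_t i = 0; i < nbytes; i++) out[pos + i] = (uint8_t)(v >> (8 * i));
    return pos + nbytes;
}

/* Pack argv[1..] according to def: z = string, s = short, i = int.
   Returns the number of bytes written, or -1 on any mismatch or overflow. */
int sketch_pack_args(int argc, const char *argv[], const char *def,
                     uint8_t *out, size_t cap) {
    assert(argv && def && out);
    size_t ndef = strlen(def);
    size_t pos = 0;
    if (argc < 1 || (size_t)(argc - 1) != ndef) return -1;
    for (size_t i = 0; i < ndef; i++) {
        const char *arg = argv[i + 1];
        if (def[i] == 'z') {
            size_t n = strlen(arg) + 1;            /* include the terminator */
            if (pos + 4 + n > cap) return -1;
            pos = put_le(out, pos, (uint32_t)n, 4);
            memcpy(out + pos, arg, n);
            pos += n;
        } else if (def[i] == 's' || def[i] == 'i') {
            size_t width = (def[i] == 's') ? 2 : 4;
            if (pos + width > cap) return -1;
            pos = put_le(out, pos, (uint32_t)strtol(arg, NULL, 0), width);
        } else {
            return -1;                             /* specifier not handled here */
        }
    }
    return (int)pos;
}

/* Hypothetical standalone path: pack the command line, then call the BOF. */
int sketch_invoke_standalone(int argc, const char *argv[], const char *def,
                             BeaconEntryPtr entry) {
    uint8_t buf[1024];
    int len = sketch_pack_args(argc, argv, def, buf, sizeof buf);
    if (len < 0 || entry == NULL) return 1;
    entry((char *)buf, len);
    return 0;
}
```

With a definition of zs, running mybof.exe hi 513 would hand the entry point a 9-byte buffer under this assumed layout. Binary file arguments, such as the third argument in the command lines shown earlier, would need an extra specifier that reads the file contents, which is left out here.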

Exception Support

Traditional BOFs do not support SEH/C++ exceptions. This often results in verbose code and “nested if hell” where each function is checked for failure. 

Since BOF PE files will be fully linked executables, handling both SEH and C++ exceptions would be possible. BOF PE loaders will have all the information necessary inside the compiled PE to update the runtime or inverted function tables.

This will be the most complex element of the new loader design for any C2 provider that chooses to implement the BOF PE proposal.

Within ntdll, there is a non-exported table called LdrpInvertedFunctionTables. The inverted function table contains a sorted list of memory regions that have exception handlers within each region. This table is usually modified by ntdll whenever a module is loaded, including the main executable itself. But because our PE is reflectively loaded, we need to locate this table inside ntdll ourselves so that we can insert a new entry for our memory-mapped BOF PE.

RiskInsight released a great blog on how loaders can find this table without using static signatures for various versions of ntdll.dll, which is well worth the read. A similar technique has been implemented within the reference design, but additional guards have been added to make it more likely that the memory being queried really is the LdrpInvertedFunctionTables region we are searching for. These additional guards were required to add support for x86 architectures. You can see this on lines 259-261 in main.c, which is part of the sample loader.
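As a mental model only, the "sorted list of memory regions" can be pictured as below. This is a deliberately simplified sketch: the real table is undocumented, its layout differs between Windows versions, and none of the struct or function names here come from ntdll or the reference loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_ENTRIES 8

/* Simplified model of one region; field names are illustrative. */
struct model_entry {
    uintptr_t image_base;          /* start of the mapped image */
    uint32_t  image_size;          /* size of the mapped image in bytes */
    uintptr_t exception_dir;       /* where the image's exception data lives */
    uint32_t  exception_dir_size;
};

struct model_table {
    uint32_t count;
    struct model_entry entries[MODEL_MAX_ENTRIES];
};

/* Insert while keeping entries sorted by image_base.
   Returns the index used, or -1 if the table is full. */
int model_insert(struct model_table *t, const struct model_entry *e) {
    assert(t && e);
    if (t->count >= MODEL_MAX_ENTRIES) return -1;
    uint32_t i = 0;
    while (i < t->count && t->entries[i].image_base < e->image_base) i++;
    memmove(&t->entries[i + 1], &t->entries[i],
            (t->count - i) * sizeof t->entries[0]);
    t->entries[i] = *e;
    t->count++;
    return (int)i;
}

/* Find the region that contains addr, as exception dispatch would need to. */
const struct model_entry *model_find(const struct model_table *t, uintptr_t addr) {
    assert(t);
    for (uint32_t i = 0; i < t->count; i++) {
        const struct model_entry *e = &t->entries[i];
        if (addr >= e->image_base && addr - e->image_base < e->image_size)
            return e;
    }
    return NULL;
}
```

A reflectively loaded image that is never inserted simply has no region in the table, which is why a lookup for an address inside it fails.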

x64 

For x64 PE files, support for exceptions is generally easy. Any x64 PE that makes use of exception handlers will have the exception directory populated within the data directories array. These are typically hosted inside the .pdata section.  A single call to the RtlAddFunctionTable API with information on the location of the exception tables from the BOF PE will be all that is needed. Et voila, you have exception support in your reflectively loaded PE. 
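For illustration, the values that call needs can be read straight from the PE headers. The sketch below is portable C that parses a PE32+ image in a buffer and returns the exception directory RVA and entry count; the function and struct names are hypothetical, and the offsets used are those of the documented PE32+ optional header. On Windows, a loader would then pass the table address (image base plus RVA), the entry count and the image base to RtlAddFunctionTable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct exception_dir {
    uint32_t rva;          /* RVA of the RUNTIME_FUNCTION array (usually in .pdata) */
    uint32_t entry_count;  /* number of 12-byte x64 RUNTIME_FUNCTION entries */
};

static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (rd16(p + 2) << 16); }

/* Locate the exception directory of a PE32+ (x64) image.
   Returns 0 on success, -1 if the image is malformed or has no such directory. */
int find_exception_dir_x64(const uint8_t *image, size_t size, struct exception_dir *out) {
    enum { NT_TO_OPTIONAL = 24, OPT_NUM_DIRS = 108, OPT_DIRS = 112,
           DIR_EXCEPTION = 3, OPT_SIZE = 240 };
    assert(image && out);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z') return -1;
    uint32_t nt = rd32(image + 0x3C);                 /* e_lfanew */
    if (nt > size || size - nt < NT_TO_OPTIONAL + OPT_SIZE) return -1;
    if (image[nt] != 'P' || image[nt + 1] != 'E' || image[nt + 2] || image[nt + 3])
        return -1;
    const uint8_t *opt = image + nt + NT_TO_OPTIONAL; /* optional header */
    if (rd16(opt) != 0x20B) return -1;                /* PE32+ magic only */
    if (rd32(opt + OPT_NUM_DIRS) <= DIR_EXCEPTION) return -1;
    const uint8_t *dir = opt + OPT_DIRS + DIR_EXCEPTION * 8;
    uint32_t rva = rd32(dir), bytes = rd32(dir + 4);
    if (rva == 0 || bytes == 0) return -1;            /* image has no handlers */
    out->rva = rva;
    out->entry_count = bytes / 12;
    return 0;
}
```

The sketch only reads headers; it does not validate that the RVA falls inside a mapped section, which a production loader should also check.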

x86

On the other hand, x86 is a different beast altogether. Exception handlers and unwind information are pushed to the stack for each frame that leverages exceptions. Because of this, in theory, x86 exceptions should work without any special considerations. But exception information on the stack introduces a class of stack overflow vulnerability where the exception handler for a particular frame can be overwritten. To combat this, Microsoft added mitigations: the Visual Studio linker option /SAFESEH, which records the image’s valid exception handlers inside the PE’s load config directory, and Structured Exception Handling Overwrite Protection (SEHOP), which arrived with Vista SP1/Server 2008. So, if you are reflectively loading a BOF PE inside an executable that was built with /SAFESEH, any exception handler that is invoked is expected to be found within the inverted function table. If the exception handler is not found, the program is terminated immediately.

Different compilers can implement exception support differently for x86 too. For example, GCC does not use SEH and can use either DWARF2 EH or the setjmp/longjmp (SJLJ) model. I won’t go into too much detail on the internals of both models, but typically they require initialization during startup of the PE, prior to the execution of main. Therefore our design needs to accommodate this by calling __main() before any exceptions are thrown by BOF PE files compiled with GCC.

Modern-day MSVC/Clang compilers, on the other hand, use SAFESEH. But for x86, we still need to make a call to __scrt_initialize_crt for some of this magic to work.

Standard Import Format for Windows APIs

BOFs are required to import Windows APIs using a non-standard import format, for example:  

__declspec(dllimport) LPWSTR WINAPI KERNEL32$GetCommandLineW(void);

This often leads to macros or hacks so that the API can be called by its proper name, GetCommandLineW. The BOF PE design solves this issue, as the BOF will be a fully linked EXE file with ordinary imports from its dependent DLLs.
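To show why the traditional format exists at all, here is a small sketch of the parsing a COFF loader has to do on each imported symbol. It assumes the x64 symbol form __imp_LIBRARY$Function (x86 symbols carry extra decoration that is not handled here), and the function name split_bof_import is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "__imp_KERNEL32$GetCommandLineW" into "KERNEL32" and "GetCommandLineW".
   Returns 0 on success, -1 if the symbol does not follow the LIBRARY$Function form. */
int split_bof_import(const char *symbol, char *dll, size_t dll_cap,
                     char *func, size_t func_cap) {
    static const char prefix[] = "__imp_";
    assert(symbol && dll && func);
    if (strncmp(symbol, prefix, sizeof prefix - 1) != 0) return -1;
    const char *name = symbol + (sizeof prefix - 1);
    const char *sep = strchr(name, '$');
    if (sep == NULL || sep == name || sep[1] == '\0') return -1;
    size_t dll_len = (size_t)(sep - name);
    size_t func_len = strlen(sep + 1);
    if (dll_len + 1 > dll_cap || func_len + 1 > func_cap) return -1;
    memcpy(dll, name, dll_len);
    dll[dll_len] = '\0';
    memcpy(func, sep + 1, func_len + 1);   /* copies the terminator too */
    return 0;
}
```

An object file has no import table, so the library name has to be smuggled into the symbol itself. A linked PE carries that information in its import directory, which is why the plain GetCommandLineW spelling works again.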

Single Object File

Traditional BOFs are single compilation units. A compilation unit is typically a single .cpp or .c file compiled into a COFF object file. This can lead to difficulties with code reuse. Support for multiple c/cpp files can be simulated by using #include on a c file instead of the typical header file, but this is not the norm in traditional software development.

Since a BOF PE is a fully linked executable, multiple c/cpp files can be used, along with precompiled static libraries that contain code shared across multiple BOFs.

Simpler Loader Design

Whilst a fully linked PE and a COFF object file are both built on the COFF format, the latter is a little more complex to deal with when loading for execution purposes. COFF files can end up with hundreds of sections as code complexity grows. Some are special. For example, COMDAT sections can be duplicated, and it’s the linker’s job to pick just one. Flags on the section determine how one of those duplicates is chosen. Fully compatible linkers deal with these complexities as expected, and also discard and optimize unreferenced sections.

Current C2 COFF loaders do not handle these edge cases very well. This can often lead to unresolved symbols at the time of execution. With BOF PE, all symbols are resolved at link time: internal symbols are resolved correctly by the linker, and any truly unresolved symbol surfaces as a build error rather than a failure during execution.

But What About BOF PE Size?

I already hear the voices of the true purists that love to write their BOF code in native assembly language so that their compiled object file is 100 bytes smaller than the C equivalent.

Fear not, the reference design includes three sample PE files. 

Name | Description | Release Size
tiny-pe | A bare-bones BOF PE that has no dependencies on the C runtime library at all | ~3KB
c-pe | Typical Hello World C PE that links to the C runtime statically | ~120KB
cpp-pe | A C++ Hello World PE that uses the C++ STL library and also throws and catches an exception | ~400KB

If overall size is important, the tiny-pe template is of similar size to a traditional Hello World BOF. On the flip side, the cpp-pe is considerably larger, but includes the flexibility of using the C++ STL library, exceptions, etc. 

I know which I prefer, but you do you. Either way, I hope the design is flexible enough to support the true purist or those who prefer to use the more feature rich capabilities of modern C++. 

Show Me The Money

I have released a reference design that includes a proof-of-concept loader for anyone who wishes to implement BOF PE support within their C2 framework. It’s by no means complete, as further work is needed to support exceptions on Windows 7/2008 and earlier, but it is hopefully a good starting point nonetheless.

NetSPI BOF PE Design

References