/obfy

A tiny C++ obfuscation framework

Primary language: C++. License: MIT.

Attacking the licensing problems with C++

Since the early days of commercial software, malicious programmers, also known as crackers, have continuously nettled the authors of that software by bypassing the clever licensing mechanisms built into it, thus causing financial damage to the companies providing the software.

This trend has not changed in recent years: the more clever the routines the programmers write, the more time the crackers spend invalidating them, and in the end the crackers always succeed. To keep up with the constant pressure from the cracking community, companies would need to change their licensing and identification algorithms constantly, but in practice this is not a feasible way to deal with the problem.

An entire industry has evolved around software protection and licensing technologies, where renowned companies offer advanced (and expensive) solutions to tackle this problem. The protection schemes range from hardware dongles to network activation, and from unique license keys to complex encryption of personalized data; the list is long.

This article provides a short introduction to the internal workings of a very simple and naive licensing algorithm, shows how to bypass it in an almost real-life scenario, and finally presents a software-based approach to mitigate the problem: hiding the license checking code in a layer of obfuscated operations generated by a C++ template metaprogramming framework, which makes the life of the person wanting to crack the application a little bit harder. Certainly, if they are determined enough, the code will also be cracked at some point, but at least we'll make it harder for them.

A naive licensing algorithm

The naive licensing algorithm is a very simple implementation of checking the validity of a license associated with the name of the user who has purchased the software. It is NOT an industrial-strength algorithm; it is purely demonstrative, while trying to provide insight into the actual responsibilities of a real licensing algorithm.

Since the license checking code is usually shipped with the software product in compiled form, I'll include here both the generated code (in Intel x86 assembly), since that is what the crackers will see after a successful disassembly of the executable, and the C++ code for the licensing algorithm. In order not to pollute the precious paper space with unintelligible binary code, I will restrict myself to the relevant bits of the code, that is, the parts which naively determine whether a supplied license is valid or not, together with the C++ code which was used to generate the binary code.

The following is the source code of the licensing algorithm:

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
bool check_license(const char* user, const char* users_license)
{
    std::string license;
    size_t ll = strlen(users_license);
    size_t l = strlen(user), lic_ctr = 0;
    int add = 0;

    for (size_t i = 0; i < ll; i++)
        if (users_license[i] != '-')
            license += users_license[i];

    while (lic_ctr < license.length() ) {
        size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l) current += user[i ++];
        current += add;
        add++;
        if (license[lic_ctr] != letters[current % sizeof letters])
            return false;
        lic_ctr++;
    }
    return true;
}

The license which this method validates has the form "ABCD-EFGH-IJKL-MNOP", and there is an associated generate_license method which will be presented as an Appendix to this article.

Also, the naivety of this method is easily exposed by the very fitting name check_license, which immediately reveals to the would-be attacker where to look for the code checking the ... license. If you want to make the identification of the license checking method harder for the attacker, I'd recommend either using some irrelevant names or just stripping all symbols from the executable as part of the release process.

The interesting part is the binary code of the method, obtained by compiling the corresponding C++ code with Microsoft Visual C++ 2015. I have compiled it in Release mode (with debug information included for educational purposes); it is intentionally NOT the Debug version, since we should hardly ship a debug version of the code to our customers.

I have also used the built-in debugger of the VS IDE to visualize the generated code next to the source, in order to facilitate a better understanding of the relation between the two.

if (license[lic_ctr] != letters[current % sizeof letters])
    00FC15E4  lea         ecx,[license]  
    00FC15E7  cmovae      ecx,dword ptr [license]  
    00FC15EB  xor         edx,edx  
    00FC15ED  push        1Bh  
    00FC15EF  pop         esi  
    00FC15F0  div         eax,esi  
    00FC15F2  mov         eax,dword ptr [lic_ctr]  
    00FC15F5  mov         al,byte ptr [ecx+eax]  
    00FC15F8  cmp         al,byte ptr [edx+0FC42A4h]  
    00FC15FE  jne         check_license+0DEh (0FC1625h)  
    return false;
lic_ctr++;
    00FC1600  mov         eax,dword ptr [lic_ctr]  
    00FC1603  mov         ecx,dword ptr [add]  
    00FC1606  inc         eax  
    00FC1607  mov         dword ptr [lic_ctr],eax  
    00FC160A  cmp         eax,dword ptr [ebp-18h]  
    00FC160D  jb          check_license+7Fh (0FC15C6h)  
}
return true;
    00FC160F  mov         bl,1  
    00FC1611  push        0  
    00FC1613  push        1  
    00FC1615  lea         ecx,[license]  
    00FC1618  call        std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Tidy (0FC1944h)  
    00FC161D  mov         al,bl  
}
    00FC161F  call        _EH_epilog3_GS (0FC2F7Ch)  
    00FC1624  ret  
    00FC1625  xor         bl,bl  
    00FC1627  jmp         check_license+0CAh (0FC1611h)

Let's analyze it for a short while. The essence of the validity checking happens at the address 00FC15F8 where the comparison cmp al, byte ptr [edx+0FC42A4h] takes place (for those wondering, edx gets its value as being the remainder of the division at 00FC15F0).

At this stage the value of the al register is already initialized with the value of license[lic_ctr] and that is the actual comparison to see that it matches the actually expected character. If it does not match, the code jumps to 0FC1625h where the bl register is zeroed out (xor bl, bl) and from there the jump goes backward to 0FC1611h to leave the method with the ret instruction found at 00FC1624. Otherwise the loop continues.

The most common way of returning a value from a method call is to place the value in the eax register and let the calling code handle it, so before returning from the method the value of al is populated with the value of the bl register (via mov al, bl found at 00FC161D).

Please remember, that if the check discussed before did not succeed, the value of the bl register was 0, but this bl was initialized to 1 (via mov bl,1 at 00FC160F) in case the entire loop was successfully completed.

If we think from the perspective of an attacker, the only thing that needs to be done is to replace in the executable the binary sequence of xor bl,bl with the binary code of mov bl,1. Since luckily these two have the same length (2 bytes) the crack is ready to be published within a few seconds.

Moreover, due to the simplicity of the algorithm's implementation, a highly skilled cracker could easily create a key generator for the application. This would be an even worse scenario: since the cracker did not have to modify the executable, further safety steps, such as integrity checks of the application, would all execute correctly, yet there would be a publicly available key generator which anyone could use to generate a license key without ever paying for it, or which malicious salesmen could use to generate counterfeit licenses to sell to unsuspecting customers.
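To make the key-generator threat concrete, here is a minimal sketch that simply replays the arithmetic of check_license for every key position. The function name keygen_sketch and the 16-character key length are assumptions made for illustration; this is not the generate_license method from the Appendix. The checker is reproduced (with std:: qualifiers added) only to keep the block self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// The checker from the article, reproduced so the sketch compiles on its own.
bool check_license(const char* user, const char* users_license)
{
    std::string license;
    std::size_t ll = std::strlen(users_license);
    std::size_t l = std::strlen(user), lic_ctr = 0;
    int add = 0;

    for (std::size_t i = 0; i < ll; i++)
        if (users_license[i] != '-')
            license += users_license[i];

    while (lic_ctr < license.length()) {
        std::size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l) current += user[i++];
        current += add;
        add++;
        if (license[lic_ctr] != letters[current % sizeof letters])
            return false;
        lic_ctr++;
    }
    return true;
}

// Hypothetical key generator: computes, position by position, the character
// the checker expects. It returns an empty string when the user name is empty
// or when some position would require the terminating '\0' of `letters`
// (index 26), because no printable key can satisfy the checker in that case.
std::string keygen_sketch(const char* user, std::size_t key_len = 16)
{
    const std::size_t l = std::strlen(user);
    if (l == 0) return std::string();

    std::string key;
    int add = 0;
    for (std::size_t pos = 0; pos < key_len; ++pos) {
        int current = 0;
        for (std::size_t i = pos % l; i < l; ++i) current += user[i];
        current += add++;
        const char expected = letters[current % sizeof letters];
        if (expected == '\0') return std::string();
        if (pos != 0 && pos % 4 == 0) key += '-';  // cosmetic; the checker strips dashes
        key += expected;
    }
    return key;
}
```

The point of the sketch is that nothing in the executable has to be patched: every value the checker uses is derived from the user name alone, so the same derivation yields a valid key.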

This is where our C++ obfuscating framework comes into the picture.

The C++ Obfuscating framework

The C++ obfuscating framework provides a simple macro-based mechanism, combined with advanced C++ template metaprogramming techniques, to replace the basic C++ control structures and statements in the relevant methods with highly obfuscated code, which makes reverse engineering the product a complex and complicated procedure.

By using the framework, reverse engineering the license checking algorithm presented in the previous section would prove to be a highly challenging task due to the sheer amount of extra code generated by the framework's engine.

The framework has adopted a familiar, BASIC-like syntax to make the switch from real C++ source code to the macro language of the framework as easy and painless as possible.

Functionality of the framework

The role of the obfuscating framework is to generate extra code while providing the functionality expected by the user, with as few syntax changes to the language as possible.

The following functionalities are provided by the framework:

  • wrapping all values into a valueholder class, thus hiding them from immediate access
  • providing a BASIC-like syntax for the basic C++ control structures (if, for, while ...)
  • generating extra code to make the result more complex and harder to understand
  • offering randomization of constant values in order to hide the information

Debugging with the framework

Like every developer who has been there, we know that debugging complex and highly templated C++ code can sometimes be a nightmare. In order to avoid this nightmare while using the framework, we decided to implement a debugging mode.

To activate the debugging mode of the framework, define the OBF_DEBUG identifier before including the obfuscation header file. The sections on the specific control structures describe how the debugging mode alters the behaviour of each macro.

Using the framework

The basic usage of the framework boils down to including the header file providing the obfuscating functionality

#include "instr.h"

then using the macro pair OBF_BEGIN and OBF_END as delimiters of the code sequences that will be using obfuscated expressions.

For a more under the hood view of the framework, the OBF_BEGIN and OBF_END macros declare a try-catch block, which has support for returning values from the obfuscated current code sequence, and also provides support for basic control flow modifications such as the usage of continue and break emulator macros CONTINUE and BREAK.

Behind the scenes: OBF_BEGIN and OBF_END

OBF_BEGIN expands to:

#define OBF_BEGIN try { obf::next_step __crv = obf::next_step::ns_done; std::shared_ptr<obf::base_rvholder> __rvlocal;

and OBF_END becomes:

#define OBF_END } catch(std::shared_ptr<obf::base_rvholder>& r) { return *r; } catch (...) {throw;}

In order to support "return"-ing a value from the current obfuscated block we need a special variable, __rvlocal. At a later stage this variable will be populated with a meaningful value as a result of executing the code of the RETURN macro (which will "throw" a value of type std::shared_ptr<obf::base_rvholder>). OBF_END will catch this specific value and handle it appropriately, while all other thrown values will be re-thrown in order not to disturb the client code's exception handling.
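The throw-and-catch mechanism can be tried in isolation with a stripped-down sketch. The names holder_base_sketch, holder_sketch and answer_sketch are hypothetical, and the holder classes are simplified stand-ins, not the framework's obf::base_rvholder; the sketch only assumes that the base class can convert itself back to the stored type.

```cpp
#include <memory>

// Hypothetical, simplified stand-in for a return-value holder hierarchy.
struct holder_base_sketch
{
    virtual ~holder_base_sketch() = default;
    virtual const void* get() const = 0;

    // Converts back to the stored type; the caller must ask for exactly
    // the type that was stored.
    template <class T>
    operator T() const { return *static_cast<const T*>(get()); }
};

template <class T>
struct holder_sketch final : holder_base_sketch
{
    explicit holder_sketch(T v) : value(v) {}
    const void* get() const override { return &value; }
    T value;
};

int answer_sketch(int n)
{
    try {                                             // role of OBF_BEGIN
        std::shared_ptr<holder_base_sketch> rvlocal;
        if (n > 0) {
            rvlocal.reset(new holder_sketch<int>(n * 2));
            throw rvlocal;                            // role of RETURN(n * 2)
        }
        rvlocal.reset(new holder_sketch<int>(-1));
        throw rvlocal;                                // role of RETURN(-1)
    } catch (std::shared_ptr<holder_base_sketch>& r) {  // role of OBF_END
        return *r;                                    // converts the holder to int
    } catch (...) {
        throw;                                        // foreign exceptions pass through
    }
}
```

Because the value travels as an exception, the "return" works from any depth of nested lambdas inside the block, which is presumably why the framework takes this route.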

Value and numerical wrappers

To achieve an extra layer of obfuscation, integral numeric literals can be wrapped in the macro N() and integral numeric variables (int, long, ...) can be wrapped in the macro V(), which obfuscates the calculations performed on them. The V() value wrapper can also wrap individual array elements (x[2]), but not whole arrays (x), and it cannot wrap class instances either, because the macro expands to a reference holder object.

The implementation of the wrappers uses the link time random number generator provided by [Andrivet] and the values are obfuscated by performing various operations to hide the original value.

And here is an example for using the value and variable wrappers:

int a, b = N(6);
V(a) = N(1);

After executing the statement above, the value of a will be 1.

The value wrappers implement a limited set of operations which you can use to change the value of the wrapped variable. These are the compound assignment operators: +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^= and the post/pre-increment operations -- and ++. All of the binary operators +, -, *, /, %, &, |, <<, >> are also implemented so you can write V(a) + N(1) or V(a) - V(b).

Also, the assignment operator to a specific type and from a different value wrapper is implemented, together with the comparison operators.

As the name implies, the value wrappers wrap values and behave similarly to plain values. Be aware, therefore, that const variables can be wrapped in the V() wrapper but, as with real const variables, you cannot assign to them. For example, the following code will not compile:

    const char* t = "ABC";
    if( V(t[1]) == 'B')
    {
        V( t[1] ) = 'D';
    }

And the following

    char* t = "ABC";
    if( V(t[1]) == 'B')
    {
        V( t[1] ) = 'D';
    }

will be undefined behaviour, because the compiler will most likely place the string "ABC" in a constant memory area (although I would expect your compiler to choke heavily on this expression, since it is no longer valid modern C++). To work with this kind of data, always use char[] instead of char*.
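The char[] advice can be illustrated without the framework at all. In this minimal sketch (the function name writable_copy_sketch is made up for the example) the array is initialised from the literal, so it owns a private, writable copy of the characters:

```cpp
#include <string>

std::string writable_copy_sketch()
{
    char t[] = "ABC";      // array initialised from the literal: a writable copy
    if (t[1] == 'B')
    {
        t[1] = 'D';        // well defined, only the local copy is modified
    }
    return std::string(t);
}
```

With the framework in place, the element accesses would be written as V(t[1]), as in the examples above.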

Behind the scenes of the implementation of the numeric wrapping

The N macro is defined like the following:

#define N(a) (obf::Num<decltype(a), obf::MetaRandom<__COUNTER__, 4096>::value ^ a>().get() ^ obf::MetaRandom<__COUNTER__ - 1, 4096>::value)

As a first step, note that due to the implementation of [Andrivet] and the (more or less standard) __COUNTER__ macro, obf::MetaRandom<__COUNTER__, 4096>::value and obf::MetaRandom<__COUNTER__ - 1, 4096>::value will have the same value: the second __COUNTER__ expands to a number one higher than the first, and subtracting 1 brings it back to the same seed.
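The pairing of the two __COUNTER__ uses can be checked with a small sketch. The template toy_random_sketch is a hypothetical stand-in for obf::MetaRandom (it is not the [Andrivet] generator), and the sketch assumes a compiler that supports __COUNTER__ and expands the two occurrences in textual order, as GCC, Clang and MSVC do.

```cpp
// Hypothetical stand-in for obf::MetaRandom: one fixed value below Max per seed.
template <int Seed, int Max>
struct toy_random_sketch
{
    static constexpr unsigned value =
        (static_cast<unsigned>(Seed) * 2654435761u + 12345u) % static_cast<unsigned>(Max);
};

// First __COUNTER__ yields k, the second yields k + 1, so "k + 1 - 1" selects
// the same seed and the two XOR masks cancel each other out.
#define MASK_SKETCH(a) \
    ((toy_random_sketch<__COUNTER__, 4096>::value ^ (a)) ^ \
      toy_random_sketch<__COUNTER__ - 1, 4096>::value)

#define SAME_SEED_SKETCH() (__COUNTER__ == __COUNTER__ - 1)

unsigned masked_roundtrip_sketch() { return MASK_SKETCH(42u); }
bool counter_pairs_up_sketch() { return SAME_SEED_SKETCH(); }
```

Each use of the macro consumes two counter values, so every wrapped literal gets its own mask.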

Now let's take a closer look at the obf::Num class:

template<typename T, T n> class Num final
{
public:
    enum { value = ( (n & 0x01)  | ( Num < T , (n >> 1)>::value << 1) ) };
    Num() : v(0)
    {
        v = value ^  MetaRandom<32, 4096>::value;
    }
    T get() const { volatile T x = v ^ MetaRandom<32, 4096>::value; return x;}
private:
    volatile T v;
};

Where the iteration of the templates is finalized by:

struct ObfZero { enum {value = 0}; };
struct ObfOne { enum {value = 1}; };
#define OBF_ZERO(t) template <> struct Num<t,0> final : public ObfZero { t v = value; };
#define OBF_ONE(t) template <> struct Num<t,1> final : public ObfOne { t v = value; };
#define OBF_TYPE(t) OBF_ZERO(t) OBF_ONE(t)
OBF_TYPE(int) // And for all the other integral types

The Num class tries to add some protection by adding some extra xor operations to the usage of a simple number, thus turning a simple numeric assignment into several steps of assembly code (Visual Studio 2015 generated the following code in Release With Debug Info mode):

    int n;
    OBF_BEGIN
       n = N(42);
002A5F74  mov         dword ptr [ebp-4],0  
002A5F7B  mov         dword ptr [ebp-4],78Ch  
002A5F82  mov         eax,dword ptr [ebp-4]  
002A5F85  xor         eax,0E8Fh  
002A5F8A  mov         dword ptr [ebp-4],eax  
002A5F8D  mov         eax,dword ptr [ebp-4]  
002A5F90  xor         eax,929h  
    OBF_END

However, please note the several volatile variables, which are required in order to circumvent today's extremely clever optimizing compilers. If we remove the volatile qualifier from the variables, the compiler is clever enough to work out the value I wanted to obfuscate, so ... there goes the obfuscation.
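The role of volatile can be seen in a reduced sketch of the same idea. The class masked_constant_sketch and the key 0x5A5 are assumptions for illustration, not the framework's Num class: the value is stored masked, and it is unmasked through volatile objects, which the compiler has to actually read and write.

```cpp
template <typename T, T Value, T Key>
class masked_constant_sketch final
{
public:
    masked_constant_sketch() : v(Value ^ Key) {}   // only the masked value is stored

    // The volatile member and the volatile local force real loads, stores
    // and an XOR to appear in the generated code.
    T get() const { volatile T x = v ^ Key; return x; }

private:
    volatile T v;
};

int hidden_forty_two_sketch()
{
    return masked_constant_sketch<int, 42, 0x5A5>().get();
}
```

Dropping both volatile qualifiers from this sketch typically lets an optimizing compiler fold the whole function into returning the constant 42.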

Behind the scenes of the implementation of the variable wrapping

In case of not building the code in debugging mode, the macro V expands to the following C++ nightmare:

#define MAX_BOGUS_IMPLEMENTATIONS 3

#define V(a) ([&]() {obf::extra_chooser<std::remove_reference<decltype(a)>::type, obf::MetaRandom<__COUNTER__, \
            MAX_BOGUS_IMPLEMENTATIONS>::value >::type _JOIN(_ec_,__COUNTER__)(a);\
            return obf::stream_helper();}() << a)

So let's dissect it in order to understand the underlying operations.

The value wrappers add an extra obfuscation layer to the values they wrap by performing an extra addition, an extra subtraction or an extra xor operation on the value itself. The operation is picked randomly at compile time by the extra_chooser class, which looks like this:

template <typename T, int N>
class extra_chooser
{
    using type = basic_extra;
};

And is helped by the following constructs:

#define DEFINE_EXTRA(N,implementer) template <typename T> struct extra_chooser<T,N> { using type = implementer<T>; }

DEFINE_EXTRA(0, extra_xor);
DEFINE_EXTRA(1, extra_substraction);
DEFINE_EXTRA(2, extra_addition);

Which is the actual definition of the classes for the extra operations, which in their turn look like:

template <class T>
class extra_xor final : public basic_extra
{
public:
    extra_xor(T& a) : v(a)
    {
        volatile T lv = MetaRandom<__COUNTER__, 4096>::value;
        v ^= lv;
    }
    virtual ~extra_xor() 
    {
        volatile T lv = MetaRandom<__COUNTER__ - 1, 4096>::value;
        v ^= lv; 
    }
private:
    volatile T& v;
};

The extra addition and subtraction classes are very similar.

The next thing we observe is that an object of this kind (ie. extra bogus operation chooser) is defined in a lambda function for the variable we are wrapping. The variable name for this is determined by _JOIN(_ec_,__COUNTER__)(a), where _JOIN is just a simple joiner macro:

#define _JOIN(a,b) a##b

Upon creation and destruction of this extra_chooser object the value of the object will remain unchanged, however extra code will be generated by the compiler (thanks to the numerous volatile modifiers found in the extra operation classes, otherwise the compiler would "cheat" again and just "skip" our obfuscation). This is actually an extensible interface, so if you define your own class for bogus operation and use the DEFINE_EXTRA macro (and increase the MAX_BOGUS_IMPLEMENTATIONS) you can use it too.
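The "unchanged after construction and destruction" property can be sketched with an addition-based bogus operation. The class extra_addition_sketch below is a hypothetical illustration written for this text, not the framework's extra_addition, and the constant 0x48B is just an arbitrary fixed value standing in for a MetaRandom result.

```cpp
template <class T>
class extra_addition_sketch final
{
public:
    explicit extra_addition_sketch(T& a) : v(a)
    {
        volatile T lv = 0x48B;
        v = v + lv;            // disturb the value on construction
    }
    ~extra_addition_sketch()
    {
        volatile T lv = 0x48B;
        v = v - lv;            // undo the disturbance on destruction
    }
private:
    volatile T& v;
};

// Returns the final value of n; `during` receives the value observed while
// the bogus object is alive.
int bogus_roundtrip_sketch(int start, int& during)
{
    int n = start;
    {
        extra_addition_sketch<int> guard(n);
        during = n;
    }
    return n;
}
```

In the V macro the object lives only inside the lambda, so the variable is back to its original value before the wrapper returned by the lambda is used.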

Now, back to the lambda, because it plays an important role. The lambda returns an object of type obf::stream_helper() which is basically an empty class (class stream_helper {};), but the role of the lambda is still not done. As we can see in the macro, the lambda is executed and into its result (ie. the obf::stream_helper() object) we stream in the parameter of the macro (<< a). This gives the control to the following operator:

template <typename T>
refholder<T> operator << (stream_helper, T& a)
{
    return refholder<T>(a);
}

providing us with a controversial class, refholder:

template <typename T>
class refholder final
{
public:
    refholder() = delete;
    refholder(T& pv) : v(pv) {}
    refholder(T&&) = delete;

    ~refholder() = default;

    refholder<T>& operator = (const T& ov) { v = ov; return *this;}
    refholder<T>& operator = (const refholder<T>& ov ) { v = ov.v; return *this; }

    bool operator == (const T& ov) { return !(v ^ ov); }
    bool operator != (const T& ov) { return !operator ==(ov); }
    COMPARISON_OPERATOR(>=)
    COMPARISON_OPERATOR(<=)
    COMPARISON_OPERATOR(>)
    COMPARISON_OPERATOR(<)

    operator T() {return v;}

    refholder<T>& operator++() { ++ v; return *this; }
    refholder<T>& operator--() { -- v; return *this; }

    refholder<T> operator++(int) { refholder<T> rv(*this); operator ++(); return rv; }
    refholder<T> operator--(int) { refholder<T> rv(*this); operator --(); return rv; }

    COMP_ASSIGNMENT_OPERATOR(+)
    COMP_ASSIGNMENT_OPERATOR(-)
    COMP_ASSIGNMENT_OPERATOR(*)
    COMP_ASSIGNMENT_OPERATOR(/)
    COMP_ASSIGNMENT_OPERATOR(%)
    COMP_ASSIGNMENT_OPERATOR(<<)
    COMP_ASSIGNMENT_OPERATOR(>>)
    COMP_ASSIGNMENT_OPERATOR(&)
    COMP_ASSIGNMENT_OPERATOR(|)
    COMP_ASSIGNMENT_OPERATOR(^)

private:
    volatile T& v;
};

This class supports all the basic operations you can execute on a variable, either via the member operators (defined explicitly or via the macro COMP_ASSIGNMENT_OPERATOR) or via the DEFINE_BINARY_OPERATOR macro, which defines binary operators for refholder classes. For wrapping constant variables there are specializations of this template class for constant T's. There are various arguments against storing references as class members [Stackoverflow]; however, I consider this situation a reasonably safe one, in which the construct can be exploited for this specific purpose.
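How the streaming trick turns a plain variable into a writable proxy can be sketched in a few lines. Everything below (stream_helper_sketch, ref_sketch, V_SKETCH) is a reduced, hypothetical version written for this explanation; it leaves out the bogus operations and most operators of the real refholder.

```cpp
struct stream_helper_sketch {};

template <typename T>
class ref_sketch final
{
public:
    explicit ref_sketch(T& pv) : v(pv) {}

    ref_sketch& operator=(const T& ov)  { v = ov; return *this; }
    ref_sketch& operator+=(const T& ov) { v = v + ov; return *this; }
    bool operator==(const T& ov) const  { return !(v ^ ov); }
    operator T() const { return v; }

private:
    volatile T& v;   // writes go straight through to the wrapped variable
};

// Streaming a variable into the helper yields a proxy bound to that variable.
template <typename T>
ref_sketch<T> operator<<(stream_helper_sketch, T& a)
{
    return ref_sketch<T>(a);
}

#define V_SKETCH(a) (stream_helper_sketch() << a)

int wrapped_assign_sketch()
{
    int n = 0;
    V_SKETCH(n) = 40;      // assignment through the temporary proxy
    V_SKETCH(n) += 2;
    return V_SKETCH(n) == 42 ? n : -1;
}
```

The proxy is a temporary that never outlives the full expression it appears in, which is the reason storing a reference member is reasonably safe here.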

So, here comes a piece of generated assembly code for a very simple expression:

    int n;
    OBF_BEGIN
        V(n) = N(42);
00048466  mov         dword ptr [ebp-8],0  
0004846D  mov         dword ptr [ebp-8],97Ch  
00048474  push        esi  
00048475  mov         esi,dword ptr [ebp-8]  
00048478  mov         dword ptr [ebp-8],48Bh  
0004847F  xor         esi,0DC4h  
00048485  mov         eax,dword ptr [ebp-8]  
00048488  add         eax,dword ptr [n]  
0004848B  mov         dword ptr [n],eax  
0004848E  mov         dword ptr [ebp-8],48Bh  
00048495  mov         eax,dword ptr [ebp-8]  
00048498  sub         dword ptr [n],eax  
0004849B  lea         eax,[n]  
0004849E  push        eax  
0004849F  push        dword ptr [ebp-8]  
000484A2  lea         eax,[ebp-0Ch]  
000484A5  push        eax  
000484A6  call        obf::operator<<<int> (0414C9h)  
000484AB  add         esp,0Ch  
000484AE  xor         esi,492h  
000484B4  mov         eax,dword ptr [eax]  
000484B6  mov         dword ptr [eax],esi  
    OBF_END

The sheer amount of extra code generated for a simple assignment is simply overwhelming.

Control structures of the framework

The basic control structures which are familiar from C++ are made available for immediate use by the developers by means of macros, which expand into complex templated code.

They are meant to provide the same functionality as the standard c++ keyword they are emulating, and if the framework is compiled in DEBUG mode, most of them actually expand to the c++ control structure itself.

Decision making

When there is a need in the application to take a decision based on the value of a specific expression, the obfuscated framework offers the familiar if-then-else statement for the developers in the form of the IF-ELSE-ENDIF construct.

The IF statement

For checking the true-ness of an expression the framework offers the IF macro which has the following form:

IF (expression)
....statements
ELSE
....other statements
ENDIF

where the ELSE is not mandatory, but the ENDIF is, since it indicates the end of the IF block's statements.

And here is an example for the usage of the IF macro.

IF( V(a) == N(9) )
     V(b) = a + N(5);
ELSE
     V(a) = N(9);
     V(b) = a + b;
ENDIF

Due to the way the IF macro is defined, it is not required to create a new scope between the IF and ENDIF, it is automatically defined and all variables declared in the statements between IF and ENDIF are destroyed.

Since the evaluation of the expression is bound to the execution of a hidden (well, at least from the outer world) lambda, it is unfortunately not possible to declare variables in the expression, so the following expression:

IF( int x = some_function() )

is not valid, and will yield a compiler error. This is partially intentional, since it gives that extra layer of obfuscation required to hide the operations done on a variable in a nameless lambda somewhere deep in the code.

In case the debugging mode is active, the IF-ELSE-ENDIF macros are defined to expand to the following statements:

#define IF(x)  if(x) {
#define ELSE   } else {
#define ENDIF  }

Implementation of the IF construct

The IF macro expands to the following:

#define IF(x) {std::shared_ptr<obf::base_rvholder> __rvlocal; obf::if_wrapper(( [&]()->bool{ return (x); })).set_then( [&]() {

the ELSE macro expands to:

#define ELSE return __crv;}).set_else( [&]() {

and the ENDIF will give:

#define ENDIF return __crv;}).run(); }

so, to wrap it all up, the following code:

IF( n == 42)
    n = 43;
ELSE
    n = 44;
ENDIF

will expand to

{
    std::shared_ptr<obf::base_rvholder> __rvlocal; 
    obf::if_wrapper( ([&]()->bool
    { 
        return (n == 42); 
    }) )
    .set_then( [&]() 
    {
        n = 43;
        return __crv;
    })
    .set_else( [&]() 
    {
        n = 44;
        return __crv;
    })
    .run(); 
}

Now let's examine the if_wrapper class.

class if_wrapper final
{
public:
    template<class T>
    if_wrapper(T lambda) {condition.reset(new bool_functor<T>(lambda));}

    void run()
    {
        if(condition->run()) { if(thens) {
            thens->run();
        }}
        else { if(elses) {
            elses->run();
        }}
    }

    ~if_wrapper() noexcept = default;

    template<class T>
    if_wrapper& set_then(T lambda) 
    { 
        thens.reset(new next_step_functor<T>(lambda)); return *this; 
    }

    template<class T>
    if_wrapper& set_else(T lambda) 
    { 
        elses.reset(new next_step_functor<T>(lambda)); return *this; 
    }

private:
    std::unique_ptr<bool_functor_base> condition;
    std::unique_ptr<next_step_functor_base> thens;
    std::unique_ptr<next_step_functor_base> elses;
};

Now it is very clear why we needed the lambda created by the IF macro (([&]()->bool { return (n == 42); })): we needed to create an object of type bool_functor from it, which will give us the true-ness of the if condition. The bool_functor class looks like:

struct bool_functor_base
{
    virtual bool run() = 0;
};

template <class T>
struct bool_functor final : public bool_functor_base
{
    bool_functor(T r) : runner(r) {}
    virtual bool run() {return runner();}

private:
    T runner;
};

Where the important part is the bool run() which in fact runs the condition and returns its true-ness.

The two branches of the if are represented by the member variables std::unique_ptr<next_step_functor_base> thens; std::unique_ptr<next_step_functor_base> elses; and they behave very similarly to the condition.

The run() method of the if_wrapper class first checks the condition and then, depending on the presence of the then and else branches, executes the required operations.
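The same chain of calls can be tried out with a much smaller sketch. The class if_sketch below is a hypothetical simplification that uses std::function in place of the framework's functor hierarchy and ignores the next_step return values:

```cpp
#include <functional>
#include <utility>

class if_sketch final
{
public:
    explicit if_sketch(std::function<bool()> c) : condition(std::move(c)) {}

    if_sketch& set_then(std::function<void()> f) { thens = std::move(f); return *this; }
    if_sketch& set_else(std::function<void()> f) { elses = std::move(f); return *this; }

    void run()
    {
        if (condition()) { if (thens) thens(); }
        else             { if (elses) elses(); }
    }

private:
    std::function<bool()> condition;
    std::function<void()> thens;
    std::function<void()> elses;
};

int branch_sketch(int n)
{
    // Hand-written equivalent of: IF(n == 42) n = 43; ELSE n = 44; ENDIF
    if_sketch([&]() -> bool { return n == 42; })
        .set_then([&]() { n = 43; })
        .set_else([&]() { n = 44; })
        .run();
    return n;
}
```

Since both branches are stored as callables and only one is invoked by run(), a missing ELSE simply leaves the else callable empty.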

Support for looping

There is a time when every application needs to iterate over a set of values, so I tried to re-implement the basic loop structures used in c++: The for loop, the while and the do-while have been reincarnated in the framework.

The FOR statement

The macro provided to imitate the for statement is:

FOR(initializer, condition, incrementer)
.... statements
ENDFOR

Please note that since FOR is a macro, its three parts are separated by , (comma) and not by the traditional ; used in standard C++ for loops. Do not forget to enclose your initializer, condition and incrementer in parentheses if they are expressions which have a , (comma) in them.

The FOR loop should be ended with an ENDFOR statement to signal the end of the structure.
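The parenthesisation rule for commas can be tried with the debug-mode definitions of FOR and ENDFOR listed later in this section. In the sketch below (the function name is made up for the example) both the initializer and the incrementer contain a comma, so each is wrapped in its own pair of parentheses to be passed as a single macro argument:

```cpp
// Debug-mode definitions of the macros, as listed in this article.
#define FOR(init,cond,inc) for(init;cond;inc) {
#define ENDFOR }

int meet_in_the_middle_sketch()
{
    int a = 0, b = 0, sum = 0;
    // Without the inner parentheses the preprocessor would see five arguments.
    FOR((a = 0, b = 10), a < b, (a++, b--))
        sum += a;
    ENDFOR
    return sum;   // adds up a = 0, 1, 2, 3, 4
}
```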

Here is a simple example for the FOR loop.

FOR(V(a) = N(0), V(a) < N(10), V(a) += 1)
   std::cout << V(a) << std::endl;
ENDFOR

The same restriction concerning the variable declaration in the initializer as in the case of the IF applies for the FOR macro too, so it is not valid to write:

FOR(int x=0, x<10, x++)

and the reasons are again the same as presented above.

In case of a debugging session the FOR-ENDFOR macros expand to the following:

#define FOR(init,cond,inc) for(init;cond;inc) {
#define ENDFOR }

The WHILE loop

The macro provided as replacement for the while is:

WHILE(condition)
....statements
ENDWHILE

The WHILE loop has the same characteristics as the IF construct and behaves the way you would expect from a well-mannered while statement: it checks the condition at the top and executes the statements repeatedly as long as the given condition is true.

Here is an example for the WHILE:

    V(a) = 1;
    WHILE( V(a)  < N(10) )
        std::cout << "IN:" << a<< std::endl;
        V(a) += N(1);
    ENDWHILE

Unfortunately the WHILE loop also has the same restrictions as the IF: you cannot declare a variable in its condition.

In case the compilation is done in debugging mode, the WHILE evaluates to:

#define WHILE(x) while(x) {
#define ENDWHILE }

The REPEAT - AS_LONG_AS construct posing as do - while

Due to the complexity of the solution, the familiar do - while construct of the C++ language had to be renamed a bit, since the WHILE "keyword" was already taken for the benefit of the while loop, so I created the REPEAT - AS_LONG_AS keywords to achieve this goal.

This is the syntax of the REPEAT - AS_LONG_AS construct:

REPEAT
....statements
AS_LONG_AS( expression )

This will execute the statements at least once, and then depending on the value of the expression either will continue the execution, or will stop and exit the loop. If the expression is true it will continue the execution from the beginning of the loop, if it is false it will stop the execution and exit the loop.

And here is an example:

REPEAT
    std::cout << a << std::endl;
    ++ V(a);
AS_LONG_AS( V(a) != N(12) )

In case of debugging, the REPEAT - AS_LONG_AS construct expands to the following:

#define REPEAT   do {
#define AS_LONG_AS(x) } while (x);

Implementation of the looping constructs

The logic and design of the looping constructs are very similar to each other; they behave very similarly to the IF, and each of them uses the same building blocks. There are the wrapper classes (for_wrapper, repeat_wrapper, while_wrapper), each with its functors for verifying the condition and for the steps to be executed.

The implementation of the run() method in each wrapper class follows the logic of the keyword it tries to emulate, with the exception that the commands are wrapped in a try - catch in order for BREAK and CONTINUE to function properly. Let's see, for example, the run() of the for wrapper:

void run()
{
    for( initializer->run(); condition->run(); increment->run())
    {
        try
        {
            next_step c = body->run();
        }
        catch(next_step& c)
        {
            if(c == next_step::ns_break) break;
            if(c == next_step::ns_continue) continue;
        }
    }
}
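To see the exception-driven break and continue at work outside the framework, here is a minimal sketch of such a loop runner. The free function run_for_sketch and the use of std::function are simplifications assumed for this example; only the next_step enumerator names are taken from the code above.

```cpp
#include <functional>
#include <vector>

enum class next_step { ns_break, ns_continue, ns_done };

// Hypothetical loop runner following the same shape as the run() method above.
void run_for_sketch(const std::function<void()>& initializer,
                    const std::function<bool()>& condition,
                    const std::function<void()>& increment,
                    const std::function<next_step()>& body)
{
    for (initializer(); condition(); increment())
    {
        try
        {
            body();
        }
        catch (next_step& c)
        {
            if (c == next_step::ns_break) break;        // leaves the for loop
            if (c == next_step::ns_continue) continue;  // runs increment(), then condition()
        }
    }
}

std::vector<int> collect_sketch()
{
    std::vector<int> seen;
    int a = 0;
    run_for_sketch([&] { a = 0; }, [&] { return a < 10; }, [&] { ++a; },
        [&]() -> next_step {
            if (a == 2) throw next_step::ns_continue;   // role of CONTINUE
            if (a == 5) throw next_step::ns_break;      // role of BREAK
            seen.push_back(a);
            return next_step::ns_done;
        });
    return seen;   // 0, 1, 3, 4
}
```

The thrown value unwinds out of any nested lambdas in the body, which a plain break or continue statement inside a lambda could not do.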

Altering the control flow of the application

Sometimes there is a need to alter the execution flow of a loop. C++ supports this by providing the continue and break statements, and the framework offers the CONTINUE and BREAK macros to achieve the same goal.

The CONTINUE statement

The CONTINUE statement skips all the statements that follow it in the body of the loop and continues with the next iteration, thus altering the flow of the application.

Here is an example for the CONTINUE used in a FOR loop:

FOR(a = 0, a < 5, a++)
   std::cout << "counter before=" << a << std::endl;
   IF(a == 2)
        CONTINUE
   ENDIF
   std::cout << "counter after=" << a << std::endl;
ENDFOR

and the equivalent WHILE loop:

a = 0;
WHILE(a < 5)
    std::cout << "counter before=" << a << std::endl;
    IF(a == 2)
         a++;
         CONTINUE
    ENDIF
    std::cout << "counter after=" << a << std::endl;
    a++;
ENDWHILE

Neither of these should print out the counter after=2 text.

The BREAK statement

The BREAK statement terminates the loop statement it resides in and transfers execution to the statement immediately following the loop.

Here is an example for the BREAK statement used in a FOR loop:

FOR(a = 0, a < 10, a++)
   std::cout << "counter=" << a << std::endl;
   IF(a == 1)
        BREAK
   ENDIF
ENDFOR

This loop will print counter=0 and counter=1, then leave the body of the loop and continue execution after the ENDFOR.

The RETURN statement

As expected, the RETURN statement ends the execution of the current function and returns the specified value to the caller. Here is an example of returning 42 from a function:

int some_fun()
{
    OBF_BEGIN

        RETURN(42)

    OBF_END
}

With the introduction of RETURN, an important issue arose: The obfuscation framework does not support the usage of void functions. So the following code will not compile:

void void_test(int& a)
{
    OBF_BEGIN
        IF(V(a) == 42)
            V(a) = 43;
        ENDIF
    OBF_END
}

This is a seemingly annoying limitation, but it can easily be worked around by changing the return type of the function to any non-void type. The reason is that the RETURN macro and the underlying C++ constructs must handle a wide variety of returnable types in a way that the programmer can use easily and without confusion.

Implementation of CONTINUE, BREAK and RETURN

In non-debug mode these keywords expand to the following:

#define BREAK __crv = obf::next_step::ns_break; throw __crv;
#define CONTINUE __crv = obf::next_step::ns_continue; throw __crv;

#define RETURN(x) __rvlocal.reset(new obf::rvholder<std::remove_reference<decltype(x)>::type>(x,x));  throw __rvlocal;

BREAK and CONTINUE offer no surprises in their implementation and comply with the expectations set by the looping constructs: they throw a specific value, which is caught in the loop of the wrapper and handled accordingly.

However RETURN is a different kind of beast.

It initializes __rvlocal (i.e. the local return value) with the returned value and then throws it; the matching catch is found in the OBF_END macro, which in turn handles it correctly.

As you can see, the x macro parameter appears three times in the expansion. The occurrence inside decltype is never evaluated, but the two constructor arguments are, so to avoid unwanted behaviour do not use expressions with side effects. RETURN(x++), for example, increments your variable twice and, depending on the language standard in use, results in undefined behaviour.
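The following self-contained sketch shows how often a macro of this shape really evaluates its argument. The HOLD macro, the holder struct and the next_ticket() function are hypothetical names invented for this illustration. A function call is used as the argument instead of x++ so that the example itself stays well defined.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical two-slot holder, loosely modelled on the (value, check) pair.
template <class T>
struct holder
{
    holder(T a, T b) : v(a), check(b) {}
    T v;
    T check;
};

// Hypothetical macro with the same shape as RETURN: x is written three times.
#define HOLD(x) holder<std::remove_reference<decltype(x)>::type>(x, x)

static int calls = 0;

// Has a visible side effect, so every evaluation can be counted.
int next_ticket() { return ++calls; }

int count_evaluations()
{
    calls = 0;
    auto h = HOLD(next_ticket());
    // decltype(...) did not call the function; the two arguments did,
    // so the stored copies come from two separate calls and differ.
    assert(h.v != h.check);
    return calls;
}
```

count_evaluations() returns 2: the operand of decltype is an unevaluated context, while each of the two constructor arguments triggers its own call.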

The rvholder class has the following body:

struct base_rvholder
{
    virtual ~base_rvholder() = default;

    template<class T>
    operator T () const
    {
        return *reinterpret_cast<const T*>(get());
    }
    template<class T>
    bool operator == (const T& o) const
    {
        return o == operator T ();
    }
    template<class T>
    bool equals(const T& o) const
    {
        return o == *reinterpret_cast<const T*>(get());
    }
    virtual const void* get() const = 0;
};

template<class T>
class rvholder : public base_rvholder
{
public:
    rvholder(T t, T c) :base_rvholder(), v(t), check(c) {}
    ~rvholder() = default;
    virtual const void* get() const override 
    {
        return reinterpret_cast<const void*>(&v);
    }
private:
    T v;
    T check;
};

As you can see there is a redundant equals method in the base class, and this is due to the fact that during development of the framework, the Visual Studio compiler constantly crashed due to some internal error in the implementation of the CASE construct, and it always reported the error in the operator == of the base class. In order to make it work I have added the extra equals member.
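To illustrate the idea of carrying a return value out of a function by throwing a type-erased holder, here is a simplified sketch. The names holder_base, holder and classify are assumptions for this example only, and the catch block merely plays the role the article assigns to OBF_END; the real macros differ in detail.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified, hypothetical holder: one value stored behind a common base.
struct holder_base
{
    virtual ~holder_base() = default;
    virtual const void* get() const = 0;
};

template <class T>
struct holder : holder_base
{
    explicit holder(T t) : v(std::move(t)) {}
    const void* get() const override { return &v; }
    T v;
};

// Emulates "RETURN(value)": the value is wrapped, thrown, and unpacked
// in the catch block at the end of the function.
int classify(const std::string& s)
{
    try
    {
        if (s.empty())
            throw std::shared_ptr<holder_base>(new holder<int>(-1));
        throw std::shared_ptr<holder_base>(
            new holder<int>(static_cast<int>(s.size())));
    }
    catch (const std::shared_ptr<holder_base>& r)
    {
        assert(r);  // a thrown holder is never empty in this sketch
        // Every throw above wraps an int, matching the return type.
        return *static_cast<const int*>(r->get());
    }
}
```

The cast in the catch block is only safe because each throw wraps a value of the function's own return type, which is also why a macro-based RETURN has to be so careful about types.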

The CASE statement

When programming in C++, the switch-case statement comes in handy when there is a need to avoid long chains of if statements. The obfuscation framework provides a similar construct, although it is not an exact functional and syntactical copy of the original switch-case.

Here is the CASE statement:

CASE (<variable>)
    WHEN(<value>) [OR WHEN(<other_value>)] DO
    ....statements
    ....[BREAK]
    DONE
    [DEFAULT
    ....statements
    DONE]
ENDCASE

The functionality is very similar to the well known switch-case construct, the main differences are:

  1. It is possible to use non-numeric, non-constant values (variables and strings) for the WHEN, because the whole CASE statement is wrapped in a templated, lambda-based construct that is well hidden from the outside world. Be careful with this extra feature when using the debugging mode of the library, because there the CASE and WHEN macros expand to the standard switch and case keywords, which do not accept such values.
  2. It is possible to have multiple conditions for a WHEN label joined together with OR.

The fall-through behaviour of the switch construct, familiar to C++ programmers, was kept, so you need to put in a BREAK statement if you want execution to stop after a branch.

And here is an example for the CASE statement:

    std::string something = "D";
    std::string something_else = "D";
    CASE (something)
        WHEN("A") OR WHEN("B") DO
            std::cout <<"Hurra, something is " << something << std::endl;
            BREAK;
        DONE
        WHEN("C") DO
            std::cout <<"Too bad, something is " << something << std::endl;
            BREAK;
        DONE
        WHEN(something_else) DO
            std::cout <<"Interesting, something is " << something_else << std::endl;
            BREAK;
        DONE
        DEFAULT
            std::cout << "something is neither A, B or C, but:" << something <<std::endl;
        DONE
    ENDCASE

In case the framework is used in debugging mode the macros expand to the following statements:

#define CASE(a) switch (a) {
#define ENDCASE }
#define WHEN(c) case c:
#define DO {
#define DONE }
#define OR
#define DEFAULT default:

Implementation of the CASE construct

Certainly, the most complex of all constructs is the CASE one. The number of macros supporting it alone is large:

#define CASE(a) try { std::shared_ptr<obf::base_rvholder> __rvlocal;\
                auto __avholder = a; obf::case_wrapper<std::remove_reference<decltype(a)>::type>(a).
#define ENDCASE run(); } catch(obf::next_step& cv) {}
#define WHEN(c) add_entry(obf::branch<std::remove_reference<decltype(__avholder)>::type>\
                ( [&,__avholder]() -> std::remove_reference<decltype(__avholder)>::type \
                { std::remove_reference<decltype(__avholder)>::type __c = (c); return __c;} )).
#define DO add_entry( obf::body([&](){
#define DONE return obf::next_step::ns_continue;})).
#define OR join().
#define DEFAULT add_default(obf::body([&](){

Let's dive into it.

The case_wrapper name should already be familiar from the various wrappers, but for the CASE the real workhorse is the case_wrapper_base class. The case_wrapper class is necessary in order to make the CASE selection possible on both const and non-const objects, so the case_wrapper classes just derive from case_wrapper_base and specialize on the constness of the CASE expression. Please note that the CASE macro also uses its a parameter more than once, so writing CASE(x++) will apply the increment twice and lead to unexpected behaviour.

The case_wrapper_base class looks like:

template <class CT>
class case_wrapper_base
{
public:
    explicit case_wrapper_base(const CT& v) : check(v), default_step(nullptr) {}
    case_wrapper_base& add_entry(const case_instruction& lambda_holder) {
        steps.push_back(&lambda_holder);
        return *this;
    }
    case_wrapper_base& add_default(const case_instruction& lambda_holder) {
        default_step = &lambda_holder;
        return *this;
    }
    case_wrapper_base& join() {
        return *this;
    }
    void run() const ; // body extracted from here, See later in the article for the description of it
private:
    std::vector<const case_instruction*> steps;
    const CT check;
    const case_instruction* default_step;
};

The const CT check; member is the expression that is checked against the various case branches. Please note the add_entry and add_default methods, which together with the join() method allow chaining of expressions and method calls on the same object. The std::vector<const case_instruction*> steps; member is a cumulative container for all the branch condition expressions as well as the bodies (the code executed in a branch). This leads to more complex code at a later stage, but keeping the two in the same container was necessary to make the behaviour as similar as possible to the way the C++ switch works.
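The chaining works because every method returns a reference to the same object, and because the temporaries passed in stay alive until the whole chained expression has finished. The sketch below shows this with hypothetical names (instruction, chain_sketch, chained_sum) that only mirror the shape of the real classes.

```cpp
#include <cassert>
#include <vector>

struct instruction { int id; };  // hypothetical stand-in for case_instruction

class chain_sketch
{
public:
    chain_sketch& add_entry(const instruction& i)
    {
        steps.push_back(&i);   // stores the address of the caller's object
        return *this;          // returning *this is what enables chaining
    }
    chain_sketch& join() { return *this; }  // only keeps the chain going
    int run() const
    {
        int sum = 0;
        for (const instruction* p : steps)
        {
            assert(p != nullptr);
            sum += p->id;
        }
        return sum;
    }
private:
    std::vector<const instruction*> steps;
};

int chained_sum()
{
    // All temporaries live until the end of this full expression, so the
    // stored pointers are still valid when run() dereferences them.
    return chain_sketch().add_entry(instruction{1}).join()
                         .add_entry(instruction{2}).run();
}
```

Storing pointers to temporaries like this is only safe when run() is called within the same expression, which is the situation the CASE ... ENDCASE macro pair produces.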

The inner mechanism of the CASE depends on the following classes:

  1. The obf::case_instruction class, which acts as a base class for:
  2. obf::branch and
  3. obf::body classes.

The obf::branch class is the class which gets instantiated by the WHEN macro in a call to the add_entry method of the case_wrapper object created by the CASE. Its role is to act as the condition chooser, and it looks like:

template<class CT>
class branch final : public case_instruction
{
public:
    template<class T>
    branch(T lambda) 
    {
        condition.reset(new any_functor<T>(lambda));
    }
    bool equals(const base_rvholder& rv, CT lv) const
    {
        return rv.equals(lv);
    }
    virtual next_step execute(const base_rvholder& against) const override
    {
        CT retv;
        condition->run(const_cast<void*>(reinterpret_cast<const void*>(&retv)));
        return equals(against,retv) ? next_step::ns_done : next_step::ns_continue;
    }
private:
    std::unique_ptr<any_functor_base> condition;
};

The WHEN macro has a somewhat confusing lambda declaration, which captures the local __avholder by value. This is again because different compilers did not compile the same source code in the same way: some of them bluntly declined to compile what the others had already digested, and that is how this ugly solution came into existence.

The code that is executed upon entering a branch (including also the default branch) is created by the DO and the DEFAULT macros. They both create an instance of the obf::body class, and the DO adds it to the steps of the case wrapper class, and the DEFAULT calls the add_default member in order to specify a default branch. The obf::body class is much simpler, just a few lines:

class body final : public case_instruction
{
public:
    template<class T>
    body(T lambda) 
    {
        instructions.reset(new next_step_functor<T>(lambda));
    }
    virtual next_step execute(const base_rvholder&) const override
    {
        return instructions->run();
    }
private:
    std::unique_ptr<next_step_functor_base> instructions;
};

The most interesting (and longest) part of the case implementation is the run() method, presented here (in a somewhat stripped manner, I have removed all the security checks in order to have presentable code considering its length):

void run() const
{
    auto it = steps.begin();
    while(it != steps.end()) {
        next_step enter = (*it)->execute(rvholder<CT>(check,check));
        if(enter == next_step::ns_continue) {
            ++it;
        }
        else {
            // check for the end first, so that *it is never read past the last step
            while(it != steps.end() && ! dynamic_cast<const body*>(*it)) {
                ++it;
            }

            // found the first body.
            while(it != steps.end()) {
                if(dynamic_cast<const body*>(*it)) {
                    (*it)->execute(rvholder<CT>(check,check));
                }
                ++it;
            }
        }
    }

    if(default_step) {
        default_step->execute(rvholder<CT>(check,check));
    }
}

As a first step the code looks for the first branch which satisfies the condition: if (*it)->execute(rvholder<CT>(check,check)); returns next_step::ns_done, it has found a branch satisfying the check. In this case it skips all the other conditions for this branch and starts executing the code of all the obf::body objects that follow. If a BREAK statement is issued while executing the bodies, the code throws, and the catch in ENDCASE (catch(obf::next_step& cv)) swallows the exception and returns execution to the normal flow.

As a last resort, if we have a default_step and we are still in the body of run() (i.e. no one issued a BREAK command), the code executes it as well.
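The dispatch rules described above (find the first matching condition, fall through all following bodies, stop on a break, otherwise finish with the default) can be tried out in isolation with the sketch below. The types entry and break_signal and the functions run_case and trace are hypothetical and much simpler than the framework's classes; in particular they use std::function instead of the branch and body classes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct break_signal {};   // thrown by a body to emulate BREAK

// One entry is either a condition (a WHEN) or a body (a DO ... DONE).
struct entry
{
    bool is_body;
    std::function<bool(const std::string&)> matches; // used for conditions
    std::function<void()> action;                    // used for bodies
};

void run_case(const std::string& value, const std::vector<entry>& steps,
              const std::function<void()>& default_step)
{
    try
    {
        auto it = steps.begin();
        // 1. skip bodies and non-matching conditions
        while (it != steps.end() && (it->is_body || !it->matches(value)))
            ++it;
        // 2. fall through: run every body from the match onwards
        for (; it != steps.end(); ++it)
        {
            if (it->is_body)
            {
                assert(it->action);
                it->action();
            }
        }
        // 3. still here, so nobody broke out: run the default
        if (default_step) default_step();
    }
    catch (const break_signal&) {}  // plays the role of ENDCASE
}

// Records which bodies ran for a given value.
std::string trace(const std::string& value)
{
    std::string out;
    std::vector<entry> steps = {
        { false, [](const std::string& v) { return v == "A"; }, nullptr },
        { false, [](const std::string& v) { return v == "B"; }, nullptr },
        { true, nullptr, [&] { out += "ab;"; } },  // no BREAK: falls through
        { false, [](const std::string& v) { return v == "C"; }, nullptr },
        { true, nullptr, [&] { out += "c;"; throw break_signal{}; } },
    };
    run_case(value, steps, [&] { out += "default;"; });
    return out;
}
```

With these steps, trace("A") and trace("B") give "ab;c;" (fall through, then break), trace("C") gives "c;", and trace("D") gives "default;".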

And with this we have presented the entire framework, together with implementation details, and now we are ready to catch up with our initial goal.

The naive licensing algorithm revisited

Now that we are aware of a library that offers code obfuscation without too many headaches on our side (at least, this was the intention of the author), let's reconsider the implementation of the naive licensing algorithm using these new terms. So here it comes:

bool check_license1(const char* user, const char* users_license)
{
    OBF_BEGIN
    std::string license;
    size_t ll = strlen(users_license);
    size_t l = strlen(user), lic_ctr = N(0);

    size_t add = N(0), i =N(0);

    FOR (V(i) = N(0), V(i) < V(ll), V(i)++)
        IF ( V(users_license[i]) != N(45) )
            license += users_license[i];
        ENDIF
    ENDFOR

    WHILE (V(lic_ctr) < license.length() )

        size_t i = lic_ctr;
        V(i) %= l;
        int current = 0;
        WHILE(V(i) < V(l) )
            V(current) += user[V(i)++];
        ENDWHILE
        V(current) += V(add);
        ++V(add);

        IF ( (license [lic_ctr] != letters[current % sizeof letters]) )
            RETURN(false);
        ENDIF

        lic_ctr++;
    ENDWHILE

    RETURN (true);

    OBF_END
}

Indeed, it looks a little bit more "obfuscated" than the original source, but after compilation it adds a great layer of extra code around the standard logic, and the generated binary is much more cumbersome to understand than the one "before" the obfuscation. And due to the sheer size of the generated assembly code, we simply omit publishing it here.

Discommodities of the framework

Those who dislike the usage of CAPITAL letters in code may find the framework annoying. As presented in [Wakely], this almost feels like the code is shouting at you. However, for this particular use case I intentionally made it like this, because of the need to have familiar words that a developer can instantly connect to (the lower case words are already keywords), and also to follow the C++ convention that macros should be uppercase.

This brings us back to the swampy area of C++ and macros. There are several voices whispering loudly that macros have no place in C++ code, and several voices echoing back that macros, if used wisely, can help C++ code just as well as good old-style C. I personally have nothing against the wise use of macros; indeed, they proved very helpful while developing this framework.

And last, but not least, the numeric value wrappers do not work with floating point numbers. This is due to the fact that extensive binary operations are used on the number to obfuscate its value and this would be impossible to accomplish with floating point values.
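To see why bit-level tricks are tied to integers, consider the following sketch of an integer wrapper that hides its value behind an XOR and a rotation. This is not the framework's actual scheme: the class masked_u32, the key and the rotation amounts are assumptions chosen only to illustrate the kind of binary operation involved. The same operators are simply not defined for floating point operands in C++, so an expression such as 1.5 ^ 2 does not compile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical key; any fixed 32-bit pattern would do for this sketch.
constexpr std::uint32_t sketch_key = 0xA5A5A5A5u;

// Hypothetical integer wrapper: the stored bits differ from the plain value.
class masked_u32
{
public:
    explicit masked_u32(std::uint32_t plain)
        : stored_(rotl(plain ^ sketch_key, 7)) {}

    // Rotating by 7 and then by 25 is a full 32-bit turn, so this undoes
    // the rotation before the XOR is removed again.
    std::uint32_t get() const { return rotl(stored_, 25) ^ sketch_key; }

    // What someone inspecting the object's memory would see.
    std::uint32_t raw() const { return stored_; }

private:
    static std::uint32_t rotl(std::uint32_t v, unsigned n)
    {
        assert(n > 0 && n < 32);  // shifts by 0 or 32 are not handled here
        return (v << n) | (v >> (32 - n));
    }
    std::uint32_t stored_;
};
```

For example, masked_u32(42u).get() gives back 42 while raw() holds a different bit pattern.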

Some requirements

The code was written with "older" compilers in mind, so not all the latest and greatest features of C++14 and C++17 are used. Clang version 3.4.1 happily compiles the source code, and so does g++ 4.8.2. Visual Studio 2015 also compiles the code.

Unit testing is done using the Boost Unit test framework. The build system for the unit tests is CMake and there is support for code coverage (the last two were tested only under Linux).

License and getting the framework

The library is a header-only library, released under the MIT license.

You can get it from https://github.com/fritzone/obfy

Conclusion

History has shown us that if a piece of software is crackable, it will be cracked. How soon that happens depends only on the dedication, time and effort invested by the software cracker. There is no Swiss army knife when it comes to protecting your software against malicious interference, because from the moment it has left your build server and has been downloaded, the software is out of your hands and has entered an uncontrollable environment. The only sensible thing you can do to protect your intellectual property is to make it as hard to crack as possible. This little framework provides a few means to achieve this goal, and by making it open source, freely available and modifiable by the developer community, we can hope to give it an advantage by allowing everyone to tailor it to suit their needs best.

Appendix

The license generating algorithm

As promised, here is the naive license generating algorithm. Any further improvements to it are more than welcome.

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

std::string generate_license(const char* user)
{
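    // a null or empty name cannot produce a license
    // (an empty name would make i %= l divide by zero)
    if(!user || !*user) return "";

    // the license will contain only characters taken from letters
    // 16 chars + terminating 0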
    char result[17] = { 0 };
    size_t l = strlen(user), lic_ctr = 0;
    int add = 0;
    while (lic_ctr < 16)
    {
        size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l)
        {
            current += user[i];
            i++;
        }
        current += add;
        add++;

        result[lic_ctr] = letters[current % sizeof letters];
        lic_ctr++;
    }

    return std::string(result);
}
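The first loop of check_license1 skips every character with the code 45, which is the '-' character, so a key can presumably be shown to the user in dash-separated groups. The helpers below are a sketch of that idea; format_license and strip_dashes are hypothetical names, and the group size of 4 is an assumption, since the article does not specify a display format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: inserts a '-' after every `group` characters.
std::string format_license(const std::string& raw, std::size_t group = 4)
{
    assert(group > 0);
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i)
    {
        if (i != 0 && i % group == 0) out += '-';
        out += raw[i];
    }
    return out;
}

// Hypothetical inverse: drops every '-' (ASCII 45), in the same way the
// first loop of check_license1 does before comparing characters.
std::string strip_dashes(const std::string& formatted)
{
    std::string out;
    for (char c : formatted)
        if (c != '-') out += c;
    return out;
}
```

A 16 character key such as "ABCDEFGHIJKLMNOP" is formatted as "ABCD-EFGH-IJKL-MNOP", and stripping the dashes restores the original string.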

References

[Andrivet] - Random Generator by Sebastien Andrivet - https://github.com/andrivet/ADVobfuscator

[Wakely] - Stop the Constant Shouting - Overload Journal #121 - June 2014, Jonathan Wakely

[Stackoverflow] - http://stackoverflow.com/questions/12387239/reference-member-variables-as-class-members