Obfuscated code


Obfuscated code is source code that is, sometimes intentionally, very hard to read and understand. Some languages lend themselves to obfuscation more than others; C, C++ and Perl are the ones most often cited as easy to obfuscate. Macro preprocessors are often used to create hard-to-read code by hiding the standard language syntax and grammar from the main body of the code.
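As a minimal sketch of how a preprocessor can mask syntax, the C fragment below hides braces and keywords behind macros. The macro names (BEGIN, END, UNLESS, FOREVER, GIVE) and both function names are hypothetical, chosen only for illustration.

```c
/* Hypothetical macros that hide C's usual punctuation and keywords. */
#define BEGIN {
#define END }
#define UNLESS(c) if (!(c))
#define FOREVER for (;;)
#define GIVE return

/* Sum of 1..n, written entirely with the macros above. */
int total(int n)
BEGIN
    int acc = 0;
    int i = 1;
    FOREVER BEGIN
        UNLESS (i <= n) GIVE acc;
        acc += i;
        i += 1;
    END
END

/* The same function in plain C, for comparison. */
int total_plain(int n)
{
    int acc = 0;
    for (int i = 1; i <= n; i++)
        acc += i;
    return acc;
}
```

Both functions compile to the same behaviour; only the reader is affected, because the first one no longer looks like C until the preprocessor has run.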

There are also programs known as obfuscators that may operate on source code, object code, or both, for the purpose of deterring reverse engineering.

Recreational obfuscation

Code is sometimes obfuscated deliberately for recreational purposes. Several programming contests reward the most creatively obfuscated code, among them the International Obfuscated C Code Contest, the Obfuscated Perl Contest and the Obfuscated PostScript Contest.

Recreational obfuscations come in many varieties, ranging from simple keyword substitution, through the use (or omission) of whitespace to create artistic effects, to clever self-generating or heavily compressed programs.

Short obfuscated Perl programs that print "Just another Perl hacker" or a similar phrase are often found in the signatures of Perl programmers. See: Just another Perl hacker.


Take this example from a Usenet signature file:

#include <stdio.h>
#define O (b=b?b-1:(p++,5),*p&1<<b)
#define o O?O
char*p,j=2,b,c;e(n){for(p="|'8I0>+@{=#_P0-]PV.]F>TM!YK'?? |T\"Z8}aE<&D-!:-T'\"\
O<~cG5$,<2'#;/UI.0{d^HV6817-2F95-T7X|c^/1XB]*)3WHG0/0}dN>G RMZB.12.P] ~hM^J\\[\
<R^ (7;)R9A78{gU!:N)E5OPUR><29A6|e&9V;E[Q:,S1.P] }eES.$Z):B.*O+$G_ ~fWU8)75?I#\
75?WHN0{jE=]<V*1]JI#5VK)R9A6~J5X9X#69/+VX4 =S%!X-[)OE #1XRZ\"?~%^-#Dz&M\\RST|%\
G66*~&^HV0> {%^-8_P}%N>FO(}'M^JQ=z&U!:O(J{%&9G4|%ERO(~(WU8)G4{'E=]^G4",b=n;*p++

Although nearly unintelligible at first glance, it is a legal C program that, when compiled and run, prints the 12 verses of The 12 Days of Christmas. It is in effect a highly specialized decompression routine that decodes the encoded strings embedded in the code.
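The general idea of a program that expands data packed into its own string literals can be sketched with a much simpler scheme. The fragment below uses plain run-length decoding, which is not the encoding used by the program above; the function name rle_decode and the "digit followed by character" format are assumptions made for this illustration.

```c
#include <stddef.h>

/* Decode a run-length encoded string such as "3a2b1c" into "aaabbc".
   Each run is one digit 1-9 followed by the character to repeat.
   Writes at most cap-1 characters plus a terminating NUL and
   returns the number of characters written. */
size_t rle_decode(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    while (src[0] >= '1' && src[0] <= '9' && src[1] != '\0') {
        int count = src[0] - '0';
        while (count-- > 0 && n + 1 < cap)
            dst[n++] = src[1];
        src += 2;   /* advance past the digit and its character */
    }
    dst[n] = '\0';
    return n;
}
```

Calling rle_decode("3a2b1c", buf, sizeof buf) fills buf with "aaabbc". An obfuscated program of this kind typically fuses the decoder and the packed text so tightly that neither is recognisable on its own.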

Another example is a program whose source listing was formatted to resemble an empty tic-tac-toe board. Each run of the program rewrote the source code to show one more turn in the game, and the rewritten source was then executed for the next move.

Obfuscation and information-hiding

One definition of "code obfuscation" is a set of transformations on a program that preserve its black-box specification while making its internals difficult to reverse-engineer. There turn out to be many such transformations.

For example, languages such as Java, C#, and Lisp keep much of a program's symbol table (the names of classes, methods and so on) within the compiled output. One common obfuscation is to rename every class from something descriptive, like "Encryption_Index", to a meaningless sequence such as "rb". The class methods can likewise be renamed to a(), b(), etc.
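The effect of renaming can be sketched at source level in C, used here only for consistency with the earlier listing; an obfuscator for the languages named above would apply the same idea to names stored in the compiled output. The function rotate_left_8 and its renamed twin a are hypothetical.

```c
/* Before renaming: descriptive (hypothetical) identifiers.
   Rotates the low 8 bits of value left by shift positions. */
unsigned rotate_left_8(unsigned value, unsigned shift)
{
    shift &= 7u;
    value &= 0xFFu;
    return ((value << shift) | (value >> (8u - shift))) & 0xFFu;
}

/* After renaming: identical logic, meaningless names. */
unsigned a(unsigned b, unsigned c)
{
    c &= 7u;
    b &= 0xFFu;
    return ((b << c) | (b >> (8u - c))) & 0xFFu;
}
```

The two functions return the same result for every input, so the black-box behaviour is preserved; what is lost is the hint that the names gave to a human reader.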

When writing source code, programmers generally create a great deal of structure, following the rules of structured programming, object-oriented programming, and other methodologies. Compilers tend to propagate this structure into the compiled code. The job of a good obfuscator is to destroy as much as possible of the structure that makes a program human-readable.

Uses for obfuscation

Makes reverse engineering more difficult

Even when a program is compiled to an executable or bytecode file, someone may run a decompiler that converts the file back into human-readable form (generally sans comments). This could help them understand how the program works, against the wishes of the code's creator.

Obfuscation makes decompiled output harder to understand, usually forcing anyone who wants that information to resort to more costly forms of reverse engineering.

Minimizes code size

Obfuscation usually breaks down the structures that make programs modular and maintainable. In many cases this has the pleasant side effect of reducing code size. For example, in languages that ship a symbol table with the executable code, simply shortening names can save a good deal of space in the resulting code footprint. This is a crucial consideration when code size must be kept to a minimum, as with code that must be sent over a network or embedded in a small device.

Disadvantages of obfuscation


Obfuscated code is extremely difficult to debug. Variable names no longer make sense, and the structure of the code itself is likely to have been altered beyond recognition. This generally forces developers to maintain two builds: one that can be easily debugged, and another for release. Both builds should be tested to make sure they behave identically.

Defective obfuscators

Occasionally an obfuscator may itself be buggy, in a way that is difficult to reproduce. There is little one can do except find or create a newer version, or adjust the inputs to the obfuscator until it produces working output.

Conflicts with Reflection APIs

Reflection is a set of APIs in various languages that allow an object to be examined or created at run time knowing only its class name. Renaming breaks any code that looks up a class by its original name. Many obfuscators allow specified classes to be exempt from renaming; it is also possible to let a class be renamed and refer to it by its new name. However, the former option places limits on the dynamism of the code, while the latter adds a great deal of complexity and inconvenience to the system.
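C has no reflection API, so the sketch below uses a string-keyed table of function pointers as a stand-in for lookup by name. It is only an analogy under that assumption; the names Doubler, Negator, registry and find_by_name are hypothetical. One entry has been exempted from renaming and the other has been renamed to "b".

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(int);

static int double_it(int x) { return 2 * x; }
static int negate(int x)    { return -x; }

struct entry {
    const char *name;   /* the name callers use at run time */
    handler_fn  fn;
};

/* Registry as an obfuscator might leave it:
   "Doubler" was exempted from renaming, "Negator" became "b". */
static const struct entry registry[] = {
    { "Doubler", double_it },
    { "b",       negate    },
};

/* Look a handler up by name; returns NULL if the name is unknown. */
handler_fn find_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].fn;
    return NULL;
}
```

A caller that still asks for "Negator" gets NULL, which corresponds to a failed reflective lookup; it works again only if every caller is updated to ask for "b", or if the name is added to the exemption list.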

See also: