Obfuscation Dataset

This is a dataset of paired code samples, each consisting of an obfuscated program and its clean (deobfuscated) counterpart. The repository contains the samples used in the experiments for DeepObfusCode: Source Code Obfuscation Through Sequence-to-Sequence Networks.

To cite this dataset:

S. Datta. 2020. DeepObfusCode: Source Code Obfuscation Through Sequence-to-Sequence Networks. In Advances in Intelligent Systems and Computing.

Sources for the samples:

[1] Retrieved from https://stackoverflow.com/questions/15670185/how-to-de-obfuscate-the-ctk-c-code-the-winner-of-2001s-ioccc

[2] Retrieved from https://medium.com/@LainIwakura/deobfuscating-code-for-fun-and-no-profit-round-2-60d78b67ebce & http://www.ioccc.org/1995/heathbar.c

[3] Retrieved from https://medium.com/@LainIwakura/deobfuscating-obfuscated-code-for-fun-and-no-profit-36ec615c8f5d & http://www.ioccc.org/1984/anonymous/anonymous.c

[4] Retrieved from https://github.com/litonico/DeobfuscateEndoh & https://github.com/litonico/DeobfuscateEndoh/blob/master/endoh1.c & https://github.com/litonico/DeobfuscateEndoh/blob/master/endoh1_deobfuscate.c

[5] Retrieved from https://linuxgamecast.com/forums/topic/deobfuscate-the-carl-banks-flight-simulator-from-ioccc-1998-readable-clear-up-indented/