Trojan Source: Invisible Vulnerabilities
We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers.
'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and supply-chain compromise across the industry. We present working examples of Trojan-Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, and Python. We propose definitive compiler-level defenses, and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to block this attack.
Additional details can be found in our related paper (also on arXiv) and at trojansource.codes.
This repository is divided into per-language subdirectories. Each subdirectory contains a series of proofs-of-concept implementing various Trojan-Source attacks as well as a README describing the compilers/interpreters with which these attacks were verified.
Interested in analyzing source code files for the presence of Trojan Source attacks? Check out this repo, which visualizes bidirectional overrides.
If you use anything in this repository, in the Trojan Source paper, or on trojansource.codes in your own work, please cite the following:
@article{boucher_trojansource_2021,
title = {Trojan {Source}: {Invisible} {Vulnerabilities}},
author = {Nicholas Boucher and Ross Anderson},
year = {2021},
journal = {Preprint},
eprint = {2111.00169},
archivePrefix = {arXiv},
primaryClass = {cs.CR},
url = {https://arxiv.org/abs/2111.00169}
}