Trojan Source: Invisible Vulnerabilities

Overview

We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers.

'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and, through supply-chain compromise, to software across the industry. We present working examples of Trojan-Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We propose definitive compiler-level defenses, and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to block this attack.

Additional details can be found in our related paper (also on arXiv) and at trojansource.codes.

Proofs-of-Concept

This repository is divided into per-language subdirectories. Each subdirectory contains a series of proofs-of-concept implementing various Trojan-Source attacks, as well as a README describing the compilers/interpreters with which these attacks were verified. The source code for the website publishing these attacks is located in the website/ subdirectory.

Languages

We include a summary of the languages evaluated in the table below:

Language | Vulnerable to Early Return | Vulnerable to Commenting-Out | Vulnerable to Stretched Strings | Tool Evaluated
C ~ GNU gcc v7.5.0; Apple clang v12.0.5
C++ ~ GNU g++ v7.5.0; Apple clang++ v12.0.5
C# ~ .NET 5.0 via dotnet-script
JavaScript ~ Node.js v16.4.1
Java ~ OpenJDK v16.0.1
Rust ~ rustc v1.53.0
Go ~ go v1.16.6
Python Python 3.9.5 via clang; Python 3.7.10 via gcc
SQL SQLite v3.39.4
Bash ~ zsh v5.8.1
Assembly ~ x86_64 gas on Apple clang v14.0.0
Solidity ~ Solidity v0.8.16

✓ means the rendered code visually matches common style for that language, while ~ means visual renderings adhere to language syntax but deviate from common style (e.g. the multiline comment terminator */ is written as /*/). The proofs-of-concept included in this repository provide explicit examples for clarity.

We note that this list of affected languages is non-exhaustive, and welcome community contributions to expand to further languages.

We further note that some of the above tools have been patched since the disclosure of Trojan-Source attacks, and therefore include the versions of each tool evaluated. For example, rustc now throws errors for unterminated Bidi control characters.
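To make the idea of an "unterminated" Bidi control character concrete, the following Python sketch flags lines that open a Bidi embedding, override, or isolate without explicitly closing it. It is an illustration under simplifying assumptions, not the check implemented by rustc or any other compiler: the function name has_unterminated_bidi is hypothetical, the check works on whole lines rather than on individual comments and string literals, and it treats PDF and PDI as independent counters instead of following the full Unicode Bidirectional Algorithm.

```python
# Bidi control characters, written as escapes so this file contains none itself.
EMBEDDINGS_AND_OVERRIDES = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: closes an embedding or override
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes an isolate


def has_unterminated_bidi(line: str) -> bool:
    """Return True if `line` opens a Bidi embedding, override, or isolate
    that is not explicitly closed before the line ends.

    Simplification (an assumption of this sketch): embeddings/overrides and
    isolates are counted separately, and stray closers are ignored.
    """
    open_embeddings = 0
    open_isolates = 0
    for ch in line:
        if ch in EMBEDDINGS_AND_OVERRIDES:
            open_embeddings += 1
        elif ch in ISOLATES:
            open_isolates += 1
        elif ch == PDF and open_embeddings > 0:
            open_embeddings -= 1
        elif ch == PDI and open_isolates > 0:
            open_isolates -= 1
    return open_embeddings > 0 or open_isolates > 0
```

For example, has_unterminated_bidi("/* \u202E admins only */") returns True because the right-to-left override is never closed, while has_unterminated_bidi("a \u202Eb\u202C c") returns False.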

Finally, in addition to the Bidi attacks shown above, we evaluated each language against the Homoglyph and Invisible character attacks also described in the related paper. These evaluations can be found in the README files of each language subdirectory.
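As a rough illustration of how homoglyphs and invisible characters can be surfaced during review, the Python sketch below lists every zero-width character and every non-ASCII letter in a piece of text together with its Unicode name. The function name suspicious_characters and the short INVISIBLE set are assumptions made for this example and are not exhaustive. Flagging all non-ASCII letters is deliberately blunt, since many projects legitimately use them in identifiers, strings, and comments.

```python
import unicodedata

# A few zero-width code points that typically render as nothing (not exhaustive).
INVISIBLE = {"\u200B", "\u200C", "\u200D", "\u2060"}


def suspicious_characters(text: str):
    """Return (index, 'U+XXXX', Unicode name) for each invisible character
    or non-ASCII letter found in `text`."""
    findings = []
    for i, ch in enumerate(text):
        if ch in INVISIBLE or (ord(ch) > 127 and ch.isalpha()):
            name = unicodedata.name(ch, "UNKNOWN")
            findings.append((i, f"U+{ord(ch):04X}", name))
    return findings
```

Calling suspicious_characters("s\u0430y") reports the Cyrillic letter at index 1, which most fonts draw identically to a Latin "a".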

Code Viewers

We include a summary of the code viewers evaluated in the table below:

Bidi Attack (Windows) Bidi Attack (MacOS) Bidi Attack (Ubuntu) Homoglyph Attack (Windows) Homoglyph Attack (MacOS) Homoglyph Attack (Ubuntu)
Visual Studio Code (v1.61)
Atom (v1.58.0)
SublimeText (v4121) Bidi unactioned Bidi unactioned Bidi unactioned
Notepad++ (v8.1.9) Displays control symbol N/A N/A N/A N/A
Eclipse (v4.21) Mangled Missing Glyph
IntelliJ (v2021.2.3) Displays control char Displays control char Displays control char
Visual Studio (v16.11.5/v8.10.11) Mangled N/A N/A
Xcode (v14.0.1) N/A N/A N/A N/A
vim (v8.2.1790) Mangled Displays codepoint Displays codepoint Misrendered
emacs (v27.2) Displays underscores
GitHub (patched Oct '21) ✓ (except Safari)
Bitbucket (patched Nov '21) ✓ (except Safari)
GitLab (patched Oct '21) ✓ (except Safari)

✓ means that the code viewer is vulnerable to the attack on that platform. N/A indicates that the code viewer is not available on that platform. All web-based products were tested on October 2021 releases of Google Chrome, Microsoft Edge, Mozilla Firefox, and Apple Safari. Any visualization deviations on non-vulnerable platforms are described.

We note that many of these code viewers have since been patched, and for patched versions Trojan Source defenses may need to be disabled in settings to visualize these attacks as described in the related paper.

Reproducibility

To maximize reproducibility, we note that all evaluations were performed on the following operating systems:

  • Windows: Windows 10 build 19043
  • MacOS: MacOS Big Sur
  • Ubuntu: Ubuntu 20.04

As noted, many of the compilers, code editors, and repository frontends examined in this work have since been patched with Trojan Source defenses. To reproduce the results, we recommend installing the known-vulnerable versions of software listed above, or disabling any defenses in the settings of later versions.

To validate our results, we recommend opening each of the proofs-of-concept in a vulnerable code viewer, confirming that the code is displayed as depicted in the related paper, and validating that the program executes the hidden logic rather than the visualized logic when compiled/executed with a vulnerable compiler/interpreter. Example compiler or interpreter commands are provided in the subdirectory README for each vulnerable language included in this repository.

Docker

To ease reproducibility, we provide a Dockerfile that pre-installs vulnerable tooling and compiles the POCs in this repository. The following commands will build the image, launch a container, and attach a terminal to the container for faster reproduction of our findings:

docker build -t trojan-source .
docker run --name ts -d -it trojan-source
docker attach ts

Note that the Solidity and Assembly POCs are excluded from the Docker image because they target different platforms than the Ubuntu base image. Reproduction instructions for these two platforms are given in Solidity/README.md and Assembly/README.md.

Attack Detection

Interested in analyzing source code files for the presence of Trojan Source attacks? Check out this repo, which visualizes bidirectional overrides.
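For a quick look without any extra tooling, the basic idea of making Bidi control characters visible can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of the tool mentioned above. The function name reveal_bidi and the angle-bracket tag format are assumptions chosen for this example.

```python
# Abbreviations for the Bidi control characters, written as escapes so that
# this file contains no control characters itself.
BIDI_NAMES = {
    "\u202A": "LRE", "\u202B": "RLE", "\u202C": "PDF",
    "\u202D": "LRO", "\u202E": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def reveal_bidi(text: str) -> str:
    """Replace each Bidi control character with a visible tag such as <RLO>,
    leaving all other characters in their logical (encoded) order."""
    return "".join(f"<{BIDI_NAMES[ch]}>" if ch in BIDI_NAMES else ch for ch in text)
```

For example, reveal_bidi("a\u202Eb\u202Cc") returns "a<RLO>b<PDF>c". Applying it to the contents of a file read as UTF-8 shows the text in the order a compiler processes it, with every reordering instruction spelled out.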

Citation

If you use anything in this repository, in the Trojan Source paper, or on trojansource.codes in your own work, please cite the following:

@inproceedings{boucher_trojansource_2023,
    author = {Nicholas Boucher and Ross Anderson},
    title = {Trojan {Source}: {Invisible} {Vulnerabilities}},
    booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
    year = {2023},
    address = {Anaheim, CA},
    publisher = {USENIX Association},
    month = aug,
    url = {https://arxiv.org/abs/2111.00169}
}