
A library that removes common unicode confusables/homoglyphs from strings.

  • Its core is written in Rust and uses a form of binary search for speed!
  • By default, it can filter 221,529 (19.88%) different Unicode codepoints.
  • Unlike other packages, this package is Unicode bidi-aware: it interprets right-to-left characters the same way an application would render them!
  • Its behavior is also highly customizable to your liking!
  • And it's available in the following languages:


Rust (v1.65 or later)

In your Cargo.toml:

decancer = "3.2.2"
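
The overview says the core uses a form of binary search. As a rough illustration of that general idea only, the sketch below looks up codepoints in a small sorted table using `binary_search_by_key` from the Rust standard library. The names `TABLE`, `lookup`, and `cure_sketch`, as well as the table contents, are assumptions made for this example; they are not decancer's actual data or API, and it needs no dependency on the crate.

```rust
/// Hypothetical lookup table: (confusable codepoint, ASCII replacement).
/// It must stay sorted by codepoint so that it can be binary searched.
const TABLE: &[(u32, char)] = &[
    (0x0147, 'n'),  // Ň
    (0x0163, 't'),  // ţ
    (0x2115, 'n'),  // ℕ
    (0x24E1, 'r'),  // ⓡ
    (0x4E47, 'e'),  // 乇
    (0xFF25, 'e'),  // Ｅ
    (0xFF59, 'y'),  // ｙ
    (0x1D4E3, 't'), // 𝓣
    (0x1D502, 'y'), // 𝔂
    (0x1D53D, 'f'), // 𝔽
    (0x1D54C, 'u'), // 𝕌
    (0x1D54F, 'x'), // 𝕏
];

/// Returns the replacement for `c` if it is in the table, in O(log n) time.
fn lookup(c: char) -> Option<char> {
    TABLE
        .binary_search_by_key(&(c as u32), |&(codepoint, _)| codepoint)
        .ok()
        .map(|index| TABLE[index].1)
}

/// Replaces every known confusable and lowercases everything else.
fn cure_sketch(input: &str) -> String {
    input
        .chars()
        .map(|c| lookup(c).unwrap_or_else(|| c.to_ascii_lowercase()))
        .collect()
}

fn main() {
    // The sample text from this README, written with escapes.
    let input = "v\u{FF25}\u{24E1}\u{1D502} \u{1D53D}\u{1D54C}\u{0147}\u{2115}\u{FF59} \u{0163}\u{4E47}\u{1D54F}\u{1D4E3}";
    assert_eq!(cure_sketch(input), "very funny text");
    println!("{}", cure_sketch(input));
}
```

A sorted table keeps each lookup logarithmic in the number of entries, which matters when the table covers hundreds of thousands of codepoints.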

JavaScript (Node.js)

In your shell:

npm install decancer

In your code (CommonJS):

const decancer = require('decancer')

In your code (ESM):

import decancer from 'decancer'

JavaScript (Browser)

In your code:

<script type="module">
  import init from 'https://cdn.jsdelivr.net/gh/null8626/decancer@v3.2.2/bindings/wasm/bin/decancer.min.js'

  const decancer = await init()
</script>

Java

As a dependency

In your build.gradle:

repositories {
  maven { url 'https://jitpack.io' }
}

dependencies {
  implementation 'io.github.null8626:decancer:3.2.2'
}

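In your pom.xml:

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>io.github.null8626</groupId>
    <artifactId>decancer</artifactId>
    <version>3.2.2</version>
  </dependency>
</dependencies>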

Building from source

On Windows:

git clone https://github.com/null8626/decancer.git --depth 1
cd .\decancer\bindings\java
powershell -NoLogo -NoProfile -NonInteractive -Command "Expand-Archive -Path .\bin\bindings.zip -DestinationPath .\bin -Force"
gradle build --warning-mode all

On macOS/Linux:

git clone https://github.com/null8626/decancer.git --depth 1
cd ./decancer/bindings/java
unzip ./bin/bindings.zip -d ./bin
chmod +x ./gradlew
./gradlew build --warning-mode all

Tip: You can shrink the size of the resulting jar file by removing binaries in the bin directory for the platforms you don't want to support.



C/C++

Building from source

Building from source requires Rust v1.65 or later.

git clone https://github.com/null8626/decancer.git --depth 1
cd decancer/bindings/native
cargo build --release

The binary files should then be generated in the target/release directory.



Rust

For more information, please read the documentation.

let mut cured = decancer::cure!(r"vＥⓡ𝔂 𝔽𝕌Ňℕｙ ţ乇𝕏𝓣 wWiIiIIttHh l133t5p3/-\|<").unwrap();

assert_eq!(cured, "very funny text with leetspeak");

// WARNING: it's NOT recommended to coerce this output to a Rust string
//          and process it manually from there, as decancer has its own
//          custom comparison measures, including leetspeak matching!
assert_ne!(cured.as_str(), "very funny text with leetspeak");


cured.censor("funny", '*');
assert_eq!(cured, "very ***** text with leetspeak");

cured.censor_multiple(["very", "text"], '-');
assert_eq!(cured, "---- ***** ---- with leetspeak");
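
The warning in the example above exists because comparing a cured string is not the same as comparing its raw text. To make that difference concrete, here is a deliberately naive matcher that folds a few single-character leetspeak substitutions and collapses repeated characters before comparing. It is a sketch under stated assumptions, not decancer's algorithm: the names `fold_leet`, `normalize`, and `loosely_equals` are hypothetical, the substitution table is tiny, and collapsing repeats also makes words like "let" and "leet" compare equal.

```rust
/// Maps a few single-character leetspeak substitutions back to letters.
/// Illustrative only; a real matcher needs a much larger, smarter table.
fn fold_leet(c: char) -> char {
    match c {
        '0' => 'o',
        '1' => 'l',
        '3' => 'e',
        '4' => 'a',
        '5' => 's',
        '7' => 't',
        other => other.to_ascii_lowercase(),
    }
}

/// Folds leetspeak, then collapses each run of one character into a single one.
fn normalize(text: &str) -> String {
    let mut out = String::new();
    for c in text.chars().map(fold_leet) {
        if !out.ends_with(c) {
            out.push(c);
        }
    }
    out
}

/// Compares two strings after normalizing both sides the same way.
fn loosely_equals(text: &str, keyword: &str) -> bool {
    normalize(text) == normalize(keyword)
}

fn main() {
    // Hypothetical raw text, similar in spirit to the README's sample.
    let raw = "very funny text wwiiiiitthh l33t5p34k";
    let wanted = "very funny text with leetspeak";

    // A plain string comparison fails...
    assert_ne!(raw, wanted);
    // ...while the loose comparison succeeds.
    assert!(loosely_equals(raw, wanted));
}
```

This is why the examples compare through the cured value itself rather than through `as_str()`.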

JavaScript (Node.js)
const assert = require('assert')
const decancer = require('decancer')
const cured = decancer('vＥⓡ𝔂 𝔽𝕌Ňℕｙ ţ乇𝕏𝓣 wWiIiIIttHh l133t5p3/-\\|<')

assert(cured.equals('very funny text with leetspeak'))

// WARNING: it's NOT recommended to coerce this output to a JavaScript string
//          and process it manually from there, as decancer has its own
//          custom comparison measures, including leetspeak matching!
assert(cured.toString() !== 'very funny text with leetspeak')
// => very funny text wwiiiiitthh l133t5p3/-\|<


cured.censor('funny', '*')
// => very ***** text wwiiiiitthh l133t5p3/-\|<

cured.censorMultiple(['very', 'text'], '-')
// => ---- ***** ---- wwiiiiitthh l133t5p3/-\|<
JavaScript (Browser)
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Decancerer!!! (tm)</title>
    <style>
      textarea {
        font-size: 30px;
      }

      #cure {
        font-size: 20px;
        padding: 5px 30px;
      }
    </style>
  </head>
  <body>
    <h3>Input cancerous text here:</h3>
    <textarea rows="10" cols="30"></textarea>
    <br />
    <button id="cure" onclick="cure()">cure!</button>
    <script type="module">
      import init from 'https://cdn.jsdelivr.net/gh/null8626/decancer@v3.2.2/bindings/wasm/bin/decancer.min.js'

      const decancer = await init()

      // Exposed on window so the button's onclick handler can reach it.
      window.cure = function () {
        const textarea = document.querySelector('textarea')

        if (!textarea.value.length) {
          return alert("There's no text!!!")
        }

        textarea.value = decancer(textarea.value).toString()
      }
    </script>
  </body>
</html>
See this in action here.


Java

For more information, please read the documentation.

import io.github.null8626.decancer.CuredString;

public class Program {
  public static void main(String[] args) {
    CuredString cured = new CuredString("vＥⓡ𝔂 𝔽𝕌Ňℕｙ ţ乇𝕏𝓣 wWiIiIIttHh l133t5p3/-\\|<");

    assert cured.equals("very funny text with leetspeak");

    // WARNING: it's NOT recommended to coerce this output to a Java String
    //          and process it manually from there, as decancer has its own
    //          custom comparison measures, including leetspeak matching!
    assert !cured.toString().equals("very funny text with leetspeak");
    // => very funny text wwiiiiitthh l133t5p3/-\|<

    assert cured.contains("funny");

    cured.censor("funny", '*');
    // => very ***** text wwiiiiitthh l133t5p3/-\|<

    String[] keywords = { "very", "text" };
    cured.censorMultiple(keywords, '-');
    // => ---- ***** ---- wwiiiiitthh l133t5p3/-\|<
  }
}

C/C++

For more information, please read the documentation.

UTF-8 example:

#include <decancer.h>

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define decancer_assert(expr, notes)                           \
  if (!(expr)) {                                               \
    fprintf(stderr, "assertion failure at " notes "\n");       \
    ret = 1;                                                   \
    goto END;                                                  \
  }

int main(void) {
  int ret = 0;

  // UTF-8 bytes for "vＥⓡ𝔂 𝔽𝕌Ňℕｙ ţ乇𝕏𝓣"
  uint8_t input[] = {0x76, 0xef, 0xbc, 0xa5, 0xe2, 0x93, 0xa1, 0xf0, 0x9d, 0x94, 0x82, 0x20, 0xf0, 0x9d,
                     0x94, 0xbd, 0xf0, 0x9d, 0x95, 0x8c, 0xc5, 0x87, 0xe2, 0x84, 0x95, 0xef, 0xbd, 0x99,
                     0x20, 0xc5, 0xa3, 0xe4, 0xb9, 0x87, 0xf0, 0x9d, 0x95, 0x8f, 0xf0, 0x9d, 0x93, 0xa3};

  decancer_error_t error;
  decancer_cured_t cured = decancer_cure(input, sizeof(input), DECANCER_OPTION_DEFAULT, &error);

  if (cured == NULL) {
    fprintf(stderr, "curing error: %.*s\n", (int)error.message_length, error.message);
    return 1;
  }

  decancer_assert(decancer_contains(cured, "funny", 5), "decancer_contains");

END:
  // decancer_assert jumps here on failure; release the cured string either way.
  decancer_cured_free(cured);
  return ret;
}

UTF-16 example:

#include <decancer.h>

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define decancer_assert(expr, notes)                           \
  if (!(expr)) {                                               \
    fprintf(stderr, "assertion failure at " notes "\n");       \
    ret = 1;                                                   \
    goto END;                                                  \
  }

int main(void) {
  int ret = 0;

  // UTF-16 code units for "vＥⓡ𝔂 𝔽𝕌Ňℕｙ ţ乇𝕏𝓣"
  uint16_t input[] = {
    0x0076, 0xff25, 0x24e1,
    0xd835, 0xdd02, 0x0020,
    0xd835, 0xdd3d, 0xd835,
    0xdd4c, 0x0147, 0x2115,
    0xff59, 0x0020, 0x0163,
    0x4e47, 0xd835, 0xdd4f,
    0xd835, 0xdce3
  };

  // UTF-16 code units for "funny"
  uint16_t funny[] = { 0x66, 0x75, 0x6e, 0x6e, 0x79 };

  decancer_error_t error;
  decancer_cured_t cured = decancer_cure_utf16(input, sizeof(input) / sizeof(uint16_t), DECANCER_OPTION_DEFAULT, &error);

  if (cured == NULL) {
    fprintf(stderr, "curing error: %.*s\n", (int)error.message_length, error.message);
    return 1;
  }

  decancer_assert(decancer_contains_utf16(cured, funny, sizeof(funny) / sizeof(uint16_t)), "decancer_contains_utf16");

END:
  // decancer_assert jumps here on failure; release the cured string either way.
  decancer_cured_free(cured);
  return ret;
}
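
The hexadecimal arrays in the two C examples are simply the UTF-8 bytes and UTF-16 code units of the sample text. As a small sketch of how such arrays could be produced, the following uses only the Rust standard library; `INPUT`, `utf8_units`, and `utf16_units` are hypothetical names chosen for this example and are not part of decancer.

```rust
/// The sample text from the examples, written with escapes so the
/// source file itself stays plain ASCII.
const INPUT: &str = "v\u{FF25}\u{24E1}\u{1D502} \u{1D53D}\u{1D54C}\u{0147}\u{2115}\u{FF59} \u{0163}\u{4E47}\u{1D54F}\u{1D4E3}";

/// UTF-8 bytes, as passed to `decancer_cure` in the C example.
fn utf8_units(text: &str) -> Vec<u8> {
    text.bytes().collect()
}

/// UTF-16 code units (surrogate pairs included), as passed to
/// `decancer_cure_utf16` in the C example.
fn utf16_units(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}

fn main() {
    let utf8 = utf8_units(INPUT);
    let utf16 = utf16_units(INPUT);

    // Both lengths match the arrays in the C examples.
    assert_eq!(utf8.len(), 42);
    assert_eq!(utf16.len(), 20);

    // Print each array in a form that can be pasted into C source.
    let bytes: Vec<String> = utf8.iter().map(|b| format!("{:#x}", b)).collect();
    let units: Vec<String> = utf16.iter().map(|u| format!("{:#x}", u)).collect();
    println!("uint8_t input[] = {{ {} }};", bytes.join(", "));
    println!("uint16_t input[] = {{ {} }};", units.join(", "));
}
```

Note that the length arguments differ between the two APIs in the examples: the UTF-8 call receives a byte count, while the UTF-16 call receives a count of 16-bit code units.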


If you want to support my eyes for manually looking at thousands of Unicode characters, consider donating! ❤



If you're a new contributor, please read CONTRIBUTING.md before contributing!