/franc

Natural language detection

Primary LanguageJavaScriptMIT LicenseMIT

franc

Build Status Coverage Status

Detect the language of text.

What’s so cool about franc?

  1. franc can support more languages(†) than any other library
  2. franc is packaged with support for 82, 188, or 402 languages
  3. franc has a CLI

† - Based on the UDHR, the most translated document in the world.

What’s not so cool about franc?

franc supports many languages, so make sure to pass it big documents, to get reliable results.

Installation

npm:

npm install franc

This installs the franc package, with support for 188 languages (languages which have 1 million or more speakers). franc-min (82 languages, 8m or more speakers) and franc-all (all 402 possible languages) are also available. Finally, use franc-cli to install the CLI.

Browser builds for franc-min, franc, and franc-all are available on GitHub Releases.

Usage

var franc = require('franc')

franc('Alle menslike wesens word vry') // => 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') // => 'ben'
franc('Alle menneske er fødde til fridom') // => 'nno'
franc('') // => 'und'
franc('the') // => 'und'

/* You can change what’s too short (default: 10): */
franc('the', {minLength: 3}) // => 'sco'
.all
console.log(franc.all('O Brasil caiu 26 posições'))

Yields:

[ [ 'por', 1 ],
  [ 'src', 0.8797557538750587 ],
  [ 'glg', 0.8708313762329732 ],
  [ 'snn', 0.8633161108501644 ],
  [ 'bos', 0.8172851103804604 ],
  ... 116 more items ]
whitelist
console.log(franc.all('O Brasil caiu 26 posições', {whitelist: ['por', 'spa']}))

Yields:

[ [ 'por', 1 ], [ 'spa', 0.799906059182715 ] ]
blacklist
console.log(franc.all('O Brasil caiu 26 posições', {blacklist: ['src', 'glg']}))

Yields:

[ [ 'por', 1 ],
  [ 'snn', 0.8633161108501644 ],
  [ 'bos', 0.8172851103804604 ],
  [ 'hrv', 0.8107092531705026 ],
  [ 'lav', 0.810239549084077 ],
  ... 114 more items ]

CLI

Install:

npm install franc-cli --global

Use:

CLI to detect the language of text

Usage: franc [options] <string>

Options:

  -h, --help                    output usage information
  -v, --version                 output version number
  -m, --min-length <number>     minimum length to accept
  -w, --whitelist <string>      allow languages
  -b, --blacklist <string>      disallow languages
  -a, --all                     display all guesses

Usage:

# output language
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# blacklist certain languages
$ franc --blacklist por,glg "O Brasil caiu 26 posições"
# src

# output language from stdin with whitelist
$ echo "Alle mennesker er født frie og" | franc --whitelist nob,dan
# nob

Supported languages

Package Languages Speakers
franc-min 82 8M or more
franc 188 1M or more
franc-all 402 -

Language code

Note that franc returns ISO 639-3 codes (three letter codes). Not ISO 639-1 or ISO 639-2. See also GH-10 and GH-30.

Ports

Franc has been ported to several other programming languages.

The works franc is derived from have themselves also been ported to other languages.

Derivation

Franc is a derivative work from guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.

License

MIT © Titus Wormer