Aksharas is a utility for analysing akṣaras and varṇas in Devanagari text.
npm i @vipran/aksharas
import Aksharas from "@vipran/aksharas";
// OR for CommonJS:
// const Aksharas = require("@vipran/aksharas").default;
const input = "सर्वे भवन्तु सुखिनः।"
const results = Aksharas.analyse(input);
const aksharas = results.aksharas.map(akshara => akshara.value);
console.log(aksharas); // "स", "र्वे", "भ", "व", "न्तु", "सु", "खि", "नः"
Aksharas.analyse accepts a string input and returns a Results object.
const input: string = 'नमः';
const results: Results = Aksharas.analyse(input);
TokenType is an enum with the following values:
TokenType.Akshara
TokenType.Symbol
TokenType.Whitespace
TokenType.Invalid
TokenType.Unrecognised
These can be used to filter the tokens in the Results object. Example:
import Aksharas from "@vipran/aksharas";
// OR import Aksharas, { TokenType } ...
const input = "हे! हरेऽत्र नागच्छ।";
const results = Aksharas.analyse(input);
const symbols = results.all
  .filter((token) => token.type === Aksharas.TokenType.Symbol)
  .map((token) => token.value);
console.log(symbols); // "ऽ", "।"
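When several token types are needed at once, grouping the tokens in a single pass can be simpler than running one filter per type. The sketch below is self-contained and does not call the library: TokenKind and SimpleToken are hypothetical local stand-ins that mirror the documented TokenType values and Token fields, and the sample tokens are hand-written for illustration rather than actual output of Aksharas.analyse.

```typescript
// Hypothetical stand-in for Aksharas.TokenType; the real enum's underlying values may differ.
type TokenKind = "Akshara" | "Symbol" | "Whitespace" | "Invalid" | "Unrecognised";

// Hypothetical stand-in for the documented Token shape (only the fields used here).
interface SimpleToken {
  type: TokenKind;
  value: string;
}

// Group token values by their type in one pass over the array.
function groupByType(tokens: SimpleToken[]): Partial<Record<TokenKind, string[]>> {
  const groups: Partial<Record<TokenKind, string[]>> = {};
  for (const token of tokens) {
    const bucket = groups[token.type] ?? [];
    bucket.push(token.value);
    groups[token.type] = bucket;
  }
  return groups;
}

// Hand-written sample tokens (not actual library output).
const sampleTokens: SimpleToken[] = [
  { type: "Akshara", value: "हे" },
  { type: "Unrecognised", value: "!" },
  { type: "Whitespace", value: " " },
  { type: "Symbol", value: "।" },
];

const grouped = groupByType(sampleTokens);
```

With real data, the same idea should apply to results.all, comparing against the Aksharas.TokenType values instead of the string literals used here.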
VarnaType is an enum with the following values:
VarnaType.Svara
VarnaType.Vyanjana
These can be used to filter the varnas in Results.varnas. Example:
import Aksharas from "@vipran/aksharas";
// OR import Aksharas, { VarnaType } ...
const input = "गुरुः";
const results = Aksharas.analyse(input);
const svaras = results.varnas
  .filter((varna) => varna.type === Aksharas.VarnaType.Svara)
  .map((varna) => varna.value);
console.log(svaras); // "उ", "उः"
The Results object contains the following properties:
- all
  - type: Token[]
  - An array of Token objects containing all the tokens analysed from the input string. It includes Devanagari akṣaras, Devanagari symbols (१, २, ।, ॥, etc.) and non-Devanagari characters (i.e. characters in other scripts, special characters, whitespace characters, etc.).
- aksharas
  - type: Token[]
  - Devanagari syllables like रा, सी, etc. Halanta consonants such as क्, च्, य्, etc. are also treated as aksharas when they occur at the end of a word.
- varnas
  - type: Varna[]
  - Devanagari consonants and vowels in the input. (Only in v0.4.0 or above.)
- symbols
  - type: Token[]
  - Devanagari symbols such as १, २, ।, ॥, etc.
- whitespaces
  - type: Token[]
  - All whitespace characters: \s, \t, \n, etc.
- invalid
  - type: Token[]
  - All Devanagari characters whose occurrence in the input string does not conform to the definition of an akṣara. For example, a virāma or a vowel mark that is not preceded by a consonant is invalid ("अ्", "गोु", etc.).
- unrecognised
  - type: Token[]
  - Non-Devanagari characters (i.e. characters in other scripts and special characters such as @, #, etc.).
- chars
  - type: string[]
  - All Unicode characters in the input string. Same as String.prototype.split().
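For TypeScript users, the property list above can be read as roughly the following shape. This is a sketch inferred from the descriptions in this README, not the package's actual type declarations: ResultsShape, TokenShape, VarnaShape and summarise are hypothetical local names, and the type fields are typed as string only because the enums' underlying values are not documented here.

```typescript
// Hypothetical mirror of the documented Token object.
interface TokenShape {
  type: string; // stands in for TokenType
  value: string;
  from: number;
  to: number;
  attributes?: Record<string, any>;
}

// Hypothetical mirror of the documented Varna object.
interface VarnaShape {
  type: string; // stands in for VarnaType
  value: string;
}

// Hypothetical mirror of the documented Results object.
interface ResultsShape {
  all: TokenShape[];
  aksharas: TokenShape[];
  varnas: VarnaShape[];
  symbols: TokenShape[];
  whitespaces: TokenShape[];
  invalid: TokenShape[];
  unrecognised: TokenShape[];
  chars: string[];
}

// Count how many entries each category holds.
function summarise(results: ResultsShape): Record<keyof ResultsShape, number> {
  return {
    all: results.all.length,
    aksharas: results.aksharas.length,
    varnas: results.varnas.length,
    symbols: results.symbols.length,
    whitespaces: results.whitespaces.length,
    invalid: results.invalid.length,
    unrecognised: results.unrecognised.length,
    chars: results.chars.length,
  };
}
```

A summary like this can be a quick sanity check, for example to confirm that a text contains no invalid or unrecognised tokens before further processing.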
Many of the properties in the Results object are arrays of Token objects. A Token object has the following properties:
- type
  - type: TokenType
  - Type of the token. One of the values of Aksharas.TokenType.
- value
  - type: string
  - Contains an analysed part of the input string.
- from
  - type: number
  - From index: the start position of the token in the input string.
- to
  - type: number
  - To index: the end position of the token in the input string.
- attributes
  - type: Record<string, any>
  - An optional key-value object which may contain other attributes of the token. It is currently used only in Akshara tokens, to store the varnas in that akshara.
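Because each token records its start position, tokens taken from separate arrays (for example aksharas and symbols) can be put back into reading order by sorting on from. The following is a minimal sketch that assumes only that from increases along the input string; PositionedToken and inReadingOrder are hypothetical names, and the sample positions are hand-written rather than produced by the library.

```typescript
// Hypothetical stand-in for the documented Token shape (only the fields used here).
interface PositionedToken {
  value: string;
  from: number;
}

// Merge any number of token lists and sort them by start position.
function inReadingOrder(...lists: PositionedToken[][]): PositionedToken[] {
  const merged: PositionedToken[] = [];
  for (const list of lists) {
    merged.push(...list);
  }
  return merged.sort((a, b) => a.from - b.from);
}

// Hand-written sample data for a string like "कऽख" (not actual library output).
const aksharaLike: PositionedToken[] = [
  { value: "क", from: 0 },
  { value: "ख", from: 2 },
];
const symbolLike: PositionedToken[] = [{ value: "ऽ", from: 1 }];

const ordered = inReadingOrder(aksharaLike, symbolLike).map((t) => t.value);
```

The sketch copies the tokens into a new array before sorting, so the original lists are left unchanged.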
Results.varnas consists of an array of Varna objects. A Varna object has the following properties:
- type
  - type: VarnaType
  - Type of the varna. One of the values of Aksharas.VarnaType.
- value
  - type: string
  - Contains an analysed part of the input string.
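Since every Varna carries a type, a simple tally of svaras and vyanjanas needs only one pass over the array. This sketch does not call the library: VarnaKind, SimpleVarna and tallyVarnas are hypothetical local names, and the sample data is hand-written. In particular, the vyanjana values shown are a guess at how consonants might be represented, not confirmed output.

```typescript
// Hypothetical stand-in for Aksharas.VarnaType; the real enum's underlying values may differ.
type VarnaKind = "Svara" | "Vyanjana";

// Hypothetical stand-in for the documented Varna shape.
interface SimpleVarna {
  type: VarnaKind;
  value: string;
}

// Count svaras and vyanjanas in a list of varnas.
function tallyVarnas(varnas: SimpleVarna[]): Record<VarnaKind, number> {
  const tally: Record<VarnaKind, number> = { Svara: 0, Vyanjana: 0 };
  for (const varna of varnas) {
    tally[varna.type] += 1;
  }
  return tally;
}

// Hand-written sample for "गुरुः" (illustrative only; consonant values are assumed).
const sampleVarnas: SimpleVarna[] = [
  { type: "Vyanjana", value: "ग्" },
  { type: "Svara", value: "उ" },
  { type: "Vyanjana", value: "र्" },
  { type: "Svara", value: "उः" },
];

const varnaTally = tallyVarnas(sampleVarnas);
```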
MIT © Prasanna Venkatesh T S