/ntih

Naming Things Is Hard

Primary LanguageShellOtherNOASSERTION

# ntih (Naming Things Is Hard)

# Usage

> ntih [OPTIONS] [[MIN:]MAX] [COUNT] < EXAMPLE TEXT > WORDS

Build a Markov chain of character pairs by sampling words from input text.
Using that chain, generate COUNT words with length between MIN and MAX characters.
Given MAX and not MIN generate words of exactly that length.

OPTIONS
 -f CHAIN_FILE  : File where the chain will be stored.
 -i INPUT_FILE  : Input file from which to read sample words.
 -r             : Force regeneration of CHAIN_FILE.

ENVIRONMENT
 $NTIHFILE : Default CHAIN_FILE from which to load a chain.
 $NTIHDICT : Default INPUT_FILE from which to generate a chain.

# Example
 $> ntih 1:6 100 | sort | column -c 80
 aemeez  bsiono  dumung  gshopa  isluab  lhedhe  natych  rxisty  unkyup  wyl
 aigibr  ceubla  eapjs   hé      jamé    logood  on      scorva  unshev  xem
 ångfuc  cimius  écigae  hodt    jaryde  lsserk  peogta  shdovo  urdogb  ythhip
 ångmyr  climoc  ectmak  hoke    jibruc  lyevac  pique   sveral  utlixs  yve
 ångtif  craesi  emliel  htnava  jociz   mb      plyplu  tcafon  vim     zacqué
 ångtod  dabune  émochp  hyanys  jutezb  mirhog  punqué  tlosua  vlopyr  zerplo
 atulkw  ddysmi  fdregy  hysfud  khmikh  mnseyb  qadley  trsvex  vutshn  zhizoi
 awma    débuzs  ferbre  ibbuys  knodki  msscsl  qommoh  twipus  vuttge  ziccav
 azurlu  dhuthp  filwon  icucli  kru     nafinb  r       twogga  whorwa  zircoo
 blia    dudotp  fucuff  irrhot  lathit  nagluk  rwopré  ulyrud  wonien  zs

# What
Here's a simple line encoding for a markov chain of characters in a word:
 <char previous><char current> <char next><integer weight> ...

Entries looks like this:
 ^l a38 e24 i23 l1 o18 u9 y2
 an $49 a11 c27 d48 e12 f1 g22 i19 k7 l1 n9 o4 q1 s54 t53 u3 w1 x1
 nd $12 a13 b5 e42 f1 h2 i21 l12 m2 o7 p1 r5 s17 t1 u4
 fr $1 a10 e15 i2 o9 u2 y1
 db a3 i1 l1 u2

Each rule gives possible next characters given the previous two characters.
There are two special characters '^' and '$' denoting start and end points.

To build a word a starting character is selected randomly, then for each
step a next character is selected randomly weighted by the rules.

Function ntih_render reads a chain of this format then uses it to generate words.
Function ntih_build generates a chain in this format by reading text input.
Function ntih wraps the build and render functions in a nice interface.

# Links
GNU - https://www.gnu.org/software/
Bash - https://tiswww.case.edu/php/chet/bash/bashtop.html
Greg's Wiki - http://mywiki.wooledge.org/