robbestad/Rantjs

Strange output from default pattern


I tried the default pattern on your page and got this:

[Screenshot of the generated output]

Two anomalies appear in the output: the word 'appearance' and the string 'undefined'. I'm assuming these are unintentional, so I thought I'd let you know.

Thanks! I'm in the process of rewriting the parser and adding tests for `undefined`, so I'll definitely catch this!
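One common way an `undefined` can leak into generated text in JavaScript is a random pick from an empty list (or from a list holding a non-string entry). A minimal sketch of the kind of check such tests could run, assuming the `dic.<group>.<class>` array layout shown later in this thread; `pickRandom` and `findUndefinedSources` are hypothetical names, not Rantjs functions:

```javascript
// Hypothetical helpers, not part of Rantjs.
// On an empty array this evaluates list[0], which is undefined.
function pickRandom(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Flags dictionary lists that could make a random pick produce undefined.
function findUndefinedSources(dic) {
  const problems = [];
  for (const [name, group] of Object.entries(dic)) {
    // Skip non-group values such as the dic.tokens array.
    if (group === null || typeof group !== "object" || Array.isArray(group)) continue;
    for (const [cls, entries] of Object.entries(group)) {
      if (cls === "subs" || !Array.isArray(entries)) continue;
      if (entries.length === 0) problems.push(`${name}.${cls} is empty`);
      entries.forEach((entry, i) => {
        if (typeof entry !== "string" || entry.trim() === "") {
          problems.push(`${name}.${cls}[${i}] is not a usable word`);
        }
      });
    }
  }
  return problems;
}

const brokenDic = {
  tokens: ["noun"],
  noun: { all: ["dog/dogs", undefined], animal: [], subs: ["singular", "plural"] },
};
console.log(pickRandom([])); // undefined
console.log(findUndefinedSources(brokenDic));
// [ 'noun.all[1] is not a usable word', 'noun.animal is empty' ]
```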

Regarding 'appearance': I checked my dictionary, and it appears twice in there. Looking at the original, the culprit is '| class appearance', which I somehow managed to slip in as actual words. I'll get on this ASAP.
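The bug comes down to a class annotation being read as vocabulary. A rough sketch of keeping the two apart when converting dictionary text into the `dic` layout; only the `| class` syntax comes from this thread, while the `>` entry prefix and the rule that a class line attaches to the entry above it are assumptions, not Rant's documented grammar. `parseDicLines` and `groupByClass` are hypothetical names:

```javascript
// Hypothetical, simplified DIC-style parser (not Rant's actual grammar).
// Assumed format: "> word/words" starts an entry; "| class a b" adds
// classes to the most recent entry; all other lines are ignored here.
function parseDicLines(text) {
  const entries = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith(">")) {
      entries.push({ word: line.slice(1).trim(), classes: [] });
    } else if (line.startsWith("|")) {
      const parts = line.slice(1).trim().split(/\s+/);
      // "| class x" is metadata for the previous entry, never a word.
      if (parts[0] === "class" && entries.length > 0) {
        entries[entries.length - 1].classes.push(...parts.slice(1));
      }
    }
  }
  return entries;
}

// Build the { all: [...], <class>: [...] } shape used by `dic` groups.
function groupByClass(entries) {
  const groups = { all: entries.map((e) => e.word) };
  for (const e of entries) {
    for (const c of e.classes) {
      (groups[c] = groups[c] || []).push(e.word);
    }
  }
  return groups;
}

const dicText = "> look/looks\n| class appearance\n> dog/dogs\n| class animal";
console.log(groupByClass(parseDicLines(dicText)));
```

With this split, 'appearance' only ever becomes a class key; it never lands in the `all` word list.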

On another note, I noticed that the dictionaries are hardcoded into your script. I don't think this is the ideal way to implement them. Wouldn't it be better to load them dynamically, the way the original Rant does? That would let users load their own dictionaries, or switch dictionaries on an existing Rant context (different language packs, etc.).

If it helps, I'm experimenting with porting Rant's lexer (Stringes) to JavaScript. Rant uses it to generate tokens from both patterns and DIC files. If it turns out well, it may prove useful for your project.
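To make "generating tokens from patterns" concrete, here is a minimal regex-based stand-in that splits a pattern into literal text and `<...>` tags. It is not a port of Stringes and ignores the rest of Rant's syntax; `tokenizePattern` and the token shape are hypothetical:

```javascript
// Minimal tokenizer for patterns such as "<noun dog plural>".
// Illustration only: not Stringes, and no other Rant syntax is handled.
function tokenizePattern(pattern) {
  const tokens = [];
  const tagRe = /<([^<>]+)>/g;
  let last = 0;
  let match;
  while ((match = tagRe.exec(pattern)) !== null) {
    // Literal text between the previous tag and this one.
    if (match.index > last) {
      tokens.push({ type: "text", value: pattern.slice(last, match.index) });
    }
    // First word is the tag name; the rest are classes/subs.
    const [name, ...args] = match[1].trim().split(/\s+/);
    tokens.push({ type: "tag", name, args });
    last = tagRe.lastIndex;
  }
  if (last < pattern.length) {
    tokens.push({ type: "text", value: pattern.slice(last) });
  }
  return tokens;
}

console.log(tokenizePattern("<alien plural> are like <adj pp> <noun dog plural>."));
```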

They're actually not hardcoded, though it may look that way since I'm concatenating them. To make this clearer, I've updated the project to show the separation (live now at https://rantjs.herokuapp.com/). The idea was to make it feasible to load a different dictionary on the fly simply by substituting the JS file.

What I'm working on now is replacing the functions I built to manually substitute tokens such as <noun .plural> with a single automatic function. I think I'm pretty close to getting it working. The intention is that you can include any new group of keywords by adding it to the dic object, and it will work automatically.
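A data-driven resolver along these lines is one way to get that behavior: it looks the tag name up on `dic` instead of dispatching to a per-token function. This is a sketch under assumptions, not the Rantjs implementation: entries are taken to be "singular/plural" strings, each group has an `all` list plus optional class lists, and `subs` lists the form names in order (the layout used in the alien example in this thread). `resolveTag` and its `rng` parameter are hypothetical:

```javascript
// Hypothetical generic resolver: any group stored on `dic` works without
// writing a new substitution function for it.
function resolveTag(dic, tagText, rng = Math.random) {
  const [name, ...args] = tagText.trim().split(/\s+/);
  const group = dic[name];
  // Unknown names (or non-group values such as dic.tokens) stay visible.
  if (!group || Array.isArray(group) || typeof group !== "object") {
    return `<${tagText}>`;
  }
  const subs = Array.isArray(group.subs) ? group.subs : [];
  // An argument naming one of the group's lists selects that class.
  const cls = args.find((a) => a !== "subs" && Array.isArray(group[a])) || "all";
  const list = group[cls] || [];
  if (list.length === 0) return `<${tagText}>`;
  const forms = String(list[Math.floor(rng() * list.length)]).split("/");
  // An argument naming a declared sub picks that form; default is the first.
  const subIndex = subs.findIndex((s) => args.includes(s));
  return forms[Math.max(subIndex, 0)] ?? forms[0];
}

const races = ["Badoon/Badoons", "Brood/The Broods", "Celestials/The Celestials", "Kree/The Kree"];
const demoDic = { tokens: ["alien"], alien: { all: races, race: races, subs: ["singular", "plural"] } };
console.log(resolveTag(demoDic, "alien race plural", () => 0)); // Badoons
```

Passing a fixed `rng` makes the pick deterministic, which is handy for the kind of tests mentioned earlier in the thread.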

I'm definitely curious to see a port of Rant's lexer to JavaScript. I've focused on creating native objects and lists rather than lexing the original DIC files. It's a design choice that I feel is paying off now, but I'm open to changing it.

All right: with version 0.8.3, all tokens, including all modifiers and subs, should work flawlessly. The core codebase (excluding dictionaries) has been reduced by 80%, which I'm very happy with.

Adding stuff to the dictionary is done by setting properties on the variable dic. For instance, if I wanted an alien tag, I'd do the following (this can be pasted into the console in Firefox/Chrome):

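```javascript
// Assumes the Rantjs page (and its global `dic`) is already loaded.
dic.alien = {};
// Each entry is "singular/plural".
var alien_race = ["Badoon/Badoons", "Brood/The Broods", "Celestials/The Celestials", "Kree/The Kree"];
dic.alien.all = alien_race;   // e.g. <alien plural>
dic.alien.race = alien_race;  // e.g. <alien race plural>
dic.alien.subs = ["singular", "plural"];
dic.tokens.push("alien");     // register the new tag name
```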

and then I could access it by typing:

<alien race plural>

or even

<alien plural> are like <adj pp> <noun dog plural>.
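Expanding a whole pattern like the one above comes down to replacing every `<...>` tag with a resolved word. A small sketch using `String.prototype.replace` with a callback; `expandPattern` is a hypothetical name, and the fixed answers below are made-up stand-ins for a real dictionary lookup:

```javascript
// Replace every <...> tag with whatever `resolve` returns for its inner
// text, e.g. "alien race plural".
function expandPattern(pattern, resolve) {
  return pattern.replace(/<([^<>]+)>/g, (whole, inner) => {
    const value = resolve(inner.trim());
    // Keep the original tag visible instead of printing "undefined".
    return value === undefined ? whole : value;
  });
}

// Stand-in resolver with fixed, made-up answers so the output is predictable.
const fixedAnswers = {
  "alien plural": "The Kree",
  "adj pp": "big",
  "noun dog plural": "poodles",
};
console.log(expandPattern("<alien plural> are like <adj pp> <noun dog plural>.", (t) => fixedAnswers[t]));
// The Kree are like big poodles.
```

Leaving an unresolved tag in place, rather than emitting `undefined`, makes a missing dictionary entry easy to spot in the output.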