LuteOrg/lute-v3

Add "term grouping exceptions" to mandarin parser

Closed this issue · 4 comments

Per this discord thread - "Overriding the highlight"

Sometimes jieba groups things incorrectly. Users need to be able to get at the underlying characters. The mandarin fork had a text file of parsing exceptions, e.g.

X,Y

which means "if you try to group 'XY' together in a term, instead parse it as 'X' and 'Y'." Note that the parts can still be grouped, e.g. "X,YZ" means "if you try to group 'XYZ' all together, instead parse it as 'X' and 'YZ'."
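A minimal sketch of how such a rule file could be applied to the tokens jieba returns. The function names `load_exceptions` and `apply_exceptions` are hypothetical and are not taken from the fork or from Lute; the token list is passed in directly so that the sketch does not depend on jieba being installed.

```python
def load_exceptions(lines):
    """Map each joined string (e.g. "XYZ") to its forced split (e.g. ["X", "YZ"]).

    Hypothetical helper: one comma-separated rule per line.
    """
    exceptions = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        parts = [p for p in parts if p]
        if len(parts) < 2:
            continue  # blank or malformed line: nothing to split
        exceptions["".join(parts)] = parts
    return exceptions


def apply_exceptions(tokens, exceptions):
    """Replace any token that matches a rule with its listed parts."""
    result = []
    for token in tokens:
        result.extend(exceptions.get(token, [token]))
    return result


rules = load_exceptions(["X,Y", "X,YZ"])
print(apply_exceptions(["XYZ", "XY", "W"], rules))
# ['X', 'YZ', 'X', 'Y', 'W']
```

Keying the lookup on the joined string means a rule only fires when the segmenter produced exactly that grouping; other tokens pass through unchanged.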

Working on this currently, have a good handle on it. A new Lute release will be needed for this capability, as it requires changes to the abstract parser.

Branch wip_issue_430_parser_exceptions pushed, tests with exceptions are working for mandarin parser.

Now have to call the init_data_dir for each parser and loaded plugin in the app_factory, should be straightforward.

*** MAYBE move code for init_plugins to app_factory ... seems like the right place, as the factory has to do some extra stuff for the plugins (?)
*** create the top-level `userparserdata` dir if any parser actually has a data dir, in app_factory
*** assign parser's directory for all parsers
*** if any parser needs a data dir, call top-level "create data dir" thing for all parsers
*** after parsers loaded, loop and call "set up data" method - parsers handle that - create files and dirs
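The notes above could be sketched roughly as follows. This is an assumption-laden illustration, not Lute's actual code: `BaseParser`, `MandarinLikeParser`, `uses_data_directory`, `init_parser_data`, and the file name `parser_exceptions.txt` are all hypothetical; only `init_data_dir` and the `userparserdata` directory name come from the notes.

```python
import tempfile
from pathlib import Path


class BaseParser:
    """Hypothetical stand-in for a parser; not Lute's actual class."""

    name = "base"
    uses_data_directory = False

    def __init__(self):
        self.data_directory = None

    def init_data_dir(self):
        """Parsers that need files override this; the default does nothing."""


class MandarinLikeParser(BaseParser):
    """Hypothetical parser that keeps an exceptions file in its data dir."""

    name = "mandarin"
    uses_data_directory = True

    def init_data_dir(self):
        self.data_directory.mkdir(parents=True, exist_ok=True)
        exceptions_file = self.data_directory / "parser_exceptions.txt"
        if not exceptions_file.exists():
            exceptions_file.write_text("", encoding="utf-8")


def init_parser_data(parsers, app_data_dir):
    """Create the top-level dir only if some parser needs it, then let
    each parser set up its own files. Returns the top-level path or None."""
    if not any(p.uses_data_directory for p in parsers):
        return None
    top = Path(app_data_dir) / "userparserdata"
    top.mkdir(parents=True, exist_ok=True)
    for p in parsers:
        p.data_directory = top / p.name  # assign a directory to every parser
    for p in parsers:
        p.init_data_dir()  # each parser creates what it needs, if anything
    return top


with tempfile.TemporaryDirectory() as tmp:
    print(init_parser_data([BaseParser()], tmp))  # no parser needs data
    top = init_parser_data([BaseParser(), MandarinLikeParser()], tmp)
    print(sorted(p.name for p in top.iterdir()))
```

Leaving file creation to each parser keeps the factory generic: it only decides whether the top-level directory should exist at all, which matches the "no plugin, no extra data dir" check in the test plan.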

Then test it out:

  • install lute only, no plugin
  • start it up -- no extra data dir
  • install mandarin plugin
  • start it up -- extra data dir
  • test it out - add some exceptions to the file, check with the demo story

In develop, seems to work fine.

Launched in 3.4.2.