curiosity-ai/catalyst

Cannot process Chinese correctly

TomoakiChenSinica opened this issue · 0 comments

Language
Chinese

Describe the bug
I cannot process Chinese sentences correctly. The text is not segmented into words: the whole sentence comes back as a single token, tagged PROPN.

To Reproduce

  1. I ran code similar to the snippet in the Screenshots section.
  2. I got output like this:
{"Language":"zh","Length":5,"Value":"往前走五步","TokensData":[[{"Bounds":[0,4],"Tag":"PROPN"}]]}

Expected behavior
The sentence should be tokenized into individual words, and each token should receive the correct part-of-speech tag.

Screenshots

Here is my code:

using System;
using Catalyst;
using Mosaik.Core; // Language, Storage and DiskStorage live here

Catalyst.Models.Chinese.Register(); // You need to pre-register each language (and install the respective NuGet packages)

Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.Chinese);
var doc = new Document("諸葛亮是三國時代著名軍師", Language.Chinese);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

Additional context
Thank you for your help!