Cannot process Chinese correctly
TomoakiChenSinica opened this issue · 0 comments
Language
Chinese
Describe the bug
I cannot process a Chinese sentence correctly. The whole sentence comes back as a single token tagged PROPN instead of being split into words.
To Reproduce
- I ran code like the code block under Screenshots.
- I got a result like:
{"Language":"zh","Length":5,"Value":"往前走五步","TokensData":[[{"Bounds":[0,4],"Tag":"PROPN"}]]}
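To make the problem in that output explicit, here is a minimal sketch that parses the JSON above with System.Text.Json (no Catalyst dependency) and checks whether one token spans the entire text. It assumes that `Bounds` holds inclusive begin and end character indices (inferred from `Length` 5 and `Bounds` [0,4]) and that `TokensData` is a nested array of token lists; neither assumption is confirmed against the library's documentation.

```csharp
using System;
using System.Text.Json;

// The JSON from the report above, copied verbatim.
string json = "{\"Language\":\"zh\",\"Length\":5,\"Value\":\"往前走五步\",\"TokensData\":[[{\"Bounds\":[0,4],\"Tag\":\"PROPN\"}]]}";

JsonDocument parsed = JsonDocument.Parse(json);
JsonElement root = parsed.RootElement;

int length = root.GetProperty("Length").GetInt32();
string value = root.GetProperty("Value").GetString() ?? "";

int tokenCount = 0;
bool lastTokenCoversAll = false;

// Assumption: TokensData is an array of token lists, one inner array per group.
foreach (JsonElement group in root.GetProperty("TokensData").EnumerateArray())
{
    foreach (JsonElement token in group.EnumerateArray())
    {
        JsonElement bounds = token.GetProperty("Bounds");
        int begin = bounds[0].GetInt32();
        int end = bounds[1].GetInt32(); // assumed inclusive
        string tag = token.GetProperty("Tag").GetString() ?? "";

        tokenCount++;
        lastTokenCoversAll = begin == 0 && end == length - 1;

        // Print the token text, its bounds, and its tag.
        Console.WriteLine($"{value.Substring(begin, end - begin + 1)} [{begin},{end}] {tag}");
    }
}

// True only when there is exactly one token and it spans the whole text.
bool singleTokenCoversAll = tokenCount == 1 && lastTokenCoversAll;
Console.WriteLine($"tokens: {tokenCount}, single token covers whole text: {singleTokenCoversAll}");

parsed.Dispose();
```

Under those assumptions, the check reports exactly one token covering all five characters, so no word segmentation took place.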
Expected behavior
The sentence should be split into individual word tokens, each with a correct part-of-speech tag.
Screenshots
Here is my code:
using System;
using Catalyst;
using Mosaik.Core;

// Each language must be pre-registered (and its NuGet package installed).
Catalyst.Models.Chinese.Register();
// Models are cached on disk in this folder.
Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.Chinese);
var doc = new Document("諸葛亮是三國時代著名軍師", Language.Chinese);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());
Additional context
Thank you for your help!