Support for (natural) languages other than English
Closed this issue · 2 comments
PlexSheep commented
I'm not sure which languages are supported by this extension. I have a Markdown file in German. Using the config below, most German words are marked as spelling mistakes, and the proposed corrections are mostly nonsensical:
"das" -> "dms"
"aber" -> "Uber"
are just a few examples. It would be useful if we could somehow specify which language we want to use, but I assume this comes down to which languages are supported by the backend.
textLSP = {
    analysers = {
        languagetool = {
            enabled = true,
            check_text = {
                on_open = true,
                on_save = true,
                on_change = false,
            },
        },
        gramformer = {
            -- gramformer dependency needs to be installed manually
            enabled = true,
            gpu = false,
            check_text = {
                on_open = false,
                on_save = true,
                on_change = false,
            },
        },
        hf_checker = {
            enabled = false,
            gpu = false,
            quantize = 32,
            model = "pszemraj/flan-t5-large-grammar-synthesis",
            min_length = 40,
            check_text = {
                on_open = false,
                on_save = true,
                on_change = false,
            },
        },
        hf_instruction_checker = {
            enabled = true,
            gpu = false,
            quantize = 32,
            model = "grammarly/coedit-large",
            min_length = 40,
            check_text = {
                on_open = false,
                on_save = true,
                on_change = false,
            },
        },
        hf_completion = {
            enabled = true,
            gpu = false,
            quantize = 32,
            model = "bert-base-multilingual-cased",
            topk = 5,
        },
        -- openai = {
        --     enabled = false,
        --     api_key = "<MY_API_KEY>",
        --     check_text = {
        --         on_open = false,
        --         on_save = false,
        --         on_change = false,
        --     },
        --     model = "gpt-3.5-turbo",
        --     max_token = 16,
        -- },
        -- grammarbot = {
        --     enabled = false,
        --     api_key = "<MY_API_KEY>",
        --     -- longer texts are split, this parameter sets the maximum number of splits per analysis
        --     input_max_requests = 1,
        --     check_text = {
        --         on_open = false,
        --         on_save = false,
        --         on_change = false,
        --     },
        -- },
    },
    documents = {
        -- org = {
        --     org_todo_keywords = {
        --         "TODO",
        --         "IN_PROGRESS",
        --         "DONE",
        --     },
        -- },
        txt = {
            parse = true,
        },
    },
},
hangyav commented
Hey,
The project is currently quite English-focused and needs some work to better support other languages. However, the supported languages mainly depend on the analyser you use:
- grammarly/coedit-large (hf_instruction_checker): only English
- bert-base-multilingual-cased (hf_completion): over 100 languages
- openai: should be multilingual as well
- gramformer: English
- languagetool: supports a large number of languages, including German; however, it needs to be configured manually. I pushed a quick commit to the multilang branch for you to try. You just need to set the language in the config, as in the snippet and the sketch below. Let me know how it works out.
textLSP = {
    .
    .
    .
    documents = {
        language = "de",
        txt = {
            parse = true,
        },
    },
},
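For reference, here is a minimal sketch of how that could look combined with the languagetool analyser from the original config, assuming the multilang branch reads the language from documents.language as shown above:

textLSP = {
    analysers = {
        languagetool = {
            -- languagetool has the broadest language support of the analysers above
            enabled = true,
            check_text = {
                on_open = true,
                on_save = true,
                on_change = false,
            },
        },
    },
    documents = {
        -- check the documents as German text
        language = "de",
        txt = {
            parse = true,
        },
    },
},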
hangyav commented
Automatic language detection was added in the latest release. Please see the example config in the readme.
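For quick reference, a hedged sketch of what the auto-detection setup might look like; the exact option value follows the README's example as I understand it, so please verify it there (the "auto:<fallback>" form is an assumption):

textLSP = {
    documents = {
        -- detect the document language automatically;
        -- "auto:de" is assumed to mean "detect, fall back to German"
        language = "auto:de",
        txt = {
            parse = true,
        },
    },
},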