hangyav/textLSP

Support for (natural) languages other than english

Closed this issue · 2 comments

I'm not sure which languages are supported by this extension. I have a Markdown file in German. Using the config below, most German words are marked as spelling mistakes, and the proposed corrections are mostly nonsensical:

"das" -> "dms"
"aber" -> "Uber"

These are just a few examples. It would be useful if we could somehow specify which language to use, but I assume this comes down to which languages the backend supports.

    textLSP = {
      analysers = {
        languagetool = {
          enabled = true,
          check_text = {
            on_open = true,
            on_save = true,
            on_change = false,
          },
        },
        gramformer = {
          -- gramformer dependency needs to be installed manually
          enabled = true,
          gpu = false,
          check_text = {
            on_open = false,
            on_save = true,
            on_change = false,
          },
        },
        hf_checker = {
          enabled = false,
          gpu = false,
          quantize = 32,
          model = "pszemraj/flan-t5-large-grammar-synthesis",
          min_length = 40,
          check_text = {
            on_open = false,
            on_save = true,
            on_change = false,
          },
        },
        hf_instruction_checker = {
          enabled = true,
          gpu = false,
          quantize = 32,
          model = "grammarly/coedit-large",
          min_length = 40,
          check_text = {
            on_open = false,
            on_save = true,
            on_change = false,
          },
        },
        hf_completion = {
          enabled = true,
          gpu = false,
          quantize = 32,
          model = "bert-base-multilingual-cased",
          topk = 5,
        },
        -- openai = {
        --   enabled = false,
        --   api_key = "<MY_API_KEY>",
        --   check_text = {
        --     on_open = false,
        --     on_save = false,
        --     on_change = false,
        --   },
        --   model = "gpt-3.5-turbo",
        --   max_token = 16,
        -- },
        -- grammarbot = {
        --   enabled = false,
        --   api_key = "<MY_API_KEY>",
        --   -- longer texts are split, this parameter sets the maximum number of splits per analysis
        --   input_max_requests = 1,
        --   check_text = {
        --     on_open = false,
        --     on_save = false,
        --     on_change = false,
        --   },
        -- },
      },
      documents = {
        -- org = {
        --   org_todo_keywords = {
        --     "TODO",
        --     "IN_PROGRESS",
        --     "DONE",
        --   },
        -- },
        txt = {
          parse = true,
        },
      },
    },

Hey,

The project is currently quite English-focused and needs some work to add better support for other languages. However, the supported languages mainly depend on the analyser you use:

  • grammarly/coedit-large (hf_instruction_checker): only English
  • bert-base-multilingual-cased (hf_completion): over 100 languages
  • openai: should be multilingual as well
  • gramformer: English
  • languagetool: supports a large number of languages, including German; however, it needs to be configured manually. I pushed a quick commit to the multilang branch for you to try. You just need to set the language in the config. Let me know how it works out.
    textLSP = {
      ...
      documents = {
        language = "de",
        txt = {
          parse = true,
        },
      },
    },

Automatic language detection was added in the latest release. Please see the example config in the readme.
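For reference, a minimal sketch of what that could look like, assuming the same `documents.language` key shown above also accepts an `"auto"` value (the exact option name and value may differ; check the readme's example config):

    textLSP = {
      documents = {
        -- hypothetical: enable automatic language detection instead of
        -- hard-coding "de"; consult the readme for the actual key/value
        language = "auto",
        txt = {
          parse = true,
        },
      },
    },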