curiosity-ai/catalyst

Spacy.Initialize() Throws Exception

lightel opened this issue · 5 comments

Describe the bug
I have a command-line .NET 5.0 application that uses the Catalyst and Catalyst.Spacy libraries (both at version 1.0.23862). I was following this guide to build a minimal application that analyzes text with spaCy.

When I run the sample I get the following exception message:

Unhandled exception. System.Collections.Generic.KeyNotFoundException: The given key '3.2.0' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Catalyst.Spacy.LoadModelsData(ModelSize modelSize, Language[] languages)
   at Catalyst.Spacy.Initialize(ModelSize modelSize, Language[] languages)
   at catalyst_test.Program.RunSpacy() in C:\Users\andru\source\repos\catalyst_test\catalyst_test\Program.cs:line 51
   at catalyst_test.Program.Main(String[] args) in C:\Users\andru\source\repos\catalyst_test\catalyst_test\Program.cs:line 19
   at catalyst_test.Program.<Main>(String[] args)

To Reproduce
Here is the code to reproduce the issue:

            using (await Spacy.Initialize(Spacy.ModelSize.Small, Language.Any, Language.English))
            {
                var nlp = Spacy.For(Spacy.ModelSize.Small, Language.English);
                var doc = new Document("Bill Gates is the founder of Microsoft", Language.English);
                nlp.ProcessSingle(doc);
                Console.WriteLine(doc.ToJson());
            }

It turns out that the compatibility.json file, which is used to download the spaCy model, no longer has an entry for version 3.2.0. Instead, it has an entry for version 3.2:

{
  "spacy": {
    "3.2": {
      "ca_core_news_lg": [
        "3.2.0"
      ],
...
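
To illustrate the mismatch, here is a minimal sketch of a lookup that tries the full version string first and then falls back to the "major.minor" key. The `CompatibilityLookup` class and `TryGetModels` method are hypothetical names made up for this example; they are not part of Catalyst, and this is not necessarily how the library resolves versions. The dictionary only mirrors the fragment of compatibility.json shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Catalyst or Catalyst.Spacy.
public static class CompatibilityLookup
{
    // Tries the exact version key (e.g. "3.2.0") and, if it is missing,
    // falls back to the "major.minor" key (e.g. "3.2").
    public static bool TryGetModels<T>(
        IReadOnlyDictionary<string, T> byVersion,
        string installedVersion,
        out T models)
    {
        if (byVersion.TryGetValue(installedVersion, out models))
        {
            return true;
        }

        var parts = installedVersion.Split('.');
        if (parts.Length >= 2)
        {
            var majorMinor = parts[0] + "." + parts[1];
            if (byVersion.TryGetValue(majorMinor, out models))
            {
                return true;
            }
        }

        models = default;
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        // Mirrors the "spacy" section of the compatibility.json fragment above.
        var spacy = new Dictionary<string, Dictionary<string, string[]>>
        {
            ["3.2"] = new Dictionary<string, string[]>
            {
                ["ca_core_news_lg"] = new[] { "3.2.0" }
            }
        };

        // Indexing with spacy["3.2.0"] would throw KeyNotFoundException,
        // as in the stack trace; the fallback finds the "3.2" entry instead.
        if (CompatibilityLookup.TryGetModels(spacy, "3.2.0", out var models))
        {
            Console.WriteLine(models["ca_core_news_lg"][0]); // prints "3.2.0"
        }
    }
}
```

Using `TryGetValue` rather than the dictionary indexer means a missing key is reported through the return value instead of an exception, so the caller can decide how to handle an unknown version.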

@lightel did you resolve this particular issue? I am having the same problem.

Rafael says in his blog post:

After a bit of fiddling with how spaCy downloads and installs models (and how they handle model compatibility across versions), I ended up reverse engineering the download logic and reimplementing it in C# to invoke directly the Installer.PipInstallModule method with the correct URL created for the installed spaCy version, language and model sizes requested by the user (similar to how the spaCy CLI invokes pip in this line)

Did you resolve this by unpacking Rafael's code to redo that download logic? If so, can you share it?

Let me check here how to fix this, probably an easy fix on my side!

@lightel @Billyish I just pushed a fix that should handle this case. Could you test it once it's published to NuGet? It should take about an hour to be online (package version 1.0.24611).

Thanks @theolivenbaum, I was about to post to say that I had worked out the issue myself. Good to have the package update though! I will check out your update.