Models and data not loading
joerglang opened this issue · 4 comments
Describe the bug
I have a WinForms .NET 5 application that uses Catalyst as described in the documentation. However, when I use the code that is supposed to load the data automatically, nothing happens. The download seems to start (it creates some directories) but never downloads the data. Even after waiting for an hour in the debugger, the call does not return.
When running the samples from the repository (with the same code), the data is downloaded as expected.
I copied the "catalyst-models" folder into my solution and have it copied to the debug output. After that, loading the language detector with FastTextLanguageDetector.FromStoreAsync(Language.Any, Version.Latest, "");
works.
However, the call pipeline = Pipeline.For(Language.English);
never returns.
To Reproduce
This is the code that produces the problem:
public void Init()
{
    Storage.Current = new OnlineRepositoryStorage(new DiskStorage("catalyst-models"));
    var t = FastTextLanguageDetector.FromStoreAsync(Language.Any, Version.Latest, "");
    languageDetector = t.WaitResult();
    pipeline = Pipeline.For(Language.English);
    initCalled = true;
}
As this code is practically the same as in the samples, I really don't see the problem.
My setup:
- A WinForms .NET 5.0 application
- It references a .NET 5.0 library project
- The library project has the Catalyst NuGet packages installed (1.0.16767)
- The Init function above is called in the constructor of the "detector" class.
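Nothing in this thread confirms why the call hangs. One general .NET pattern worth ruling out is blocking on asynchronous work from a constructor on the WinForms UI thread, which can deadlock when the awaited code needs to resume on that same thread. Whether this applies to Catalyst here is an assumption. The sketch below shows an async factory as an alternative to a blocking constructor; the Detector class, CreateAsync, and LoadModelAsync are hypothetical stand-ins and not part of Catalyst.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the "detector" class from the report.
public sealed class Detector
{
    public string ModelName { get; }

    // Private constructor: it only stores already-loaded state and never blocks.
    private Detector(string modelName) => ModelName = modelName;

    // Async factory: callers await this instead of blocking inside a constructor.
    public static async Task<Detector> CreateAsync()
    {
        // ConfigureAwait(false) avoids resuming on a captured UI context.
        string model = await LoadModelAsync().ConfigureAwait(false);
        return new Detector(model);
    }

    // Placeholder for the real asynchronous model loading (an assumption).
    private static async Task<string> LoadModelAsync()
    {
        await Task.Delay(10).ConfigureAwait(false);
        return "english-pipeline";
    }
}

public static class Program
{
    public static async Task Main()
    {
        Detector detector = await Detector.CreateAsync();
        Console.WriteLine(detector.ModelName); // prints "english-pipeline"
    }
}
```

In a WinForms app, the same factory could be awaited from an async event handler such as the form's Load handler, so the UI thread is never blocked while models load.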
The Output window shows the following log information from Catalyst:
[14:56:16 INF] [LOAD] [FastTextLanguageDetectorData-"Any"-v0] (1 B) from '..\\Models\--\FastTextLanguageDetectorData\v000000\model-FastTextLanguageDetector-v000000.bin'
[14:56:16 INF] [LOAD] [FastTextData-Version-"Any"-v-1] (1 B) from '..\\Models\--\FastTextData-Version\v-000001\model-language-detector-v-000001.bin'
[14:56:16 INF] [LOAD] [FastTextData-Version-"Any"-v-1] (1 B) from '..\\Models\--\FastTextData-Version\v-000001\model-language-detector-v-000001.bin'
"GarbageDetection.exe" (CoreCLR: clrhost): Loaded "C:\Program Files\dotnet\shared\Microsoft.NETCore.App\5.0.6\System.Runtime.CompilerServices.Unsafe.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
[14:56:17 INF] [LOAD] [FastTextData-"Any"-v0] (15.4 MB) from '..\\Models\--\FastTextData\v000000\model-language-detector-v000000.bin'
"GarbageDetection.exe" (CoreCLR: clrhost): Loaded "C:\Program Files\dotnet\shared\Microsoft.NETCore.App\5.0.6\System.Resources.Writer.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"GarbageDetection.exe" (CoreCLR: clrhost): Loaded "C:\Program Files\dotnet\shared\Microsoft.NETCore.App\5.0.6\System.Collections.NonGeneric.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"GarbageDetection.exe" (CoreCLR: clrhost): Loaded "C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App\5.0.6\System.Configuration.ConfigurationManager.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
[14:56:19 INF] [B] Initializing Entries
[14:56:22 INF] [E] Initializing Entries in 2.8300 seconds at 413,653 oper/s, total of 1,170,682 operations
[14:56:22 INF] [LOAD] [SentenceDetectorModel-Version-"English"-v-1] (1 B) from '..\\Models\en\SentenceDetectorModel-Version\v-000001\model-v-000001.bin'
[14:56:22 INF] [LOAD] [SentenceDetectorModel-Version-"English"-v-1] (1 B) from '..\\Models\en\SentenceDetectorModel-Version\v-000001\model-v-000001.bin'
"GarbageDetection.exe" (CoreCLR: clrhost): Loaded "C:\Program Files\dotnet\shared\Microsoft.NETCore.App\5.0.6\System.Security.Cryptography.Csp.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
Dear @joerglang, the online model repo has now been deprecated. Could you try using the per-language NuGet packages instead?
You can find them all on NuGet, for example for English: https://www.nuget.org/packages/catalyst.models.english
You need to register the models before using any pipeline or model, by calling this somewhere in your code:
Catalyst.Models.English.Register();
Also, just so you know, the FastTextLanguageDetector model has not been published to NuGet yet (see #63), but you can use the CLD2 model just fine:
var cld2LanguageDetector = await LanguageDetector.FromStoreAsync(Language.Any, Version.Latest, "");
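Putting the registration step together with a pipeline, a minimal sketch might look like the following. It assumes a console project with top-level statements, that the Catalyst and Catalyst.Models.English NuGet packages are installed, and that Pipeline.ForAsync, Document, ProcessSingle and ToJson behave as shown in the Catalyst README. The sample sentence is made up for illustration, and the sketch has not been run against the setup described in this issue.

```csharp
using System;
using Catalyst;
using Mosaik.Core;

// Register the English models before creating any pipeline or model.
Catalyst.Models.English.Register();

// Local folder used by Catalyst's storage (folder name taken from this thread).
Storage.Current = new DiskStorage("catalyst-models");

// Await the async factory rather than blocking on it.
var nlp = await Pipeline.ForAsync(Language.English);

// Illustrative input text (an assumption, not from the original report).
var doc = new Document("The quick brown fox jumps over the lazy dog", Language.English);
nlp.ProcessSingle(doc);

// Dump the processed document to inspect the result.
Console.WriteLine(doc.ToJson());
```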
I really cannot get it to work. Could you please check whether this works? I am running it on .NET 6, which might be a problem.
I have installed Catalyst and Catalyst.Models.English. It would be nice if there were a Catalyst.Models.All package that depends on all available languages, so that only a single package is needed.
ConsoleApp_20220110_1501_DetectLanguage.zip
using Catalyst;
using Catalyst.Models;
using Mosaik.Core;
using Version = Mosaik.Core.Version;
string text = "What is this language?";
Console.WriteLine("Downloading/reading language detection models..");
const string modelFolderName = "catalyst-models";
if (!new DirectoryInfo(modelFolderName).Exists)
{
    Console.WriteLine("- Downloading for the first time, so this may take a little while");
}
Storage.Current = new DiskStorage(modelFolderName);
// You need to pre-register each language (and install the respective NuGet Packages)
English.Register();
LanguageDetector? cld2LanguageDetector = await LanguageDetector.FromStoreAsync(Language.Any, Version.Latest, "");
Document? document = new Document(text);
cld2LanguageDetector.Process(document);
Console.WriteLine(text);
Console.WriteLine($"Detected language: {document.Language}");
I also can't get language detection to work. I'm using the code from the example (dated October 17, 2021) and I'm getting the same error: "Unable to find the specified file."
The following error is displayed in the console:
fail: Mosaik.Core.ObjectStore[0]
[LOAD-ERR] LanguageDetectorModel-Any-v0 from '..\\Models\--\LanguageDetectorModel\v000000\model-v000000.binz'
System.IO.FileNotFoundException: Unable to find the specified file.
at Mosaik.Core.DiskStorage.OpenLockedStreamAsync(String path, FileAccess access)
at Mosaik.Core.ObjectStore.LoadAsync[T](IStorageTarget storeTarget, Language language, String modelType, Int32 version, String tag, Boolean compress)
Could you please tell me what to fix to get language detection working?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.