curiosity-ai/catalyst

Corpus?

dgerding opened this issue · 3 comments

Can you point to the Universal Dependencies data you used? Or could you include it, I'm guessing in the Corpus project? Really excited to be able to try training.

Thanks
Dave G

Hi Dave,

The training data used for the Catalyst.Training project can be found below:

You can also use the pre-trained models available in the online repository, for example:

// Configures the model storage to use the online repository, backed by the local folder ./catalyst-models/
Storage.Current = new OnlineRepositoryStorage(new DiskStorage("catalyst-models"));
// Creates the English pipeline, then adds the pre-trained WikiNER entity recognizer from the store
var nlp = await Pipeline.ForAsync(Language.English);
nlp.Add(await AveragePerceptronEntityRecognizer.FromStoreAsync(language: Language.English, version: Version.Latest, tag: "WikiNER"));

If you want, I can also provide you with a direct download link for all the data. It's about 3.4 GB without the OntoNotes dataset.

Thanks!

Hi! I know this issue is long closed, but I would be grateful if that download link was published :^)