thiborose/gecko-app

Czech version

Closed this issue · 6 comments

Try to retrain the Czech model to get better accuracy, then implement it.

Dataset: https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3057

Results of the model for now:

train: accuracy: 0.8452, loss: 0.9536
validation: accuracy: 0.8121, loss: 1.3333

Test set:
Precision : 0.6565
Recall : 0.2397
F_0.5 : 0.4871
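For reference, the F_0.5 value above follows from the reported precision and recall. A minimal sketch of the standard F-beta formula; the function name `f_beta` is made up for illustration and is not taken from the project code:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported for the test set in this issue.
score = f_beta(0.6565, 0.2397)
print(round(score, 4))  # 0.4871
```

With beta = 0.5 the score leans on precision, which is why it stays near 0.49 even though recall is only about 0.24.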

I'm not sure we'll do better than this. We don't have synthetic data for pre-training, as GECToR did. I can just try letting it train for more epochs.

If we can't achieve better results, we should talk to the teachers to decide whether it's worth implementing it even with low accuracy.

Also, @jacqle should share with us what Czech speakers think of the results.

Do we really need that, given the poor results on the test set?

As far as I know, he already asked.

No, I haven't asked yet. I'll ask if we can get better results.

The reason for such poor results might be that we don't have synthetic data, as @Dodo-s95 said.
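To give an idea of what creating such data could involve, here is a rough sketch of rule-based noising: take clean Czech sentences and corrupt them, so that the corrupted version is the source and the clean one is the target. This is only an assumption about one possible approach, not GECToR's actual procedure; the names `strip_diacritics` and `corrupt` and the choice of error types are hypothetical.

```python
import random
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. 'příliš' -> 'prilis'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def corrupt(tokens, p=0.15, rng=None):
    """Return a noisy copy of tokens; each token is corrupted with probability p."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        if rng.random() >= p:
            out.append(tok)  # keep the token unchanged
            continue
        op = rng.choice(["drop", "strip", "swap"])
        if op == "drop":
            continue  # simulate a missing word
        if op == "strip":
            out.append(strip_diacritics(tok))  # simulate missing diacritics
        elif out:
            out.insert(len(out) - 1, tok)  # swap with the previous token
        else:
            out.append(tok)  # nothing to swap with at sentence start
    return out


clean = "Včera jsem šel příliš pozdě domů".split()
noisy = corrupt(clean, p=0.3, rng=random.Random(0))
print(" ".join(noisy), "->", " ".join(clean))
```

Real learner errors are more varied than these three operations, so data produced this way would at best be useful for pre-training before fine-tuning on the annotated dataset.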

We probably won't have time to create such data and train the model before the jury. Let's put this idea aside for now; I'll close this issue. Don't hesitate to reopen it if needed.