Czech version
Closed this issue · 6 comments
Try to retrain the Czech model to get better accuracy, then implement it.
Dataset: https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3057
Results of the model for now:
train: accuracy: 0.8452, loss: 0.9536
validation: accuracy: 0.8121, loss: 1.3333
Test set:
Precision: 0.6565
Recall: 0.2397
F_0.5: 0.4871
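For reference, the F_0.5 score follows directly from the precision and recall reported above. Here is a minimal sketch of the standard F-beta formula. The function name `f_beta` is only illustrative and is not taken from our codebase.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when the model gets nothing right.
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Test-set values reported in this issue.
score = f_beta(precision=0.6565, recall=0.2397, beta=0.5)
print(f"F_0.5: {score:.4f}")
```

With beta = 0.5 precision counts more than recall, which is why the score stays near 0.49 even though recall is only about 0.24.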
I'm not sure we'll get better than this. We don't have synthetic data to pre-train on, as GECToR did. The only thing I can try is letting it train for more epochs.
If we can't achieve better results, we should talk to the teachers to decide whether it's worth implementing even with low accuracy.
Also, @jacqle should share with us what Czech speakers think of the results.
Do we really need that, given the bad results on the test set?
As far as I know, he already asked.
No, I haven't asked yet. I'll ask if we can get better results.
The reason we have such bad results might be that we don't have synthetic data, as @Dodo-s95 said.
We probably won't have time to create such data and train the model before the jury. Let's put this idea aside for now; I'll close this issue. Don't hesitate to reopen it if needed.