Czech version

Question

Closed this issue 4 years ago · 6 comments

thiborose commented 4 years ago

Try to retrain the Czech model to get better accuracy, then implement it.

Dataset: https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3057

Current results of the model:

train: accuracy: 0.8452, loss: 0.9536
validation: accuracy: 0.8121, loss: 1.3333

Test set:
Precision : 0.6565
Recall : 0.2397
F_0.5 : 0.4871
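
For reference, the F_0.5 figure can be reproduced from the reported precision and recall. This is a minimal sketch using the standard F-beta definition; the helper name `f_beta` is made up for illustration and is not taken from the project's code.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    With beta < 1, precision counts more than recall.
    (Hypothetical helper, not part of the project's code.)
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Test-set precision and recall as reported in this issue
score = f_beta(precision=0.6565, recall=0.2397, beta=0.5)
print(f"F_0.5 = {score:.4f}")  # prints "F_0.5 = 0.4871"
```

Since beta = 0.5 weights precision more heavily than recall, the low recall (0.2397) drags the score down less than it would under F_1.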

Answer 1 · 2020-12-03T19:53:45.000Z

I'm not sure we'll get better than this. We don't have synthetic data to pre-train on, as GECToR did. All I can try is to let it train for more epochs.

If we can't achieve better results, we should talk to the teachers to decide whether it's worth implementing even with low accuracy.

Answer 2 · 2020-12-03T19:57:29.000Z

Also, @jacqle should share with us what Czech speakers think of the results.

Answer 3 · 2020-12-04T09:08:29.000Z

> Also, @jacqle should share with us what Czech speakers think of the results.

Do we really need that, knowing the bad results on the test set?

Answer 4 · 2020-12-04T09:16:35.000Z

> > Also, @jacqle should share with us what Czech speakers think of the results.

> Do we really need that, knowing the bad results on the test set?

As far as I know, he already asked.

Answer 5 · 2020-12-04T09:56:44.000Z

> > > Also, @jacqle should share with us what Czech speakers think of the results.

> > Do we really need that, knowing the bad results on the test set?

> As far as I know, he already asked.

No, I haven't asked yet. I'll ask if we can get better results.

Answer 6 · 2020-12-06T17:21:26.000Z

The reason we have such bad results might be that we don't have synthetic data, as @Dodo-s95 said.

We will probably not have time to create such data and train the model before the jury. Let's put this idea aside for now; I'll close this issue. Don't hesitate to reopen it if needed.