[Enhance] Add Vietnamese as a target language

Hi @jvamvas,

The tool you have implemented is excellent. At the moment, I'm working on something similar (checking plagiarism between Vietnamese and English).

If the tool could support Vietnamese as another target language, I would be delighted to help implement this feature. If not, could you please share any best practices (preparing a dataset, training a model, ...) for adding a new target language to the tool? I might fork the project and do it on my side.

Thanks,

Hi @danielmalaton, thanks for reaching out!

I just checked and the models listed in the README seem to support the Vietnamese language.

So you can do:

from nmtscore import NMTScorer
scorer = NMTScorer()
scorer.score("Xin chào", "Hello")
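For a plagiarism-style check over several sentence pairs, something like the following could work. This is only a rough sketch: `flag_similar_pairs` and the `threshold` value are hypothetical names and numbers, not part of nmtscore, and the cutoff would need tuning on your own data. The scoring function is passed in as an argument, so any callable that takes two strings and returns a float can be used.

```python
from typing import Callable, Iterable, List, Tuple


def flag_similar_pairs(
    pairs: Iterable[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not an nmtscore default
) -> List[Tuple[str, str, float]]:
    """Return the (a, b, score) triples whose score reaches the threshold."""
    flagged = []
    for a, b in pairs:
        similarity = score_fn(a, b)  # one scoring call per sentence pair
        if similarity >= threshold:
            flagged.append((a, b, similarity))
    return flagged
```

With the scorer from the snippet above, this could be called as `flag_similar_pairs([("Xin chào", "Hello")], scorer.score)`.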

I've tested some plagiarism checking on Vietnamese, and it works fine. Pretty cool @jvamvas ! 🚀

I see that the tool uses 3 NMT models, but I wonder whether you plan to implement your own NMT model for one specific language pair (source language - target language), e.g. English - French, to improve accuracy for that pair?

Also, the link to the m2m100 details cannot be accessed. Could you please update it?

Hi @danielmalaton, great to hear!
You could run NMTScore with your own NMT model by subclassing the following base class:

nmtscore/src/nmtscore/models/__init__.py (line 10 in 2a46646):

class TranslationModel:

I am going to close this issue because the initial question has been resolved.