Ars-Linguistica/mlconjug3

Custom handling of defective verbs

Closed this issue · 6 comments

Is your feature request related to a problem? Please describe.
I see you've stumbled across and fixed issues related to defective verbs in the past (such as #52) and are investigating how to handle these verbs. Now I wonder if this library offers a simple way to ignore these defective verb special cases, and just show their conjugations as if they were not defective.

This might sound like an odd request, but let me explain why anyone would need this. Defective verbs are not permanently defective. A verb that was once considered defective might a few decades later be considered normal simply because native speakers decided to conjugate the once defective forms and use them.

I'll use as examples the verbs computar, gerir, or even banir in Portuguese. The first two verbs were once not conjugated in the first person of the indicative present, yet nowadays they are used normally, especially computar (and mlconjug3 recognizes this). The last example, banir, is still considered by many conjugation sources as defective in the first person indicative present, but I can assure you that eu bano is used all the time by Portuguese online communities, and thus, banir might no longer be considered defective in the near future.

I believe this library doesn't want to be dated, which admittedly is a tough task when it comes to keeping up with living languages. However, I think there could be a simple way to deal with this issue (if there isn't one already).

Describe the solution you'd like
There could be an optional parameter that tells the conjugator API to ignore defective verbs and fill the previously unknown defective conjugations with its best guess, just as it already does with verbs that do not exist.
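A minimal sketch of what such an opt-in flag could do, assuming the stored forms of a defective verb mark missing slots with `None` and that a model prediction for the same slots is available. The function `fill_defective`, its `fill` flag, and the sample data are hypothetical and are not part of mlconjug3's API.

```python
from typing import Dict, Optional

Forms = Dict[str, Optional[str]]


def fill_defective(stored: Forms, predicted: Dict[str, str], fill: bool = False) -> Forms:
    """Return the stored forms, optionally filling gaps with predictions.

    Hypothetical helper, not an mlconjug3 function. Slots whose stored
    value is None are treated as the defective ones.
    """
    if not fill:
        # Default behaviour: keep the verb defective, exactly as stored.
        return dict(stored)
    return {
        person: form if form is not None else predicted.get(person)
        for person, form in stored.items()
    }


# Hand-written sample for "banir" (present indicative); not library output.
stored_banir = {"eu": None, "tu": None, "ele": None,
                "nós": "banimos", "vós": "banis", "eles": "banem"}
predicted_banir = {"eu": "bano", "tu": "banes", "ele": "bane",
                   "nós": "banimos", "vós": "banis", "eles": "banem"}

filled = fill_defective(stored_banir, predicted_banir, fill=True)
```

With `fill=False` the gaps stay as they are, so existing behaviour would not change unless the caller asks for it.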

Describe alternatives you've considered
In the case of banir, for example, someone who wanted all of its possible conjugations would be surprised that mlconjug3 doesn't think this verb has 1st, 2nd, or 3rd person singular conjugations. However, I could swap the first consonant of the verb and ask mlconjug3 to conjugate danir (which doesn't exist in Portuguese) instead, and return its conjugations after manually replacing the Ds with Bs.
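This workaround can be sketched as follows. The helper `swap_initial` is hypothetical and works on a plain mapping of forms, so it does not depend on how the forms were obtained. The sample `danir` forms are written by hand for illustration and are not actual mlconjug3 output.

```python
from typing import Dict


def swap_initial(forms: Dict[str, str], placeholder: str, original: str) -> Dict[str, str]:
    """Replace a leading placeholder prefix with the original prefix in each form.

    Only the start of each form is touched, so a "d" later in the word
    (for example in an ending) is left alone.
    """
    swapped = {}
    for person, form in forms.items():
        if form.startswith(placeholder):
            form = original + form[len(placeholder):]
        swapped[person] = form
    return swapped


# Hand-written forms for the non-existent verb "danir" (present indicative).
danir_forms = {"eu": "dano", "tu": "danes", "ele": "dane"}
banir_forms = swap_initial(danir_forms, "d", "b")
print(banir_forms)  # {'eu': 'bano', 'tu': 'banes', 'ele': 'bane'}
```

Replacing only the leading letter avoids a pitfall of a blanket D-to-B replacement, which would also alter endings that contain a d, such as a gerund in -indo.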

Additional context
Some test case examples (using the CLI for brevity):

$ mlconjug3 banir -l pt

Result snippet: [image]

$ mlconjug3 danir -l pt

Result snippet: [image]

I hope this made sense. If anybody needs more details, feel free to ask me anything.

Hi @vekat!

Sorry for not replying earlier; I have been quite busy lately.

Thank you for using mlconjug3 and submitting this feature request.
I think it is a great suggestion, and I had been pondering the same question of how to handle defective verbs for a while.

I will implement an optional feature to fill the gaps for defective verbs as you suggest; it is the most elegant way to deal with this issue.

I will start implementing this during the week and I will definitely keep you informed and maybe ask for your feedback on the implementation before releasing a new version.

Thanks again for your helpful feedback.

Cheers,

SekouDiaoNlp.

Hi @vekat, how are you?

I am currently investigating how to best implement the custom handling of defective verbs.

Your idea of substituting the first letter/syllable of the infinitive is a good one, but after consulting with a phonologist colleague, it appears that it is not as simple as I might have assumed at first.

The substituted syllable must have similar phonological characteristics to the original one.
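One way to read this constraint, as a rough sketch only: describe each consonant by a few phonological features and prefer the substitute that shares the most features with the original. The feature table below is a small, simplified, hand-picked subset and the scoring is an assumption for illustration; it is not the approach used in mlconjug3.

```python
# Features: (voiced, place of articulation, manner of articulation).
# Coarse labels chosen for illustration only.
FEATURES = {
    "b": (True, "bilabial", "plosive"),
    "p": (False, "bilabial", "plosive"),
    "d": (True, "alveolar", "plosive"),
    "t": (False, "alveolar", "plosive"),
    "m": (True, "bilabial", "nasal"),
    "s": (False, "alveolar", "fricative"),
}


def shared_features(a: str, b: str) -> int:
    """Count the features two consonants have in common."""
    return sum(x == y for x, y in zip(FEATURES[a], FEATURES[b]))


def closest_substitutes(original: str) -> list:
    """Rank the other consonants by how many features they share with `original`."""
    others = [c for c in FEATURES if c != original]
    return sorted(others, key=lambda c: shared_features(original, c), reverse=True)


print(shared_features("b", "d"))  # 2 (both are voiced plosives)
print(shared_features("b", "s"))  # 0
```

Under this toy scoring, d is a closer stand-in for b than s is, which matches the intuition that the banir/danir swap keeps the onset phonologically similar.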

I am in the process of implementing a model of Optimality Theory into mlconjug3 to make the substitution robust in all supported languages.

It is going to take a bit of time to implement, but after that mlconjug3 will be able to conjugate defective verbs in all persons and tenses in a consistent manner.

I will let you know as soon as I have a working prototype.

Cheers,

SekouDiaoNlp.

Hello @SekouDiaoNlp, I don't know exactly what optimality theory means in this context, but the idea of substituting letters that I wrote under "Describe alternatives you've considered" was not really how I expected this issue to be solved from your perspective as the library author. It was just me brainstorming how I could solve my specific issue by circumventing the API, so, from the point of view of a user.

What I really meant was that your library already knows how to conjugate verbs that don't exist, and this knowledge could be used (conditionally) to fill in the holes purposely left by defective verbs. If I understood the project correctly, the reason why defective verbs are missing conjugations is not because the neural net has learned what should be defective and what shouldn't, but because they are already modelled as defective (in the mlconjug3/data/conjug_manager/ data), correct?

Hi @vekat,

It is correct that the defective verbs are already modeled as defective in the training data.

I want to keep it that way by default and add an optional parameter that lets the machine learning model fill in defective verbs.

I have a prototype ready that should be released by the end of the month/beginning of next month.

To make this fill-in feature robust and applicable in all languages, I am currently implementing Optimality Theory, because the model will choose a different conjugation paradigm depending on the substitution made to the defective verb.

I will let you know as soon as the feature is implemented.

Have a nice day!

SekouDiaoNlp.

Hi @vccortez.

Thank you again for submitting this issue.

I think I found a scheme that will work pretty robustly across different languages.

I am first going to implement some new languages before the summer (at least Catalan, Valencian, Czech and Polish).

I will then implement your suggestion for defective verbs.

Cheers,

@SekouDiaoNlp

Hi @vccortez, @sfc32.

Sorry for the late update; I unfortunately contracted COVID and had to stay put for a while.

However, I am now back in full form and will soon update the project with the fill-in for defective verbs.

Cheers,

SekouDiaoNlp.