Adding new auxiliary verb by specify_auxiliary.pl
s10018 opened this issue · 16 comments
I want to add words to the auxiliary verb list for the Japanese UD treebanks of Modern and Spoken Japanese (#71).
I checked the site below, but I cannot find out how to add a new auxiliary verb.
http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl
Do you have any plans to allow adding new auxiliary verbs on the site in the future? @masayu-a
The system first checks whether the auxiliaries that had been previously hard-wired in the source code have already been documented. If it finds one or more auxiliaries for which the case has not been made, it asks the user to document them first. If there is no such backlog, the system offers the option of adding a new auxiliary (as can be observed e.g. for Korean).
This system of maintaining language-specific rules is quite new and it is possible that not everything works as intended, so let me know if there are any issues. In particular, the list of possible functions does not include all possible tenses, aspects, voices and moods at present.
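The check described above can be pictured with a minimal sketch. This is not the code of specify_auxiliary.pl; the function name `next_action`, the return labels, and the lemma placeholders are hypothetical and only illustrate the "document the backlog first, then add new entries" rule.

```python
def next_action(hardwired, documented):
    """Decide what a registration form could offer next (illustrative only).

    hardwired:  lemmas that were previously hard-coded as auxiliaries
    documented: set of lemmas that already have a documented function
    """
    # Anything hard-wired but not yet documented forms the backlog.
    backlog = [lemma for lemma in hardwired if lemma not in documented]
    if backlog:
        # The backlog blocks new additions until it is documented.
        return ("document_first", backlog)
    # No backlog: adding a new auxiliary can be offered.
    return ("add_new", [])


# Hypothetical placeholder lemmas, not real entries from any treebank.
blocked = next_action(["aux_a", "aux_b"], {"aux_a"})
free = next_action(["aux_a"], {"aux_a"})
```

Under this reading, removing undocumented entries from the hard-wired list empties the backlog, which is why new auxiliaries can be added afterwards.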
We want to add Japanese auxiliary verbs for the Long Unit Word (LUW) definition.
The list includes compound auxiliary words, which are treated as syntactic words in Japanese.
If possible, we also want to add contracted auxiliary words in Japanese.
ている
てある
のだ
てくる
ではない
のです
てる
てしまう
てくれる
ていく
てもらう
でもある
てみる
かもしれない
のである
てある
てほしい
じゃない
ていただく
つつある
にすぎない
I have removed from the system the 120 Japanese auxiliaries that were still lacking documentation and thus blocking the addition of any new auxiliaries. You can now add the LUW auxiliaries.
On the downside, Japanese GSD and PUD are now invalid because they contain the undocumented auxiliaries (@kanayamah).
@dan-zeman
Why did you remove them only from UD Japanese?
We need more than 500 auxiliaries.
Furthermore, the list of types should be changed for Japanese.
Most auxiliaries should be classified as "Other".
Could you resolve the 120 Japanese auxiliaries and the additional 21 auxiliaries?
Copula
Perfect
Past
Future
Passive
Conditional
Necessitative
Potential
Desiderative
Other
Undocumented
Why did you remove them?
Because they were never correctly added and thus they were the reason why no new auxiliaries could be added to the system. They can be re-introduced through the form now. But I don't know what their function is, not to mention examples (this kind of information was not there).
More than 500 auxiliaries sounds very strange (even more than 100 does), given the number of auxiliaries used in other languages. It raises questions of whether all these auxiliaries are really auxiliaries in the UD sense, i.e., are responsible for grammatical features such as tense, aspect, mood and voice. In fact, one of the purposes of the auxiliary registration/validation system is to ensure that the term "auxiliary" is interpreted in line with the guidelines, and similarly across languages.
Note that the "Other" category in the all-language aux table is just a formatting choice that lumps together several actual functions. However, those functions have names, and there is no "Other" function when an auxiliary is being documented. (That said, some tenses / aspects / moods / voices may still be missing from the menu in the form and can be added if needed. But there will never be an "other" option.)
Yes.
Could you add these items in the validation tool?
I have added Mood=Ben. What feature value should I add for the honorific auxiliary?
Therefore, we chose "-----".
Actually, it was a bug in the script that allowed you to get away with "-----". I am very sorry for the inconvenience it caused.
We need Polite features for auxiliaries, as described at https://universaldependencies.org/u/feat/Polite.html
However, the Polite features can appear together with the other auxiliary functions.
We also need the auxiliary of negation for Uralic languages.
We need Polite features for auxiliaries, as described at https://universaldependencies.org/u/feat/Polite.html However, the Polite features can appear together with the other auxiliary functions.
My question was more about what value of the Polite feature would correspond to your auxiliary. In the end I went with Polite=Form. If the other auxiliaries use the Polite morphological feature, that is OK. For example, if you have a formal and an informal version of a past tense auxiliary, it is enough to register the lemma with the past tense function, but then the actual word forms in the corpus can (and should) still have both features, i.e., Polite=Infm|Tense=Past and Polite=Form|Tense=Past, respectively. (Perhaps they should even be treated as forms of one lemma.) I understand your request as adding this function to the auxiliary specification form so that you can have an auxiliary that expresses only politeness, without also expressing tense, aspect, or modality, correct?
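To make the point about the two word forms concrete, here is a minimal sketch that parses the two feature strings from the comment above, using the CoNLL-U convention of `Name=Value` pairs separated by `|`. The helper name `parse_feats` is hypothetical and not part of any UD tool.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Polite=Form|Tense=Past' into a dict."""
    if feats == "_":  # "_" marks an empty FEATS column in CoNLL-U
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


informal = parse_feats("Polite=Infm|Tense=Past")
formal = parse_feats("Polite=Form|Tense=Past")

# Both word forms carry the tense function that would be registered for the lemma.
shared_tense = informal["Tense"] == formal["Tense"] == "Past"

# Only the Polite feature distinguishes the two word forms.
differing = {name for name in informal if informal[name] != formal.get(name)}
```

In this sketch the registered function (past tense) belongs to the lemma, while politeness is recorded only on the individual word forms.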
We also need the auxiliary of negation for Uralic languages.
Do not worry about that, the negative auxiliary is already available and it is used in some Uralic languages in UD (Erzya, Moksha, Komi, Sami).
Thank you very much.
We need Polite=Infm, Polite=Form, Polite=Elev and Polite=Humb.
"行き[ません]" Polite=Form|Negation, "なさる" Polite=Elev, and "いたす" Polite=Humb are presented in https://universaldependencies.org/u/feat/Polite.html
The example "行か[ない]" on the page is just Negation.
However, we have some auxiliaries with Polite=Infm, such as "やがる".
Therefore, we need the four Polite features for the auxiliary validation.
Note that "なさいます" on the page is Polite=Form|Polite=Elev, and "いたします" on the page is Polite=Form|Polite=Humb.
We cannot choose Negation in http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl
Could you add Negation to the tool?
The example "行か[ない]" on the page is just Negation.
Sorry, I don't understand. I thought that both 行かない and 行きません are negation, that is, Polarity=Neg, but the latter is polite/formal negation, i.e., Polite=Form. Since the former is not formal, it is informal, i.e., Polite=Infm.
Note that "なさいます" on the page is Polite=Form|Polite=Elev, and "いたします" on the page is Polite=Form|Polite=Humb.
This follows automatically from the definition on that page, which says that Elev and Humb are subtypes of the formal register (Form). But at most one of the values is put in the morphological features, so if you know that it is e.g. Polite=Elev, you no longer use Polite=Form.
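The "subtype" rule in the reply above can be sketched as follows. The `PARENT` table and the function `most_specific` are hypothetical names for illustration; they encode only what the reply states, namely that Elev and Humb are subtypes of Form and that the more specific value replaces the general one.

```python
# Elev and Humb are treated as subtypes of Form, as stated in the reply above.
PARENT = {"Elev": "Form", "Humb": "Form"}


def most_specific(values):
    """Drop any Polite value that is the parent of another value in the input."""
    parents = {PARENT[v] for v in values if v in PARENT}
    return sorted(set(values) - parents)


elevated = most_specific(["Form", "Elev"])   # Form is implied by Elev
humble = most_specific(["Form", "Humb"])     # Form is implied by Humb
plain_formal = most_specific(["Form"])       # nothing more specific is known
```

Under this reading, an annotation like Polite=Form|Polite=Elev would reduce to Polite=Elev alone.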
I have now added a politeness-informal function to the auxiliary specification form.
We cannot choose Negation in http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl
Could you add Negation to the tool?
It has been there since the beginning; it is the seventeenth function: "Needed in negative clauses (like English “do”, not like “not”)". Note that if it is more like English not (as opposed to do in negative clauses), then it is not an AUX but a PART (see here), and its relation to the predicate of the clause is advmod.
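The distinction drawn in the reply above can be summarized in a small sketch. The function `tag_negator` and its argument values are hypothetical. The PART/advmod pairing comes from the reply; the `aux` relation for the AUX case is not stated in this thread and is assumed here from the general UD guidelines for auxiliaries.

```python
def tag_negator(behaves_like):
    """Suggest UPOS and relation for a negative element (illustrative only).

    behaves_like: "do"  for an element needed in negative clauses, like English "do"
                  "not" for a plain negator, like English "not"
    """
    if behaves_like == "do":
        # Assumption: auxiliaries attach to the predicate with the aux relation.
        return {"upos": "AUX", "deprel": "aux"}
    if behaves_like == "not":
        # Per the reply: a "not"-like negator is a particle attached as advmod.
        return {"upos": "PART", "deprel": "advmod"}
    raise ValueError(f"unknown behaviour: {behaves_like!r}")


do_like = tag_negator("do")
not_like = tag_negator("not")
```

Only the first case would need to be registered through the auxiliary specification form.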
The example "行か[ない]" on the page is just Negation.
Sorry, I don't understand. I thought that both 行かない and 行きません are negation, that is, Polarity=Neg, but the latter is polite/formal negation, i.e., Polite=Form. Since the former is not formal, it is informal, i.e., Polite=Infm.
"行か[ない]" should be Polarity=Neutral.
Note that "なさいます" on the page is Polite=Form|Polite=Elev, and "いたします" on the page is Polite=Form|Polite=Humb.
This follows automatically from the definition on that page, which says that Elev and Humb are subtypes of the formal register (Form). But at most one of the values is put in the morphological features, so if you know that it is e.g. Polite=Elev, you no longer use Polite=Form.
OK, Thanks.
I have now added a politeness-informal function to the auxiliary specification form.
Thank you very much.
We cannot choose Negation in http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl
Could you add Negation to the tool?
It has been there since the beginning; it is the seventeenth function: "Needed in negative clauses (like English “do”, not like “not”)". Note that if it is more like English not (as opposed to do in negative clauses), then it is not an AUX but a PART (see here), and its relation to the predicate of the clause is advmod.
OK, I will choose "Needed in negative clauses (like English “do”, not like “not”)".
However, the negation auxiliary verbs in the Uralic languages should be treated as auxiliaries in UD:
https://benjamins.com/catalog/tsl.108
I would like to hear the opinions of others who work on Uralic languages.
Speaking from a Turkic perspective, we would treat a tokenized "not"-type negation such as Turkic "ma" as "PART", akin to the question marker token "mu". This choice is codified for Old Turkish in the following paper: https://aclanthology.org/2021.udw-1.11/
It seems to me that there are no remaining open questions for the validation infrastructure in this issue, so I am tentatively closing it. Feel free to reopen if another action is needed.