DDMAL/VIM

Update Languages for "add new instrument name" feature

kunfang98927 opened this issue · 7 comments

Before I complete the "add new instrument name" feature, I have a question about updating the language list in UMIL.

Based on my design, users can add new names to an instrument in a modal like this:

[Screenshot: the modal for adding a new instrument name]

After the "publish" button is clicked, the instrument's "wikidata_id", the "language_code", the "name", and the "source" will be saved to the database. A new instrument name is created by the following code in views/instrument_list.py:

InstrumentName.objects.create(
    instrument=instrument,
    language=language,
    name=name,
    source_name=source,
)

According to our model design in models/instrument_name.py,

from django.db import models


class InstrumentName(models.Model):
    instrument = models.ForeignKey("Instrument", on_delete=models.CASCADE)
    language = models.ForeignKey("Language", on_delete=models.PROTECT)
    name = models.CharField(max_length=100, blank=False)
    source_name = models.CharField(
        max_length=50, blank=False, help_text="Who or what called the instrument this?"
    )  # Stand-in for source data; format TBD

Because both are required foreign keys, we must always choose an "instrument" and a "language" from our database when publishing new instrument names. However, we currently have only two languages, "English" and "French", in our database. So should we synchronize as many languages as we can with Wikidata, or should we keep our own language list in UMIL (possibly a subset of Wikidata's language list) for users to choose from when adding new names for an instrument? @fujinaga @dchiller
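To make the constraint concrete, here is a minimal plain-Python sketch, not UMIL's actual code. The dictionaries `INSTRUMENTS` and `LANGUAGES` are hypothetical in-memory stand-ins for the `Instrument` and `Language` tables, the ID `Q000001` is a placeholder, and `resolve_new_name` is an illustrative helper. A submission whose `language_code` is not already stored has no `Language` row to point to, so with only English and French in the table every other language is rejected.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Instrument and Language tables.
# "Q000001" is a placeholder ID, not a real Wikidata item.
INSTRUMENTS = {"Q000001": "example instrument"}
LANGUAGES = {"en": "English", "fr": "French"}


@dataclass
class NewName:
    """The four values the modal submits (field names assumed from the issue text)."""
    wikidata_id: str
    language_code: str
    name: str
    source: str


def resolve_new_name(submission: NewName) -> dict:
    """Hypothetical helper: return InstrumentName fields, or raise if a foreign key target is missing."""
    if submission.wikidata_id not in INSTRUMENTS:
        raise ValueError(f"unknown instrument: {submission.wikidata_id}")
    if submission.language_code not in LANGUAGES:
        # With only "en" and "fr" stored, any other language ends up here.
        raise ValueError(f"unsupported language: {submission.language_code}")
    return {
        "instrument": submission.wikidata_id,
        "language": submission.language_code,
        "name": submission.name,
        "source_name": submission.source,
    }
```

For example, a French name for the placeholder instrument resolves, while a German one raises `ValueError` until "de" is added to the language table.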

A few comments:

  1. English and French were just chosen as initial options, so I think we should feel free to add more now that we are adding functionality to add more names.

  2. There is a set list of languages that can be used for the "names" of Q objects in Wikidata, so we could just add that set list to the database and periodically update it if/as new options are added to Wikidata.

  3. Do we want people to be able to add languages not in Wikidata?

It seems that at the very least all languages in Wikidata should be supported.
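Keeping the local list in step with Wikidata's set list comes down to a set comparison on each periodic update. A minimal sketch, assuming both lists have already been reduced to sets of language codes; `plan_language_sync` is a hypothetical helper, not existing UMIL code:

```python
from __future__ import annotations


def plan_language_sync(wikidata_codes: set[str], local_codes: set[str]) -> tuple[set[str], set[str]]:
    """Hypothetical helper: return (codes to add locally, local codes Wikidata no longer lists)."""
    to_add = wikidata_codes - local_codes
    # Stale codes are only reported, not deleted: InstrumentName.language uses
    # on_delete=PROTECT, so a Language row still referenced by a name cannot be removed.
    stale = local_codes - wikidata_codes
    return to_add, stale
```

Reporting stale codes instead of deleting them keeps the update safe to run repeatedly, since existing instrument names may still use those languages.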

Thank you for the comments.

3. Do we want people to be able to add languages not in Wikidata?

No. I think using Wikidata's language list is enough for us.

@dchiller @fujinaga For the question "which languages are supported for adding item name to Wikidata", I haven't figured out the exact answer. Here are some other possible ways to get a language list.

  1. I found a language table at https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all
    This table contains 714 unique QIDs (the third column). One confusing point is which column holds the "language code" we should use. The first column seems the most likely, but a language does not always have a single code in that column. For example, it seems that both "als" and "gsw" are language codes for "Alemannic", but within different code systems.

So if we use this language table, I think we can just copy it and clean the data as needed.

  2. We can use the Wikidata API to get a language list. One possible way, suggested by ChatGPT, is https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=languages&format=json. This returns a list of languages (600 items) with their respective language codes.
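A minimal standard-library sketch of method 2, using the URL quoted above. The parser assumes each entry under `query.languages` carries a `code` plus its name under `*` (the default format version 1 shape) or `name` (format version 2); treat that shape as an assumption to confirm against a live response. `fetch_languages`, `parse_languages`, and the `User-Agent` string are illustrative placeholders; Wikimedia asks API clients to identify themselves.

```python
from __future__ import annotations

import json
import urllib.request

API_URL = (
    "https://www.wikidata.org/w/api.php"
    "?action=query&meta=siteinfo&siprop=languages&format=json"
)


def parse_languages(payload: dict) -> dict[str, str]:
    """Hypothetical helper: map language code -> language name from a siteinfo response."""
    languages = {}
    for entry in payload["query"]["languages"]:
        # Assumed shape: format version 1 stores the name under "*", version 2 under "name".
        languages[entry["code"]] = entry.get("*", entry.get("name", ""))
    return languages


def fetch_languages() -> dict[str, str]:
    """Hypothetical helper: download and parse the list (requires network access)."""
    request = urllib.request.Request(
        API_URL, headers={"User-Agent": "UMIL-language-sync/0.1 (placeholder contact)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_languages(json.load(response))
```

Keeping parsing separate from fetching lets the parser be checked against a saved response without calling the API.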

Of these two methods, I prefer the first one, which is to create our own clean language table.
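One first cleaning step for method 1 could be grouping the copied rows by QID to surface cases like "als"/"gsw", where several codes point to the same language. A minimal sketch, assuming the table has been saved as CSV with columns named `code`, `name`, and `qid` (hypothetical headers; the wiki table's own may differ). `codes_sharing_a_qid` is a hypothetical helper:

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def codes_sharing_a_qid(csv_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: return {qid: [codes]} for every QID that more than one code maps to."""
    by_qid = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_qid[row["qid"]].append(row["code"])
    # Only QIDs with several codes need a manual decision about which code to keep.
    return {qid: codes for qid, codes in by_qid.items() if len(codes) > 1}
```

The output is a short review list rather than an automatic fix, since choosing between codes such as "als" and "gsw" needs human judgment.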

Method 1 is fine.
We should always use ISO language codes.
Have you looked at this? https://www.mediawiki.org/wiki/Extension:UniversalLanguageSelector
When asking for the name (should be called "label" as in Wikidata, so "Name/Label"(?)), make sure you ask for the Description and optionally "Also known as".
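For reference, Wikidata's entity JSON stores a label and a description per language code, and aliases ("Also known as") as a list per language code. A minimal sketch of turning one submission into that shape; `to_wikidata_terms` is a hypothetical helper, and how UMIL will store the description and aliases is still open:

```python
from __future__ import annotations


def to_wikidata_terms(language: str, label: str, description: str, aliases: tuple[str, ...] = ()) -> dict:
    """Hypothetical helper: build Wikidata-style labels/descriptions/aliases for one language."""
    terms = {
        "labels": {language: {"language": language, "value": label}},
        "descriptions": {language: {"language": language, "value": description}},
    }
    if aliases:
        # "Also known as" is optional, so the key is only present when aliases were given.
        terms["aliases"] = {
            language: [{"language": language, "value": alias} for alias in aliases]
        }
    return terms
```

Keying everything by the language code means the form's language choice maps directly onto the same code list discussed above.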

For the question "which languages are supported for adding item name to Wikidata", I haven't figured out the exact answer

I also found this incredibly difficult to determine definitively. The results of my research are in #27; you found some of the same sources I did.

It seems that both "als" and "gsw" are language codes for "Alemannic" but within different code system.

I actually think this is a case of Wikidata being wrong, and therefore maybe a point against this table, since the table is populated from Wikidata's contents. In the Universal Language Selector, "als" looks like it refers to a dialect of Albanian.

This is also maybe a useful tool (there's a link to the codebase which we could potentially pull from): https://codelookup.toolforge.org/


So I plan to copy the table from https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all and use this code lookup tool to filter out all invalid language codes. Then we can periodically update and re-filter our own table.
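The plan can be sketched as a filter over the copied rows. Here `valid_codes` stands for whatever set of codes the code lookup tool confirms; how that set is obtained (from its codebase, an export, or manual checks) is an assumption left open, and `filter_language_table` is a hypothetical helper. Rows with invalid codes are dropped and each remaining code is kept once, so the same step can be re-run on every periodic update.

```python
from __future__ import annotations


def filter_language_table(rows: list[dict], valid_codes: set[str]) -> list[dict]:
    """Hypothetical helper: keep rows whose code is valid, one row per code, in input order."""
    seen = set()
    kept = []
    for row in rows:
        code = row["code"]
        # valid_codes is assumed to come from the code lookup tool.
        if code in valid_codes and code not in seen:
            seen.add(code)
            kept.append(row)
    return kept
```

Because the function is deterministic and order-preserving, re-running it on a refreshed copy of the table yields a stable result that can be compared against the previous one.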