ropensci/lingtypology

Affiliation (and other) data missing

borstell opened this issue · 4 comments

I re-ran a script using lingtypology that worked about three months ago, but it now returns errors because some ISO codes in my data no longer return affiliation data (only "N/A"). The affiliation data are indeed missing from the R files glottolog.original and glottolog.modified, even though they are present in the online Glottolog database.

The languages missing affiliation data in my set are (as ISO codes):
heb, jup, eto, kal, sme, mrj, est, nbl

Any idea why these are missing?

Calle! Nice to read you again!

I've changed the algorithm for database creation, so something went wrong. What result did you expect?

> packageVersion("lingtypology")
[1] ‘1.0.6’
> lang.iso(c("heb", "jup", "eto", "kal", "sme", "mrj", "est", "nbl"))
               heb                jup                eto                kal                sme 
   "Modern Hebrew"            "Hupda"     "Eton-Mengisa"      "Kalaallisut"    "Northern Sami" 
               mrj                est                nbl 
    "Western Mari"                 NA "Sumayela Ndebele" 

BTW, what version of lingtypology are you using?

If by est you meant Estonian, in Glottolog it has the ISO code ekk.

So, what I get is this:

> packageVersion("lingtypology")
[1] ‘1.0.6’
> aff.lang(lang.iso(c("heb", "jup", "eto", "kal", "sme", "mrj", "ekk", "nbl")))
heb jup eto kal sme mrj ekk nbl 
 NA  NA  NA  NA  NA  NA  NA  NA

In my script, I have a loop going through the 44 languages in my sample. For each one it collects the aff.lang() string, splits the string on ",", and retrieves the first item. This gives the top-level family for each language. Since the affiliation data are now missing for the above-mentioned languages (even after changing est to ekk), there is missing data in the output.
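The top-level-family step described above can be sketched in base R roughly as follows. This is not the original script: `top_family()` is a hypothetical helper rather than a lingtypology function, the affiliation strings are copied from output elsewhere in this thread, and `xxx` is a made-up placeholder code standing in for a language whose affiliation comes back as NA.

```r
# Affiliation strings as they appear in this thread.
# "xxx" is a hypothetical placeholder for a code with no affiliation data.
aff <- c(
  heb = "Afro-Asiatic, Semitic, Central, South, Canaanite",
  sme = "Uralic, Sami, Western, Northern",
  ekk = "Uralic, Finnic",
  xxx = NA
)

# Hypothetical helper (not part of lingtypology): split each affiliation
# string on "," and keep the first element, i.e. the top-level family.
# NA input stays NA, so missing affiliations remain visible in the result.
top_family <- function(affiliation) {
  parts <- strsplit(affiliation, ",", fixed = TRUE)
  first <- vapply(parts, function(p) trimws(p[1]), character(1))
  stats::setNames(first, names(affiliation))
}

families <- top_family(aff)

# Codes whose affiliation is missing, so they can be reported
# instead of silently producing gaps in the output.
missing_codes <- names(families)[is.na(families)]

print(families)
print(missing_codes)
```

In a real script, `aff` would presumably be the named vector returned by `aff.lang(lang.iso(...))`, as in the calls shown in this thread. Collecting the NA cases explicitly makes it easier to notice when the underlying database changes.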

Also, sign languages no longer seem to have "Sign language" as the top level, but are instead listed only as "Deaf sign language". This is not a big problem for my classification, but it also suggests that something is missing from the Glottolog source.

I've spent a lot of time searching for the source of this bug, but I couldn't find it. However, I rewrote one thing, and now it works.

> aff.lang(lang.iso(c("heb", "jup", "eto", "kal", "sme", "mrj", "ekk", "nbl")))
                                                                                                                       heb 
                                                                        "Afro-Asiatic, Semitic, Central, South, Canaanite" 
                                                                                                                       jup 
                                                                                                        "Puinavean, Hupda" 
                                                                                                                       eto 
"Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Northwest, A, Yaunde-Fang (A.71)" 
                                                                                                                       kal 
                                                                                     "Eskimo-Aleut, Eskimo, Inuit-Inupiaq" 
                                                                                                                       sme 
                                                                                         "Uralic, Sami, Western, Northern" 
                                                                                                                       mrj 
                                                                                                            "Uralic, Mari" 
                                                                                                                       ekk 
                                                                                                          "Uralic, Finnic" 
                                                                                                                       nbl 
"Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Sotho-Tswana (S.407)" 

It still has some strange side effects, but I'll fix them later.

Thanks a bunch! It works great again now!