I started this project to combine my education in linguistics with my interest in data science. I find it fascinating that there are over 6000 languages in the world, each expressing the worldview and culture of a group of people. However, for various socio-political reasons, only a few languages dominate global communication and are passed on to new generations, while many others die out over time.
I set out to explore data on endangered languages to answer the following questions:
- in what geographical areas are most endangered languages found?
- how many language families, languages, and dialects are endangered, and to what extent?

The repository contains:
- Jupyter Notebook with the data analysis
- languoid.csv - dataset with the languages and their degrees of endangerment
- languages-and-dialects-geo.csv - dataset with languages and dialects and their geographical coordinates
- datasets source
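
As a rough illustration of the second question, the counting step could be sketched as below. This is not the notebook's actual code. The column names `level` and `status` are assumptions about the header of languoid.csv and may need to be adjusted, and both helper functions are hypothetical names introduced here.

```python
import csv
from collections import Counter


def count_by_level_and_status(rows, level_col="level", status_col="status"):
    """Count languoids per (level, endangerment status) pair.

    `rows` is any iterable of dicts, for example a csv.DictReader.
    The default column names are assumptions, not confirmed by the dataset.
    """
    counts = Counter()
    for row in rows:
        # Missing or empty cells are grouped under "unknown".
        level = (row.get(level_col) or "").strip() or "unknown"
        status = (row.get(status_col) or "").strip() or "unknown"
        counts[(level, status)] += 1
    return counts


def count_file(path, **kwargs):
    """Apply the count to a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_by_level_and_status(csv.DictReader(f), **kwargs)
```

Calling `count_file("languoid.csv")` would return a `Counter` keyed by `(level, status)` pairs, which can then be sorted or plotted. If the real header uses different names, pass them through `level_col` and `status_col`.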