Pinned Repositories
bodo
This repository contains all the resources (corpora) of Bodo and tools that were developed for creating and managing these resources
hindi-politeness
indianlr.github.io
A repository listing resources and technologies for non-scheduled and endangered Indian languages. The website can be accessed here
kmi-linguistics.github.io
Research and Development at the Department of Linguistics in K.M. Institute of Hindi and Linguistics at Dr. Bhim Rao Ambedkar University, Agra
magahi
This repository contains all the data, tools, applications and publications related to Magahi, an Indo-Aryan language
mscrabble
Repository for Multilingual Scrabble Generator and Games - especially aimed towards endangered languages
propaganda
Repository of the data and models generated by Mr. Shyam Ratan as part of his MPhil dissertation titled 'Automatic Detection Of Propaganda In Hindi On Social Media'
SpeeD-IA
Repository for different Speech Datasets and Models for Indo-Aryan languages prepared by the Department under different projects
trac-1
Repository hosting the dataset for the Shared Task on Aggression Identification at the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-1), held at COLING 2018. Please visit the workshop website - https://sites.google.com/view/trac1/home - for more details
vardial2018
This repository contains the dataset used for the Indo-Aryan Language Identification Shared Task, part of the Evaluation Campaign at the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) at COLING 2018. It has 15k sentences each in Awadhi, Bhojpuri, Braj, Magahi and Hindi
Department of Linguistics, K.M. Institute of Hindi and Linguistics's Repositories
kmi-linguistics/trac-1
Repository hosting the dataset for the Shared Task on Aggression Identification at the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-1), held at COLING 2018. Please visit the workshop website - https://sites.google.com/view/trac1/home - for more details
kmi-linguistics/vardial2018
This repository contains the dataset used for the Indo-Aryan Language Identification Shared Task, part of the Evaluation Campaign at the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) at COLING 2018. It has 15k sentences each in Awadhi, Bhojpuri, Braj, Magahi and Hindi
kmi-linguistics/magahi
This repository contains all the data, tools, applications and publications related to Magahi, an Indo-Aryan language
kmi-linguistics/bodo
This repository contains all the resources (corpora) of Bodo and tools that were developed for creating and managing these resources
kmi-linguistics/hindi-politeness
kmi-linguistics/mscrabble
Repository for Multilingual Scrabble Generator and Games - especially aimed towards endangered languages
kmi-linguistics/bhojpuri
Resources and Technologies for Bhojpuri
kmi-linguistics/braj
Repository for all code, data and resources on Braj Bhasha that are being developed at the Institute.
kmi-linguistics/ComMA
kmi-linguistics/indianlr.github.io
A repository listing resources and technologies for non-scheduled and endangered Indian languages. The website can be accessed here
kmi-linguistics/kmi-linguistics.github.io
Research and Development at the Department of Linguistics in K.M. Institute of Hindi and Linguistics at Dr. Bhim Rao Ambedkar University, Agra
kmi-linguistics/NLP
Natural Language Processing R&D @K.M. Institute of Hindi and Linguistics
kmi-linguistics/propaganda
Repository of the data and models generated by Mr. Shyam Ratan as part of his MPhil dissertation titled 'Automatic Detection Of Propaganda In Hindi On Social Media'
kmi-linguistics/SpeeD-IA
Repository for different Speech Datasets and Models for Indo-Aryan languages prepared by the Department under different projects
kmi-linguistics/awadhi
Repository for all code, data and resources on the Awadhi language that are being developed at the Institute. Currently, it contains all the data generated as part of the M.Phil. dissertation of Mr. Abdul Basit.
kmi-linguistics/Code-mixing
kmi-linguistics/crawlers
kmi-linguistics/indianlr
A repository of language resources and technologies for non-scheduled and endangered Indian languages
kmi-linguistics/sigtyp2020
This repository contains the code and details of the KMI-Panlingua-IITKGP system submitted to the SigTyp 2020 Shared Task on Prediction of Linguistic Features. It can be used for training and prediction on any new dataset in the same format with similar information.
kmi-linguistics/speech-aggression
Repository of data and scripts of the UGC-UKIERI Project on "Automatic Detection of Verbal Threat in Hindi and English Aggressive Speech"
kmi-linguistics/taluitew
Repository for all data and resources being developed at the Institute on Taluitew, a Tibeto-Burman language of the Naga group spoken in parts of Manipur. Currently, it contains all the data generated as part of the M.Phil. dissertation of Mr. Chingrimung Lungleng.
kmi-linguistics/text-aggression
This is the repository of the work carried out as part of The Aggression Project at the Microsoft Research India Summer Workshop on Artificial Social Intelligence in June 2017. The repository contains all code and datasets generated during the workshop.
kmi-linguistics/trac-2
Repository hosting the dataset for the Shared Task on Aggression and Misogyny Identification at the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2), held at LREC 2020. Please visit the workshop website - https://sites.google.com/view/trac2/shared-task - for more details
kmi-linguistics/western-hindi
Repository for all data and resources on Western Hindi that are being developed at the Institute. Currently, it contains all the data generated as part of the M.Phil. dissertation of Ms. Saba Parween.