Psycholinguistic databases consist of sequences in a given language (words, pseudowords, nonwords, etc.), each annotated with features such as abstractness and number of phones. These sequence types are defined as follows:
- Words: Actual meaningful tokens in a language’s dictionary.
- Pseudowords: Tokens that sound like words, but don’t have a meaning assigned to them yet.
- Nonwords: Tokens that are neither part of the lexicon nor sound like words in the language.
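The three categories above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the romanized strings form a hypothetical toy lexicon, and "sounds like a word" is approximated by a crude check that every character bigram of the token (including its edges) is attested in some lexicon word. This is not the deep learning model used in the project, only a way to make the definitions concrete.

```python
def boundary_bigrams(token):
    """Return the set of character bigrams of a token, with '#' marking its edges."""
    padded = f"#{token}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def classify(token, lexicon):
    """Label a token as 'word', 'pseudoword', or 'nonword' relative to a lexicon.

    Assumption: a token "sounds like a word" if all of its bigrams are
    attested in the lexicon. Real phonotactics is far richer than this.
    """
    if token in lexicon:
        return "word"
    # Collect every bigram seen in any lexicon entry.
    attested = set().union(*(boundary_bigrams(w) for w in lexicon))
    if boundary_bigrams(token) <= attested:
        return "pseudoword"
    return "nonword"


# Hypothetical toy lexicon of romanized strings, for illustration only.
TOY_LEXICON = {"kamal", "namak", "kalam"}

for token in ("kamal", "kamak", "mkkla"):
    print(token, classify(token, TOY_LEXICON))
```

Because each label depends entirely on the lexicon supplied, an incomplete lexicon can label a real word as a pseudoword.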
Pseudowords are widely used in psycholinguistic batteries that diagnose reading and language disorders such as aphasia and dyslexia. They help measure how intact a patient’s semantic/lexical judgement abilities are. For instance, if a patient can correctly separate all pseudowords from words, then they do not have an aphasia that has affected their semantic abilities.
Until now, clinicians have relied on ad hoc approaches to create pseudowords. However, such pseudowords might not conform to an individual’s understanding of the sounds in their native dialect of Hindi, or might coincide with actual words that the clinician did not know, which makes the test unfair. Hence, it is crucial to have a dependable method for generating Hindi pseudowords.
In this project, we aim to design and conduct experiments to validate pseudowords generated by a deep learning model that has learned a language’s phonology, assessing how close they are to actual Hindi words according to native Hindi speakers. This is crucial for enhancing the reliability of the model before expanding it to other languages of the Indian subcontinent.