niyatibafna

CS PhD at CLSP, JHU

Pinned Repositories

100LinesOfCode
Let's build something productive in less than 100 Lines of Code.
Language:Jupyter Notebook00
601.771
Language:Python00
BLI-for-Indic-languages
This is the code for our paper <put link here>
Language:Python10
character-bert
Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"
Language:Python00
character-bert-pretraining
Code for pre-training CharacterBERT models (as well as BERT models).
Language:Python00
CSCBLI
Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"
Language:Python00
DeepLearningForShortStories
This is for the course Deep Learning for the Processing and Interpretation of Literary Texts
Language:Python00
Handling-English-VPE-for-English-Hindi-MT
English-Hindi machine translation systems have difficulty interpreting verb phrase ellipsis (VPE) in English, and commit errors in translating sentences with VPE. We present a solution and theoretical backing for the treatment of English VPE, with the specific scope of enabling English-Hindi MT, based on an understanding of the syntactic phenomenon of verb-stranding verb phrase ellipsis in Hindi (VVPE). We implement a rule-based system to perform the following subtasks: 1) identifying verb ellipsis in the English source sentence, 2) identifying the head of the elided verb phrase, 3) identifying the verb segment that needs to be induced at the site of ellipsis, and 4) modifying the input sentence, i.e. resolving VPE and inducing the required verb segment. This system obtains 94.83 percent precision and 83.04 percent recall on subtask (1), tested on 3900 sentences from the BNC corpus [Leech, 1992]. This is competitive with state-of-the-art results. We measure the accuracy of subtasks (2) and (3) together, and obtain 91 percent accuracy on 200 sentences taken from the WSJ corpus [Paul and Baker, 1992]. We carried out a manual analysis of the MT outputs of 100 sentences after passing them through our system. We set up a basic metric (1-5) for this evaluation, where 5 indicates drastic improvement, and obtained an average of 3.55.
Language:Python11
north-indian-dialect-modelling
Collecting data for "dialects" in the North Indian "Hindi belt". Modelling the dialect system to gain insight and to develop NLP research for low-resource languages.
Language:Jupyter Notebook50
XORQA
This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
Language:Python10

niyatibafna's Repositories

niyatibafna/north-indian-dialect-modelling
Collecting data for "dialects" in the North Indian "Hindi belt". Modelling the dialect system to gain insight and to develop NLP research for low-resource languages.
Language:Jupyter Notebook50
niyatibafna/BLI-for-Indic-languages
This is the code for our paper <put link here>
Language:Python10
niyatibafna/Handling-English-VPE-for-English-Hindi-MT
English-Hindi machine translation systems have difficulty interpreting verb phrase ellipsis (VPE) in English, and commit errors in translating sentences with VPE. We present a solution and theoretical backing for the treatment of English VPE, with the specific scope of enabling English-Hindi MT, based on an understanding of the syntactic phenomenon of verb-stranding verb phrase ellipsis in Hindi (VVPE). We implement a rule-based system to perform the following subtasks: 1) identifying verb ellipsis in the English source sentence, 2) identifying the head of the elided verb phrase, 3) identifying the verb segment that needs to be induced at the site of ellipsis, and 4) modifying the input sentence, i.e. resolving VPE and inducing the required verb segment. This system obtains 94.83 percent precision and 83.04 percent recall on subtask (1), tested on 3900 sentences from the BNC corpus [Leech, 1992]. This is competitive with state-of-the-art results. We measure the accuracy of subtasks (2) and (3) together, and obtain 91 percent accuracy on 200 sentences taken from the WSJ corpus [Paul and Baker, 1992]. We carried out a manual analysis of the MT outputs of 100 sentences after passing them through our system. We set up a basic metric (1-5) for this evaluation, where 5 indicates drastic improvement, and obtained an average of 3.55.
Language:Python11
niyatibafna/XORQA
This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
Language:Python10
niyatibafna/100LinesOfCode
Let's build something productive in less than 100 Lines of Code.
Language:Jupyter Notebook00
niyatibafna/601.771
Language:Python00
niyatibafna/character-bert
Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"
Language:Python00
niyatibafna/character-bert-pretraining
Code for pre-training CharacterBERT models (as well as BERT models).
Language:Python00
niyatibafna/CSCBLI
Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"
Language:Python00
niyatibafna/DeepLearningForShortStories
This is for the course Deep Learning for the Processing and Interpretation of Literary Texts
Language:Python00
niyatibafna/Email-Clustering-on-the-Enron-Dataset
Project for CS-303. Working with the publicly available Enron dataset (https://www.cs.cmu.edu/~./enron/), which contains approximately 0.5 million messages collected from 150 users, to build a classification model.
Language:Jupyter Notebook02
niyatibafna/embeddings-transfer-indian-languages
Transferring embeddings to low resource Indian languages using close relationships to other higher resource languages such as Hindi, Bangla, Marathi, etc.
Language:Jupyter Notebook00
niyatibafna/gina
Learning a Hindi lexicon from parallel corpora. Monsoon 2018. Google Cloud NLP API.
Language:Python00
niyatibafna/HateSpeech-Hindi-English-Code-Mixed-Social-Media-Text
Language:Python00
niyatibafna/Hindi-Sentence-Completion
Cleaned final code from Hindi-Verb-Prediction
Language:Python00
niyatibafna/llm-eval-crosslingual-generalization
Language:Python
niyatibafna/lm_hf_skeleton
Skeleton scripts in HF.
Language:Python
niyatibafna/misc
Useful things
Language:Shell
niyatibafna/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
Language:Python
niyatibafna/mt_hf_skeleton
Setting up MT in HF
Language:Python
niyatibafna/OBPE
niyatibafna/pgns-for-lrmt
Language:Python
niyatibafna/political_health
This is for measuring hate on Twitter against certain groups, and comparing these metrics over time
Language:Python
niyatibafna/retaining-source-terms-nmt
When translating technical material from English to Hindi, we often want to retain certain terminology for consistency and coherence in Hindi. This task deals with constrained decoding for English-Hindi NMT to accomplish this goal: given source English text and a list of English terms that we want to retain, we want output in the target language, Hindi, that uses the required English terminology.
Language:Jupyter Notebook
