gentaiscool
Researcher @ Capital One AI Foundations. Natural Language Processing, Speech, Multilingual, Code-switching, Dialogue
Capital One AI Foundations, New York
Pinned Repositories
NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
code-switching-papers
A curated list of research papers and resources on code-switching
end2end-asr-pytorch
End-to-End Automatic Speech Recognition on PyTorch
few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
indonesian-nlp
A curated list of research papers and resources on Indonesian languages
lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
ros-vrep-slam
ROS and V-REP for Robot Mapping and Localization
indonlu
The first large-scale natural language processing benchmark for the Indonesian language. It provides multiple downstream tasks, pre-trained IndoBERT models, and starter code. (AACL-IJCNLP 2020)
nusa-crowd
A collaborative project to collect datasets in Indonesian languages.
nusax
High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)
gentaiscool's Repositories
gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
gentaiscool/end2end-asr-pytorch
End-to-End Automatic Speech Recognition on PyTorch
gentaiscool/lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
gentaiscool/few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
gentaiscool/indonesian-nlp
A curated list of research papers and resources on Indonesian languages
gentaiscool/meta-emb
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
gentaiscool/miners
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
gentaiscool/matrix_fact
Matrix Factorization Library
gentaiscool/gentaiscool.github.io
My website
gentaiscool/distfuse
A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).
gentaiscool/xnli-dataset
gentaiscool/acl-anthology
Data and software for building the ACL Anthology.
gentaiscool/aclpub2
gentaiscool/awesome-cultural-nlp
Resources for cultural NLP research
gentaiscool/BIG-bench
Beyond the Imitation Game collaborative benchmark for enormous language models
gentaiscool/calcs2023
gentaiscool/calcs2023_ingest
gentaiscool/calcs2023_test
gentaiscool/DataLab
The unified platform for data-related resources.
gentaiscool/do-we-need-attention
gentaiscool/human-preference-papers
gentaiscool/LLaVA-NeXT
gentaiscool/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
gentaiscool/mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
gentaiscool/mteb
MTEB: Massive Text Embedding Benchmark
gentaiscool/NER-datasets
Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
gentaiscool/nusa-datasets
gentaiscool/opt-merging
gentaiscool/PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
gentaiscool/promptsource
Toolkit for creating, sharing and using natural language prompts.