gentaiscool

Researcher @ Capital One AI Foundations. Natural Language Processing, Speech, Multilingual, Code-switching, Dialogue

Capital One AI Foundations · New York

Pinned Repositories

NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
Language: Python · 778 23 52196
code-switching-papers
A curated list of research papers and resources on code-switching
303 24 638
end2end-asr-pytorch
End-to-End Automatic Speech Recognition on PyTorch
Language: Python · 294 12 4062
few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Language: Python · 52 5 02
indonesian-nlp
A curated list of research papers and resources on Indonesian languages
39 6 03
lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
Language: Python · 111 7 243
ros-vrep-slam
ROS and V-REP for Robot Mapping and Localization
Language: C++ · 43 8 113
indonlu
The first-ever large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
Language: Jupyter Notebook · 565 18 36196
nusa-crowd
A collaborative project to collect datasets in Indonesian languages.
Language: Jupyter Notebook · 263 6 19162
nusax
High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)
Language: Jupyter Notebook · 95 9 010

gentaiscool's Repositories

gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
303 24 638
gentaiscool/end2end-asr-pytorch
End-to-End Automatic Speech Recognition on PyTorch
Language: Python · 294 12 4062
gentaiscool/lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
Language: Python · 111 7 243
gentaiscool/few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Language: Python · 52 5 02
gentaiscool/indonesian-nlp
A curated list of research papers and resources on Indonesian languages
39 6 03
gentaiscool/meta-emb
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
Language: Python · 32 5 13
gentaiscool/miners
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
Language: Python · 11 3 16
gentaiscool/matrix_fact
Matrix Factorization Library
Language: Python · 9 4 02
gentaiscool/gentaiscool.github.io
My website
Language: HTML · 7 3 03
gentaiscool/distfuse
A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).
Language: Python · 5 2 13
gentaiscool/xnli-dataset
Language: Python · 1 3 0
gentaiscool/acl-anthology
Data and software for building the ACL Anthology.
Language: Python · 2 0
gentaiscool/aclpub2
Language: TeX · 1 0
gentaiscool/awesome-cultural-nlp
Resources for cultural NLP research
gentaiscool/BIG-bench
Beyond the Imitation Game collaborative benchmark for enormous language models
Language: Python · 2 0
gentaiscool/calcs2023
Language: Python · 3 0
gentaiscool/calcs2023_ingest
Language: TeX · 2 0
gentaiscool/calcs2023_test
Language: Python · 2 1
gentaiscool/DataLab
The unified platform for data-related resources.
Language: Python · 2 0
gentaiscool/do-we-need-attention
Language: TeX · 1 0
gentaiscool/human-preference-papers
1 0
gentaiscool/LLaVA-NeXT
Language: Python · 0 0
gentaiscool/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
Language: Python · 2 0
gentaiscool/mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
Language: Python · 0 0
gentaiscool/mteb
MTEB: Massive Text Embedding Benchmark
Language: Python · 0 0
gentaiscool/NER-datasets
Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
Language: Python · 2 0
gentaiscool/nusa-datasets
Language: Python · 2 0
gentaiscool/opt-merging
Language: Python
gentaiscool/PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
2 0
gentaiscool/promptsource
Toolkit for creating, sharing and using natural language prompts.
Language: Python · 2 0