yaya-sy
PhD researcher in NLP. Interested in building tools for under-represented languages.
ENS - LSCP, Paris
Pinned Repositories
SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
blog
Public repo for HF blog posts
sglang
SGLang is a fast serving framework for large language models and vision language models.
BambaraFrenchBitexts
Creation of Bambara-French bitexts for NLP applications.
EntropyBasedCLDMetrics
GitHub repository accompanying the paper "Measuring language development from child-centered recordings"
FulfuldeTranslatorTwitterBot
lillama
[NAACL '25 main] Lillama: Large Language Model Compression via Low-Rank Feature Distillation
speechscorer
Unsupervised scoring of spoken utterances
TsimaneForcedAligner
A forced aligner for the Tsimane language
yaya-sy's Repositories
yaya-sy/speechscorer
Unsupervised scoring of spoken utterances
yaya-sy/lillama
[NAACL '25 main] Lillama: Large Language Model Compression via Low-Rank Feature Distillation
yaya-sy/BambaraFrenchBitexts
Creation of Bambara-French bitexts for NLP applications.
yaya-sy/EntropyBasedCLDMetrics
GitHub repository accompanying the paper "Measuring language development from child-centered recordings"
yaya-sy/FulfuldeTranslatorTwitterBot
yaya-sy/TsimaneForcedAligner
A forced aligner for the Tsimane language
yaya-sy/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
yaya-sy/BenchmarkLangAcq
Behavioral probing of language acquisition models at the phonetic, lexical and syntactic level
yaya-sy/blog
Public repo for HF blog posts
yaya-sy/ChildDirectedSyntacticTest
yaya-sy/datasets-CMU_Wilderness
CMU Wilderness Multilingual Speech Dataset
yaya-sy/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
yaya-sy/FulaLanguageModel
yaya-sy/FulaPresentation
yaya-sy/GenerateFromLAE
yaya-sy/lm-sys.github.io
yaya-sy/MeasuringWordOrderFreedom
yaya-sy/mergekit
Tools for merging pretrained large language models.
yaya-sy/mteb
MTEB: Massive Text Embedding Benchmark
yaya-sy/NGramLM
A minimal implementation of an n-gram language model
yaya-sy/nlp4all
yaya-sy/scattermoe
Triton-based implementation of Sparse Mixture of Experts.
yaya-sy/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
yaya-sy/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
yaya-sy/SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
yaya-sy/SpeechAya
yaya-sy/stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
yaya-sy/WolofPOSTagger
yaya-sy/yaya-sy
yaya-sy/yaya-sy.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes