yihongL1U's Stars
McGill-NLP/llm2vec
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
karpathy/build-nanogpt
Video+code lecture on building nanoGPT from scratch
april-tools/pixar
Repository of PIXAR, a Pixel-based Auto-Regressive Language Model
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
bminixhofer/zett
Code for Zero-Shot Tokenizer Transfer
FreddeFrallan/Multilingual-CLIP
OpenAI CLIP text encoders for multiple languages!
facebookresearch/llm-transparency-tool
LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Demo: https://huggingface.co/spaces/facebook/llm-transparency-tool-demo
VILA-Lab/ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
cisnlp/GlotLID
GlotLID: Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
dadelani/sib-200
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
kojima-takeshi188/lang_neuron
esalesky/visrep
This repository contains an extension of fairseq for pixel / visual representations for machine translation.
gowitheflow-1998/Pixel-Linguist
uclaml/SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
epfl-dlab/llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
boleima/ToPro
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks (EACL2024)
kaistAI/LangBridge
[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
feizc/MLE-LLaMA
Multi-language Enhanced LLaMA
huggingface/deep-rl-class
This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Course.
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
tylerachang/multilingual-geometry
The geometry of multilingual language model representations (EMNLP 2022).
microsoft/COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
UKPLab/sentence-transformers
State-of-the-Art Text Embeddings
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Shark-NLP/OpenICL
OpenICL is an open-source framework to facilitate research, development, and prototyping of in-context learning.
HKUNLP/multilingual-transfer
Code for the paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability"
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
cisnlp/cisnlp.github.io
Homepage of cisnlp
cisnlp/ColexificationNet
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
xplip/pixel
Research code for pixel-based encoders of language (PIXEL)