leuchine
Machine Learning and Natural Language Processing Researcher
University of Hong Kong / Reka AI, Hong Kong
leuchine's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet for a given image
openai/openai-python
The official Python library for the OpenAI API
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
openai/spinningup
An educational resource to help anyone learn deep reinforcement learning.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
openai/jukebox
Code for the paper "Jukebox: A Generative Model for Music"
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
snorkel-team/snorkel
A system for quickly generating training data with weak supervision
yandex/YaLM-100B
Pretrained language model with 100B parameters
facebookresearch/fairscale
PyTorch extensions for high-performance and large-scale training.
huggingface/huggingface_hub
The official Python client for the Huggingface Hub.
EleutherAI/the-pile
facebookresearch/diplomacy_cicero
Code for Cicero, an AI agent that plays the game of Diplomacy with open-domain natural language negotiation.
microsoft/mup
maximal update parametrization (µP)
PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
lucidrains/perceiver-pytorch
Implementation of Perceiver, General Perception with Iterative Attention, in PyTorch
bigscience-workshop/bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
GEM-benchmark/NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
LeoGrin/tabular-benchmark
bigscience-workshop/data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
huggingface/olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
lxuechen/private-transformers
A codebase that makes differentially private training of transformers easy.
reka-ai/reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
sebastianGehrmann/CausalMediationAnalysis
Code for the paper "Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias"
yzpang/gold-off-policy-text-gen-iclr21
UKPLab/nessie
Automatically detect errors in annotated corpora.
EleutherAI/pile-cc
mlcommons/dataperf
Data Benchmarking