oscmansan's Stars
xai-org/grok-1
Grok open release
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
stas00/ml-engineering
Machine Learning Engineering Open Book
huggingface/trl
Train transformer language models with reinforcement learning.
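For orientation, a minimal sketch of supervised fine-tuning with trl's SFTTrainer; constructor arguments shift between trl releases, so the exact names here are assumptions to check against your installed version.

```python
# Minimal SFT sketch with trl; argument names vary by trl version,
# so treat this as illustrative, not canonical usage.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any instruction-style dataset with a text/messages column works here.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # example causal-LM checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-out", max_steps=100),
)
trainer.train()
```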
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
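The official repo ships its own sampling scripts; as a quick hedged alternative, the released DiT-XL/2 weights can also be sampled through diffusers' DiTPipeline, assuming diffusers fits your setup:

```python
# Sketch: class-conditional sampling with the released DiT-XL/2 weights
# via diffusers' DiTPipeline (an alternative to the repo's own sample.py).
import torch
from diffusers import DiTPipeline

pipe = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Map human-readable ImageNet class names to class ids.
class_ids = pipe.get_label_ids(["golden retriever"])
image = pipe(class_labels=class_ids, num_inference_steps=25).images[0]
image.save("dit_sample.png")
```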
ml-explore/mlx-examples
Examples in the MLX framework
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
pytorch/torchtune
PyTorch native post-training library
openai/grok
openai/transformer-debugger
apple/ml-mgie
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
NVlabs/VILA
VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
facebookresearch/MetaCLIP
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
unum-cloud/uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
merveenoyan/smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
stas00/the-art-of-debugging
The Art of Debugging
cloneofsimo/minSDXL
Huggingface-compatible SDXL Unet implementation that is readily hackable
huggingface/open-muse
Open reproduction of MUSE for fast text2image generation.
linzhiqiu/t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
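A minimal sketch of scoring an image-text pair with VQAScore, following the shape of the repo's README; the model name and call signature are assumptions that may differ in your installed release.

```python
# Sketch: scoring image-text alignment with VQAScore from t2v_metrics.
# Model name and call signature follow the repo's README and are
# assumptions; verify against your installed version.
import t2v_metrics

scorer = t2v_metrics.VQAScore(model="clip-flant5-xxl")
scores = scorer(
    images=["images/example.png"],  # hypothetical local image path
    texts=["someone talks on the phone angrily while another person sits happily"],
)
print(scores)  # higher score = better text-image alignment
```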
NVIDIA/Megatron-Energon
Megatron's multi-modal data loader
YiyangZhou/LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
j-min/DSG
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
YiyangZhou/POVID
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
openai/dalle3-eval-samples
Text-to-image samples collected for the evaluation of DALL-E 3 in the whitepaper.
YiyangZhou/CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
DavidMChan/aloha
A new reliable, localizable, and generalizable metric for hallucination detection in image captioning models.