XuGW-Kevin's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
lllyasviel/ControlNet
Let us control diffusion models!
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—language models
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
openai/shap-e
Generate 3D objects conditioned on text or images
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
facebookresearch/metaseq
Repo for external large-scale work
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
LLaVA-VL/LLaVA-NeXT
eureka-research/Eureka
Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
zou-group/textgrad
TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients.
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
uclaml/SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
corca-ai/awesome-llm-security
A curation of awesome tools, documents and projects about LLM Security.
Drexubery/ViewCrafter
Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
eureka-research/DrEureka
Official Repository for "DrEureka: Language Model Guided Sim-To-Real Transfer" (RSS 2024)
baaivision/DIVA
Diffusion Feedback Helps CLIP See Better
nv-nguyen/nope
[CVPR 2024] PyTorch implementation of NOPE: Novel Object Pose Estimation from a Single Image
cilinyan/VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
ctogle/archigan
Data wrangling for pix2pix training of an ArchiGAN pipeline
jity16/ACE-Off-Policy-Actor-Critic-with-Causality-Aware-Entropy-Regularization
Official PyTorch implementation of "ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization"