power0341's Stars
OpenDevin/OpenDevin
🐚 OpenDevin: Code Less, Make More
naklecha/llama3-from-scratch
llama3 implementation, one matrix multiplication at a time
princeton-nlp/SWE-agent
[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Zelda64Recomp/Zelda64Recomp
Static recompilation of Majora's Mask (and soon Ocarina of Time) for PC (Windows/Linux)
LLaVA-VL/LLaVA-NeXT
pytorch/torchtitan
A native PyTorch library for large model training
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
microsoft/aici
AICI: Prompts as (Wasm) Programs
TencentARC/BrushNet
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
SakanaAI/evolutionary-model-merge
Official repository of "Evolutionary Optimization of Model Merging Recipes"
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
google-deepmind/recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
FoundationVision/Groma
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
thuml/depyf
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
XuezheMax/megalodon
Reference implementation of the Megalodon 7B model
AILab-CVC/SEED-X
Multimodal Models in the Real World
pkunlp-icler/FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
h-zhao1997/cobra
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
cohere-ai/magikarp
Code for the paper "Fishing for Magikarp"
jefferyZhan/Griffon
Official repo of the Griffon series, including v1 (ECCV 2024), v2, and G
syp2ysy/VRP-SAM
[CVPR 2024] Official implementation of "VRP-SAM: SAM with Visual Reference Prompt"
facebookresearch/UnSAMFlow
Source code for the CVPR 2024 paper "UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model".
syarahmadi/transformers-crash-course
A collection of tutorials and notebooks explaining transformer models in deep learning.
fcjian/InstaGen
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR 2024
vislearn/FFF
Free-form flows are generative models that train a pair of neural networks via maximum likelihood
alibaba/AICITY2024_Track2_AliOpenTrek_CityLLaVA
vislearn/Coupling-Universality
AIRI-Institute/LLM-Microscope