cvtower's Stars
ultralytics/ultralytics
Ultralytics YOLO11 🚀
karpathy/llm.c
LLM training in simple, raw C/CUDA
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
state-spaces/mamba
Mamba SSM architecture
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
roboflow/notebooks
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
instantX-research/InstantStyle
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
apple/ml-4m
4M: Massively Multimodal Masked Modeling
lichao-sun/Mora
Mora: More like Sora for Generalist Video Generation
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
bcmi/libcom
Image composition toolbox: everything you want to know about image composition or object insertion
kijai/ComfyUI-KwaiKolorsWrapper
Diffusers wrapper to run Kwai-Kolors model
ironjr/StreamMultiDiffusion
Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
lichao-sun/SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It's 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images.
ragavsachdeva/magi
Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
andimarafioti/florence2-finetuning
Quick exploration into fine-tuning Florence-2
Atten4Vis/LW-DETR
This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".
maxin-cn/Cinemo
[CVPR 2025] Consistent and Controllable Image Animation with Motion Diffusion Models
TerryPei/EfficientVMamba
Code Implementation of EfficientVMamba
Zyphra/Zamba2
PyTorch implementation of models from the Zamba2 series.
zurutech/pillow-resize
Port of the Pillow resize method to C++ and OpenCV.
HaiyuWu/Vec2Face
This is the official implementation of "Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors", which was accepted at ICLR 2025.
qingshi9974/ECCV2024-AdpatICMH
[ECCV2024] Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation