cvtower

Computer vision algorithm engineer; CUDA & OpenCL developer since 2010.

Shanghai, China

cvtower's Stars

ultralytics/ultralytics
Ultralytics YOLO11 🚀
Language: Python 39.2k 190 10.8k 7.6k
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda 26.3k 271 1473k
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
Language: Python 26.1k 206 5802.5k
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
Language: Python 17.1k 115 2071.1k
state-spaces/mamba
Mamba SSM architecture
Language: Python 14.5k 106 6321.3k
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Language: Python 13.1k 107 617914
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Language: Python 8.4k 112 165526
roboflow/notebooks
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.
Language: Jupyter Notebook 7.5k 98 1561.2k
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Language: Python 7.4k 61 259648
Tencent/HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Language: Jupyter Notebook 4k 42 197336
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Language: Python 2.8k 25 28206
instantX-research/InstantStyle
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
Language: Jupyter Notebook 1.9k 20 43114
apple/ml-4m
4M: Massively Multimodal Masked Modeling
Language: Python 1.7k 33 27103
lichao-sun/Mora
Mora: More like Sora for Generalist Video Generation
Language: Python 1.6k 80 12103
facebookresearch/MobileLLM
MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Language: Python 1.3k 21 1571
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Language: Python 1.3k 15 6263
bcmi/libcom
Image composition toolbox: everything you want to know about image composition or object insertion
Language: Python 610 14 3943
kijai/ComfyUI-KwaiKolorsWrapper
Diffusers wrapper to run Kwai-Kolors model
Language: Python 588 8 4234
ironjr/StreamMultiDiffusion
Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
Language: Jupyter Notebook 532 10 1444
lichao-sun/SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
496 9 220
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper: "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model" It's 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images
Language: Python 447 6 1921
ragavsachdeva/magi
Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
342 5 913
andimarafioti/florence2-finetuning
Quick exploration into fine tuning florence 2
Language: Jupyter Notebook 305 4 2428
Atten4Vis/LW-DETR
This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".
Language: Python 298 13 3125
maxin-cn/Cinemo
[CVPR 2025] Consistent and Controllable Image Animation with Motion Diffusion Models
Language: Python 266 6 1521
TerryPei/EfficientVMamba
Code Implementation of EfficientVMamba
Language: Python 204 2 257
Zyphra/Zamba2
PyTorch implementation of models from the Zamba2 series.
Language: Python 178 4 316
zurutech/pillow-resize
Porting of Pillow resize method in C++ and OpenCV.
Language: C++ 132 12 1316
HaiyuWu/Vec2Face
This is the official implementation of "Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors", which is accepted at ICLR2025.
Language: Python 54 3 133
qingshi9974/ECCV2024-AdpatICMH
[ECCV2024] Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
41 4 50
