Pinned Repositories
car_learning
Deep Deterministic Policy Gradient
face_swap
Face swap
filtros_importantes
Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024)
reconhecimento_facial
Simple facial recognition
Video-LLaMA
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
VisionLLM
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Voice-Identification
Project to explore speaker and voice identification. Further speech recognition tasks will follow.
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
YOLOX
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
saulocatharino's Repositories
saulocatharino/Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024)
saulocatharino/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
saulocatharino/OLMo
Modeling, training, eval, and inference code for OLMo
saulocatharino/CSVsniffer
saulocatharino/DynamiCrafter
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
saulocatharino/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
saulocatharino/DE-COP_Method
This repository presents the original implementation of DE-COP: Detecting Copyrighted Content in Language Models Training Data by André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira and Lei Li
saulocatharino/PPG
saulocatharino/UFO
A UI-Focused Agent for Windows OS Interaction.
saulocatharino/AnimateLCM
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
saulocatharino/AniTalker
saulocatharino/browserless
Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
saulocatharino/CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
saulocatharino/dust3r
saulocatharino/Fracture_Detection_Improved_YOLOv8
YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection
saulocatharino/GaussianTalker
saulocatharino/Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
saulocatharino/IDM-VTON
IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
saulocatharino/InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
saulocatharino/LWM
saulocatharino/Mamba-UNet
Mamba-UNet: Unet-like Pure Visual Mamba for Medical Image Segmentation
saulocatharino/Metric3D
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
saulocatharino/mickey
[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
saulocatharino/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
saulocatharino/OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
saulocatharino/SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
saulocatharino/StoryDiffusion
Create Magic Story!
saulocatharino/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
saulocatharino/UAV-Rain1k
UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery
saulocatharino/whisper-asr-webservice
OpenAI Whisper ASR Webservice API