saulocatharino

Computer Vision Consultant.

Beet Labs, Rio de Janeiro

Pinned Repositories

car_learning
Deep Deterministic Policy Gradient
Language: Python 13 1 01
face_swap
Face swap
Language: Python 14 5 08
filtros_importantes
Language: Python 26 4 06
Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024)
Language: Python 103
reconhecimento_facial
Simple facial recognition
Language: Python 38 4 011
Video-LLaMA
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Language: Python 00
VisionLLM
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
00
Voice-Identification
Project to explore speaker and voice identification. Further speech recognition tasks will follow.
Language: Jupyter Notebook 10
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 00
YOLOX
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Language: Python 0 1 00

saulocatharino's Repositories

saulocatharino/Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024)
Language: Python 103
saulocatharino/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
4
saulocatharino/OLMo
Modeling, training, eval, and inference code for OLMo
4
saulocatharino/CSVsniffer
Language: Python 3 0 0
saulocatharino/DynamiCrafter
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
2
saulocatharino/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Language: Python 2 0 0
saulocatharino/DE-COP_Method
This repository presents the original implementation of DE-COP: Detecting Copyrighted Content in Language Models Training Data by André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira and Lei Li
1
saulocatharino/PPG
Language: Python 1 1 0
saulocatharino/UFO
A UI-Focused Agent for Windows OS Interaction.
1
saulocatharino/AnimateLCM
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
0 0
saulocatharino/AniTalker
saulocatharino/browserless
Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
saulocatharino/CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
saulocatharino/dust3r
saulocatharino/Fracture_Detection_Improved_YOLOv8
YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection
saulocatharino/GaussianTalker
saulocatharino/Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Language: Python 0 0
saulocatharino/IDM-VTON
IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
saulocatharino/InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Language: Python 0 0
saulocatharino/LWM
saulocatharino/Mamba-UNet
Mamba-UNet: Unet-like Pure Visual Mamba for Medical Image Segmentation
Language: Python 0 0
saulocatharino/Metric3D
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
saulocatharino/mickey
[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
saulocatharino/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
saulocatharino/OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
saulocatharino/SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
saulocatharino/StoryDiffusion
Create Magic Story!
saulocatharino/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Language: Python 0 0
saulocatharino/UAV-Rain1k
UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery
Language: Python 0 0
saulocatharino/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language: Python 0 0