Metokarski's Stars
upscayl/upscayl
🆙 Upscayl - #1 Free and Open Source AI Image Upscaler for Linux, macOS and Windows.
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
roboflow/supervision
We write your reusable computer vision tools. 💜
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
kyutai-labs/moshi
deepseek-ai/DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DepthAnything/Depth-Anything-V2
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output.
isl-org/ZoeDepth
Metric depth estimation from a single image
bghira/SimpleTuner
A general fine-tuning kit geared toward diffusion models.
aiola-lab/whisper-medusa
Whisper with Medusa heads
IDEA-Research/Motion-X
[NeurIPS 2023] Official implementation of the paper "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset"
ai-forever/Real-ESRGAN
PyTorch implementation of Real-ESRGAN model
apple/ml-sigmoid-attention
VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch
A state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)
Lornatang/ESRGAN-PyTorch
A simple implementation of ESRGAN using the PyTorch framework.
chenzhuo1011/libri_css
Libri-CSS: dataset and evaluation pipeline
facebookresearch/ava-256
Train universal codec avatars
sailordiary/LipNet-PyTorch
"LipNet: End-to-End Sentence-level Lipreading" in PyTorch
KoljaB/WhoSpeaks
Efficient approach to speaker diarization using voice characteristics extraction
huggingface/fineVideo
wizenheimer/cyyrus
Transform Unstructured Data into Synthetic Datasets
dimtzionas/HandObjectInteractionIJCV16_HandMotionViewer
Hand MoCap 3d viewer for the IJCV'16 paper "Capturing Hands in Action using Discriminative Salient Points and Physics Simulation"
jack-tol/youtube-to-audio
A lightweight Python package and command-line interface (CLI) tool that extracts audio from YouTube videos and playlists in multiple formats, such as MP3, WAV, OGG, AAC, and FLAC.
ffeew/LipCoordNet
A multi-modal neural network built upon LipNet that achieves SOTA performance on the GRID corpus
dimtzionas/HandObjectInteractionIJCV16_GroundTruthViewer
Ground-truth viewer for the IJCV'16 paper "Capturing Hands in Action using Discriminative Salient Points and Physics Simulation"
cvlabbonn/hand_2d_gt_viewer
A tool to view the data set distributed freely by Dimitris Tzionas
cvlabbonn/hands_3d_motion_viewer
A tool to view the data set distributed freely by Dimitris Tzionas
PingYufeng/LipNet-PyTorch-1
PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)