SutirthaChakraborty
Senior Research Engineer (Xperi.Co), Ph.D (Human-Robot Synchronization for Musical Ensemble, Maynooth University), Music Composer
PhDMaynooth
SutirthaChakraborty's Stars
xtekky/gpt4free
The official gpt4free repository | a collection of various powerful language models
labmlai/annotated_deep_learning_paper_implementations
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
spmallick/learnopencv
Learn OpenCV : C++ and Python Examples
PaddlePaddle/PaddleHub
Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving) [Security hardening in progress; interaction paused, please wait patiently]
THUDM/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
HVision-NKU/StoryDiffusion
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
meta-llama/llama-models
Utilities intended for use with Llama models.
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Deci-AI/super-gradients
Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
timsainb/noisereduce
Noise reduction in python using spectral gating (speech, bioacoustics, audio, time-domain signals)
mhamilton723/FeatUp
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024
haoheliu/versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
niconielsen32/ComputerVision
av-savchenko/face-emotion-recognition
Efficient face emotion recognition in photos and videos
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
TIGER-AI-Lab/AnyV2V
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" (TMLR 2024)
jcvasquezc/DisVoice
feature extraction from speech signals
yistLin/dvector
Speaker embedding (d-vector) trained with GE2E loss
lucidrains/lumiere-pytorch
Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch
FORARTfe/HyMPS
HyMPS will be a platform-independent software suite for advanced audio/video content production.
sunlicai/HiCMAE
[Information Fusion 2024] HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
BreezeWhite/interesting-colabs
A personal collection of Colab notebooks that I find interesting.
yochaiye/LipVoicer
Official Code implementation for the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading"
ThisIs-Developer/Body-Language-Detection-with-MediaPipe-and-OpenCV
Explore the world of non-verbal communication like never before with our Body Language Detection solution. Utilizing the advanced capabilities of MediaPipe and OpenCV, we provide real-time insights into human gestures, postures, and facial expressions.
haoheliu/nider
Python package to add text to images, textures and different backgrounds