iskaj's Stars
mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
eduardzamfir/seemoredetails
Repository for "See More Details: Efficient Image Super-Resolution by Experts Mining", ICML 2024
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
modelscope/facechain
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
AIGCDesignGroup/ReplaceAnything
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
resemble-ai/Resemblyzer
A Python package to analyze and compare voices with deep learning
C3Imaging/child_tts_fastpitch
FastPitch text-to-speech (TTS) model for generating high-quality synthetic child speech. This study uses a transfer-learning pipeline: a multi-speaker TTS model is finetuned to work with child speech. We use the publicly available MyST dataset (55 hours) for our finetuning experiments.
Rikorose/DeepFilterNet
Noise suppression using deep filtering
GXYM/TextBPN-Plus-Plus
Arbitrary Shape Text Detection via Boundary Transformer. Paper: https://arxiv.org/abs/2205.05320, accepted by IEEE Transactions on Multimedia (T-MM 2023).
gabriben/awesome-generative-information-retrieval
jianfch/stable-ts
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
SubtitleEdit/subtitleedit-cli
Subtitle Edit CLI (without System.Drawing)
kennethleungty/Failed-ML
Compilation of high-profile real-world examples of failed machine learning projects
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
laurensw75/Words2Num_nl
Convert spelled out numbers in Dutch to numeric form
jonatasgrosman/huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
rmcelreath/stat_rethinking_2022
Statistical Rethinking course winter 2022
jonatasgrosman/asrecognition
ASRecognition: just an easy-to-use library for Automatic Speech Recognition.
RameenAbdal/StyleFlow
StyleFlow: Attribute-conditioned Exploration of StyleGAN-generated Images using Conditional Continuous Normalizing Flows (ACM TOG 2021)
iskaj/Newsgram
AirConsole game for education on news literacy
advimman/HiDT
Official repository for the paper "High-Resolution Daytime Translation Without Domain Labels" (CVPR2020, Oral)
YolandaDuan/AVI_LeapMotion_Exergaming