kaka1909's Stars
PlexPt/awesome-chatgpt-prompts-zh
ChatGPT 中文调教指南。各种场景使用指南。学习怎么让它听你的话。
ageitgey/face_recognition
The world's simplest facial recognition api for Python and the command line
ultralytics/yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
upscayl/upscayl
🆙 Upscayl - #1 Free and Open Source AI Image Upscaler for Linux, MacOS and Windows.
facebookresearch/detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
iperov/DeepFaceLive
Real-time face swap for PC streaming or video calls
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
kkroening/ffmpeg-python
Python bindings for FFmpeg - with complex filtering support
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
BloopAI/bloop
bloop is a fast code search engine written in Rust.
PeterL1n/RobustVideoMatting
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
deep-floyd/IF
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
showlab/Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
xtekky/chatgpt-clone
ChatGPT interface with better UI
SCUTlihaoyu/open-chat-video-editor
Open source short video automatic generation tool
blmoistawinde/HarvestText
文本挖掘和预处理工具(文本清洗、新词发现、情感分析、实体识别链接、关键词抽取、知识抽取、句法分析等),无监督或弱监督方法
saffsd/langid.py
Stand-alone language identification system
joonson/syncnet_python
Out of time: automated lip sync in the wild
TaoRuijie/TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
keplerlab/katna
Tool for automating common video key-frame extraction, video compression and Image Auto-crop/Image-resize tasks
Junhua-Liao/Light-ASD
The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)
awslabs/aws-media-replay-engine
Media Replay Engine (MRE) is a framework to build automated video clipping and replay (highlight) generation pipelines for live and video-on-demand content.