ttkrpink's Stars
linexjlin/GPTs
leaked prompts of GPTs
Cinnamon/kotaemon
An open-source RAG-based tool for chatting with your documents.
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
stackblitz/bolt.new
Prompt, run, edit, and deploy full-stack web applications
Huanshere/VideoLingo
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team
Vaibhavs10/insanely-fast-whisper
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
lipku/LiveTalking
Real time interactive streaming digital human
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT-4o
jianchang512/stt
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
naver/mast3r
Grounding Image Matching in 3D with MASt3R
tomasonjo/blogs
Jupyter notebooks that support my graph data science blog posts at https://bratanic-tomaz.medium.com/
SamurAIGPT/AI-Youtube-Shorts-Generator
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
harry0703/AudioNotes
Quickly extract audio and video content and organize it into structured Markdown notes
juanmc2005/diart
A Python package to build AI-powered real-time audio applications
Vision-CAIR/MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
zju3dv/GVHMR
Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", SIGGRAPH Asia 2024
supabase-community/babelfish.ai
A realtime live transcription and translation app built with Hugging Face Transformers.js and Supabase Realtime.
MixedRealityToolkit/MixedRealityToolkit-Unity
This repository holds the third generation of the Mixed Reality Toolkit for Unity. The latest version of the MRTK can be found here.
abetusk/dev
dev log
jim60105/docker-whisperX
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)
steinathan/reelsmaker
ReelsMaker is a Python-based Streamlit application designed to create captivating faceless videos for social media platforms like TikTok and YouTube.
nianticlabs/doubletake
[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation
Relsoul/whisper-win-gui
Real-time speech recognition based on Whisper, with web and desktop clients
facebookresearch/efm3d
This is the official release for the paper "EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models" (https://arxiv.org/abs/2406.10224).
KeKsBoTer/cinematic-gaussians
Code for our paper "Application of 3D Gaussian Splatting for Cinematic Anatomy on Consumer Class Devices"