Jerrisk's Stars
Tencent/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
compphoto/IntrinsicCompositing
Code for the SIGGRAPH Asia 2023 paper "Intrinsic Harmonization for Illumination-Aware Compositing"
Stable-X/StableNormal
[SIGGRAPH Asia 2024 (Journal Track)] StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal
Kwai-Kolors/MPS
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
espeak-ng/espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
numediart/MBROLA
MBROLA is a speech synthesizer based on the concatenation of diphones
GitYCC/g2pW
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)
MCG-NKU/AMT
Official code for "AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation" (CVPR2023)
baowenbo/DAIN
Depth-Aware Video Frame Interpolation (CVPR 2019)
Yuan-ManX/SouPyX
SouPyX: An Audio Exploration Space.🪐
InternLM/MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
archinetai/audio-ai-timeline
A timeline of the latest AI models for audio generation, starting in 2023!
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
state-spaces/s4
Structured state space sequence models
vercel/ai
Build AI-powered applications with React, Svelte, Vue, and Solid
2noise/ChatTTS
A generative speech model for daily dialogue.
antiboredom/camera-motion-detector
Uses OpenCV to detect when a camera is panning or zooming.
bytedance/particle-sfm
ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild. ECCV 2022.
discus0434/aesthetic-predictor-v2-5
SigLIP-based Aesthetic Score Predictor
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
brentyi/tyro
CLI interfaces & config objects, from types
meta-llama/PurpleLlama
Set of tools to assess and improve LLM security.
meta-llama/llama3
The official Meta Llama 3 GitHub site
ActiveState/appdirs
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything