Pinned Repositories
CodeActAgent-Gradio
Unofficial Gradio repo for the ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
ControlLoRA-Chinese
A light neural network for controlling Stable Diffusion spatial information, tuned on Chinese
docvqa-gen
Document visual question answering dataset generator for English and Chinese
Genshin-Impact-BookQA-LLM
A Genshin Impact book question-answering project powered by an LLM
Genshin-Impact-Character-Instruction
Genshin Impact character instruction models, tuned with LoRA on an LLM
Genshin-Impact-Fan-Video
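An AI-driven Genshin Impact video project that uses an LLM API to generate character interaction scripts, VITS for speech synthesis, and advanced text-to-image and video synthesis techniques to create entertaining scenes between game characters. The final output is short videos.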
PhotoWCT
Unofficial implementation of "A Closed-form Solution to Photorealistic Image Stylization"
Sbert-ChineseExample
Sentence-Transformers information retrieval example for Chinese
Stable-Diffusion-Chinese-Extend
A fine-tuned version of the Stable Diffusion model, trained on a self-translated 10k DiffusionDB Chinese corpus and then "extended"
Stable-Diffusion-Pokemon
A demo of fine-tuning Stable Diffusion on Pokemon-Blip-Captions with English, Japanese, and Chinese corpora
svjack's Repositories
svjack/index-tts-vllm
Added vLLM support to IndexTTS for faster inference.
svjack/musubi-tuner
svjack/VideoModelStudio
Gradio webapp to train AI Video models using Finetrainers
svjack/ai-toolkit
The ultimate training toolkit for finetuning diffusion models
svjack/bilibili-api
Calls to commonly used Bilibili APIs. Supports videos, bangumi (anime series), users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api
svjack/ComfyUI-AdvancedLivePortrait
svjack/Comfyui-Deepseek
About DeepSeek Chat API
svjack/ComfyUI-FramePackWrapper_PlusOne
svjack/ComfyUI-HunyuanVideoWrapper
svjack/ComfyUI-LatentSyncWrapper
This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model. It allows you to synchronize video lips with audio input.
svjack/ComfyUI-MiniCPM
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
svjack/ComfyUI-MMAudio
svjack/ComfyUI_Qwen2_5-VL-Instruct
Integrates the Qwen2.5-VL-Instruct series into ComfyUI, supporting (but not limited to) text-based queries, video queries, single-image queries, and multi-image queries for generating captions or responses.
svjack/ComfyUI_RH_DreamOmni2
A ComfyUI node for dvlab-research/DreamOmni2
svjack/ComfyUI_RH_Ovi
ComfyUI custom nodes for Ovi joint video+audio generation
svjack/ComfyUI_RH_VideoAsPrompt
This is a VideoAsPrompt ComfyUI plugin
svjack/HoloTime
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
svjack/imgutils
A convenient and user-friendly anime-style image data processing library that integrates various advanced anime-style image processing models
svjack/InfiniteTalk
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
svjack/InfiniteYou
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
svjack/Keye
svjack/LLaVA-NeXT
svjack/PosterCraft
Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
svjack/Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
svjack/SoulX-Podcast
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
svjack/Stand-In_Preprocessor_ComfyUI
The preprocessor is the core component of Stand-In: only images processed through it can fully unlock the capabilities of Stand-In.
svjack/Step-Audio
svjack/Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
svjack/twitter-media-downloader
twmd: CLI/GUI API-less Twitter downloader. Download media from a single tweet or a whole profile.
svjack/Wan2GP
Wan 2.1 for the GPU Poor