Pinned Repositories
CodeActAgent-Gradio
Unofficial Gradio repo for the ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
ControlLoRA-Chinese
A lightweight neural network to control Stable Diffusion spatial information, tuned on Chinese
docvqa-gen
Document Visual Question Answering dataset generator in English and Chinese
Genshin-Impact-BookQA-LLM
A Genshin Impact book question-answering project powered by LLMs
Genshin-Impact-Character-Instruction
Genshin Impact character instruction models fine-tuned with LoRA on LLMs
Genshin-Impact-Fan-Video
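An AI-driven Genshin Impact video project that uses an LLM API to generate character interaction scripts, VITS for speech synthesis, and advanced text-to-image and video synthesis techniques to create entertaining scenes between game characters. The final output is a short video.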
PhotoWCT
Unofficial implementation of "A Closed-form Solution to Photorealistic Image Stylization"
Sbert-ChineseExample
Sentence-Transformers information retrieval example in Chinese
Stable-Diffusion-Chinese-Extend
A version of Stable Diffusion fine-tuned on a self-translated 10k DiffusionDB Chinese corpus, then "extended"
Stable-Diffusion-Pokemon
A demo of fine-tuning Stable Diffusion on Pokemon-Blip-Captions in English, Japanese, and Chinese corpora
svjack's Repositories
svjack/VideoModelStudio
Gradio webapp to train AI Video models using Finetrainers
svjack/ai-toolkit
The ultimate training toolkit for finetuning diffusion models
svjack/bilibili-api
Common Bilibili API calls. Supports videos, bangumi (anime series), users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api
svjack/Comfyui-Deepseek
About DeepSeek Chat API
svjack/ComfyUI-FramePackWrapper_PlusOne
svjack/ComfyUI-HunyuanVideoWrapper
svjack/ComfyUI-LatentSyncWrapper
This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model. It allows you to synchronize video lips with audio input.
svjack/ComfyUI-MiniCPM
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
svjack/ComfyUI-MMAudio
svjack/ComfyUI_Qwen2_5-VL-Instruct
Integrates the Qwen2.5-VL-Instruct series into ComfyUI, supporting (but not limited to) text-based queries, video queries, single-image queries, and multi-image queries for generating captions or responses.
svjack/float
Official PyTorch implementation of FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait.
svjack/HoloTime
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
svjack/imgutils
A convenient and user-friendly anime-style image data processing library that integrates various advanced anime-style image processing models
svjack/InfiniteTalk
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
svjack/InfiniteYou
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
svjack/InstantCharacter
svjack/Keye
svjack/LLaVA-NeXT
svjack/musubi-tuner
svjack/PosterCraft
Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
svjack/Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
svjack/Self-Forcing
svjack/SkyReels-V2
SkyReels-V2: Infinite-length Film Generative model
svjack/Spark-TTS
Spark-TTS Inference Code
svjack/Stand-In_Preprocessor_ComfyUI
The preprocessor is the core component of Stand-In: only images processed through it can fully unlock Stand-In's capabilities.
svjack/Step-Audio
svjack/Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
svjack/Step1X-Edit
A state-of-the-art open-source image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
svjack/twitter-media-downloader
twmd: CLI/GUI API-less Twitter downloader. Download media from a single tweet or a whole profile.
svjack/Wan2GP
Wan 2.1 for the GPU Poor