svjack

Pinned Repositories

CodeActAgent-Gradio
Unofficial Gradio repo for the ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
Language:Jupyter Notebook15 1 01
ControlLoRA-Chinese
A lightweight neural network to control Stable Diffusion spatial information, tuned with Chinese
Language:Python9 1 00
docvqa-gen
Document visual question answering dataset generator for English and Chinese
Language:Jupyter Notebook25 2 22
Genshin-Impact-BookQA-LLM
A Genshin Impact book question-answering project powered by an LLM
Language:Python6 1 01
Genshin-Impact-Character-Instruction
Genshin Impact character instruction models, tuned with LoRA on an LLM
Language:Python4 1 01
Genshin-Impact-Fan-Video
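An AI-driven Genshin Impact video project that uses an LLM API to generate character interaction scripts, VITS for speech synthesis, and advanced text-to-image and video synthesis techniques to create fun scenes between game characters. The final output is a short video.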
Language:Python17 1 04
PhotoWCT
Unofficial implementation of "A Closed-form Solution to Photorealistic Image Stylization"
Language:Python13 1 01
Sbert-ChineseExample
Sentence-Transformers Information Retrieval example on Chinese
Language:Python30 1 06
Stable-Diffusion-Chinese-Extend
A fine-tuned version of the Stable Diffusion model on a self-translated 10k DiffusionDB Chinese corpus, with an "extend" variant of it
Language:Python32 2 15
Stable-Diffusion-Pokemon
A demo of fine-tuning Stable Diffusion on Pokemon-Blip-Captions with English, Japanese, and Chinese corpora
Language:Python38 1 04

svjack's Repositories

svjack/index-tts-vllm
Added vLLM support to IndexTTS for faster inference.
Language:Python1
svjack/musubi-tuner
Language:Python1
svjack/VideoModelStudio
Gradio webapp to train AI Video models using Finetrainers
Language:Python1
svjack/ai-toolkit
The ultimate training toolkit for finetuning diffusion models
Language:Python
svjack/bilibili-api
Common Bilibili API calls. Supports video, bangumi (anime series), user, channel, audio, and other features. Original repository: https://github.com/MoyuScript/bilibili-api
Language:Python0 0
svjack/ComfyUI-AdvancedLivePortrait
Language:Python
svjack/Comfyui-Deepseek
About DeepSeek Chat API
Language:Python
svjack/ComfyUI-FramePackWrapper_PlusOne
Language:Python0 0
svjack/ComfyUI-HunyuanVideoWrapper
Language:Python0 0
svjack/ComfyUI-LatentSyncWrapper
This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model. It allows you to synchronize video lips with audio input.
Language:Python
svjack/ComfyUI-MiniCPM
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
Language:Python
svjack/ComfyUI-MMAudio
Language:Python
svjack/ComfyUI_Qwen2_5-VL-Instruct
Integrates the Qwen2.5-VL-Instruct series into ComfyUI, supporting (but not limited to) text-based queries, video queries, single-image queries, and multi-image queries for generating captions or responses.
Language:Python
svjack/ComfyUI_RH_DreamOmni2
A ComfyUI node for dvlab-research/DreamOmni2
Language:Python
svjack/ComfyUI_RH_Ovi
ComfyUI custom nodes for Ovi joint video+audio generation
Language:Python
svjack/ComfyUI_RH_VideoAsPrompt
This is a VideoAsPrompt ComfyUI plugin
Language:Python
svjack/HoloTime
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
Language:Python
svjack/imgutils
A convenient and user-friendly anime-style image data processing library that integrates various advanced anime-style image processing models
Language:Python0 0
svjack/InfiniteTalk
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Language:Python
svjack/InfiniteYou
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Language:Python
svjack/Keye
Language:Python
svjack/LLaVA-NeXT
Language:Python0 0
svjack/PosterCraft
Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
Language:Python
svjack/Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Language:Jupyter Notebook
svjack/SoulX-Podcast
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Language:Python
svjack/Stand-In_Preprocessor_ComfyUI
The core component of Stand-In, the preprocessor, is essential—only images processed through it can fully unlock the capabilities of Stand-In.
Language:Python0 0
svjack/Step-Audio
Language:Python0 0
svjack/Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Language:Python0 0
svjack/twitter-media-downloader
twmd: CLI/GUI API-less Twitter downloader. Download media from a single tweet or a whole profile.
Language:Go
svjack/Wan2GP
Wan 2.1 for the GPU Poor
Language:Python0 0