NeuroSandboxWebUI: A Python repository from Dartvauder

Features | Dependencies | System Requirements | Install | Wiki | Acknowledgment | Licenses

Work in progress, but stable!
English | Русский | 漢語

Description:

A simple and convenient interface for using various neural network models. You can communicate with an LLM using text, voice and image input; use StableDiffusion, Kandinsky, Flux, HunyuanDiT, Lumina-T2X, Kolors, AuraFlow, Würstchen, DeepFloydIF, PixArt, CogView3-Plus and PlaygroundV2.5 to generate images; ModelScope, ZeroScope 2, CogVideoX and Latte to generate videos; StableFast3D, Shap-E and Zero123Plus to generate 3D objects; StableAudioOpen, AudioCraft and AudioLDM 2 to generate music and audio; CoquiTTS, MMS and SunoBark for text-to-speech; OpenAI-Whisper and MMS for speech-to-text; Wav2Lip for lip-sync; LivePortrait to animate an image; Roop for face swapping; Rembg to remove backgrounds; CodeFormer for face restoration; PixelOE for image pixelization; DDColor for image colorization; LibreTranslate and SeamlessM4Tv2 for text translation; Demucs and UVR for audio file separation; and RVC for voice conversion. You can also browse files from the outputs directory in the gallery, download LLM and StableDiffusion models, change the application settings inside the interface, and check system sensors.

The goal of the project is to create the easiest possible application for using neural network models.

Screenshots: Text, Image, Video, 3D, Audio, Extras and Interface

Features:

Easy installation via Install.bat (Windows) or Install.sh (Linux & MacOS)
Use the application from a mobile device on your local network (via IPv4) or from anywhere online (via Share)
Flexible and optimized interface (built with Gradio)
Debug logs from the Install and Update scripts
Available in three languages (English, Russian and Chinese)
Support for Transformers, BNB, GPTQ, AWQ, ExLlamaV2 and llama.cpp models (LLM)
Support for diffusers and safetensors models (StableDiffusion) - txt2img, img2img, depth2img, marigold, pix2pix, controlnet, upscale (latent), refiner, inpaint, outpaint, gligen, diffedit, blip-diffusion, animatediff, hotshot-xl, video, ldm3d, sd3, cascade, t2i-ip-adapter, ip-adapter-faceid and riffusion tabs
Support for stable-diffusion-cpp models for FLUX and Stable Diffusion
Support for additional image generation models: Kandinsky (txt2img, img2img, inpaint), Flux (txt2img with cpp quantization and LoRA support, img2img, inpaint, controlnet), HunyuanDiT (txt2img, controlnet), Lumina-T2X, Kolors (txt2img with LoRA support, img2img, ip-adapter-plus), AuraFlow (with LoRA and AuraSR support), Würstchen, DeepFloydIF (txt2img, img2img, inpaint), PixArt, CogView3-Plus and PlaygroundV2.5
Support for Extras: Rembg, CodeFormer, PixelOE, DDColor, DownScale, Format changer, FaceSwap (Roop) and Upscale (Real-ESRGAN) for images, video and audio
Support for StableAudio
Support for AudioCraft (models: musicgen, audiogen and magnet)
Support for AudioLDM 2 (models: audio and music)
Support for TTS and Whisper models (for LLM and TTS-STT)
Support for MMS text-to-speech and speech-to-text
Support for LoRA, Textual Inversion (embeddings), VAE, MagicPrompt, Img2img, Depth, Marigold, Pix2Pix, ControlNet, Upscale (latent), Refiner, Inpaint, Outpaint, GLIGEN, DiffEdit, BLIP-Diffusion, AnimateDiff, HotShot-XL, Videos, LDM3D, SD3, Cascade, T2I-IP-Adapter, IP-Adapter-FaceID and Riffusion models (for StableDiffusion)
Support for the Multiband Diffusion model (for AudioCraft)
Support for LibreTranslate (local API) and SeamlessM4Tv2 for language translation
Support for ModelScope, ZeroScope 2, CogVideoX and Latte for video generation
Support for SunoBark
Support for Demucs and UVR for audio file separation
Support for RVC for voice conversion
Support for StableFast3D, Shap-E and Zero123Plus for 3D generation
Support for Wav2Lip
Support for LivePortrait to animate an image
Support for Multimodal (Moondream 2, LLaVA-NeXT-Video, Qwen2-Audio), PDF parsing (OpenParse), TTS (CoquiTTS), STT (Whisper), LoRA and WebSearch (with DuckDuckGo) for LLM
Metadata viewer for generated images, video and audio
Model settings inside the interface
Online and offline Wiki
Gallery
ModelDownloader
Application settings
Ability to view system sensors
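Local-network (IPv4) and Share access usually correspond to two standard Gradio launch options: binding the server to `0.0.0.0` so other devices on the same network can reach it at this machine's IPv4 address, and `share=True` for a temporary public link. The sketch below is an assumption about how such modes could be wired up, not this project's code; the `launch_options` helper and the mode names are hypothetical.

```python
from typing import Any, Dict

DEFAULT_PORT = 7860  # Gradio's default port


def launch_options(mode: str, port: int = DEFAULT_PORT) -> Dict[str, Any]:
    """Return keyword arguments for gradio's Blocks.launch() for an access mode.

    Hypothetical modes:
      "local" - reachable only from this machine
      "lan"   - reachable from phones/PCs on the same network via this machine's IPv4
      "share" - additionally creates a temporary public gradio.live link
    """
    if mode == "local":
        return {"server_name": "127.0.0.1", "server_port": port, "share": False}
    if mode == "lan":
        # 0.0.0.0 listens on every network interface, not just loopback
        return {"server_name": "0.0.0.0", "server_port": port, "share": False}
    if mode == "share":
        # The public link tunnels to the local server, so loopback binding is enough
        return {"server_name": "127.0.0.1", "server_port": port, "share": True}
    raise ValueError(f"unknown mode: {mode!r}")
```

With a Gradio app object `demo`, `demo.launch(**launch_options("lan"))` would let a phone on the same Wi-Fi open `http://<your-IPv4>:7860`.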

Required Dependencies:

Python (3.10.11)
Git
Only for GPU version: CUDA (12.4) and cuDNN (9.1)
FFMPEG

C++ compiler
- Windows: Visual Studio, Visual Studio Code and CMake
- Linux: GCC, Visual Studio Code and CMake
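Before running the installer, you can check the command-line prerequisites with the standard library. This is a hypothetical pre-flight helper, not part of the repository: it only checks the Python 3.10 series (not the exact 3.10.11 patch release) and looks for `git`, `ffmpeg` and `cmake` on `PATH`. It does not verify CUDA, cuDNN or the C++ compiler itself.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence

# Command-line tools from the dependency list; cmake stands in for the build toolchain
REQUIRED_TOOLS = ["git", "ffmpeg", "cmake"]


def missing_dependencies(
    version_info: Sequence[int] = sys.version_info,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a human-readable list of prerequisites that appear to be missing."""
    problems = []
    # Compare only major.minor, since patch releases of 3.10 are generally compatible
    if tuple(version_info[:2]) != (3, 10):
        problems.append(f"Python 3.10 expected, found {version_info[0]}.{version_info[1]}")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_dependencies():
        print("Missing:", problem)
```

The `which` parameter is injectable so the check can be exercised without touching the real `PATH`.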

Minimum System Requirements:

System: Windows, Linux or MacOS
GPU: 6GB+ VRAM or CPU: 8 cores at 3.6GHz
RAM: 16GB+
Disk space: 20GB+
Internet connection for installation and downloading models
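The minimums above can be compared against a machine with a short script. This is a sketch under stated assumptions, not a tool shipped with the project: RAM is read through POSIX `os.sysconf` (so it reports "unknown" on Windows), `os.cpu_count()` counts logical rather than physical cores, CPU clock speed is not checked, and the 10% RAM tolerance is an arbitrary choice to allow for memory reserved by the firmware and kernel.

```python
import os
import shutil
from typing import List, Optional

# Thresholds taken from the minimum system requirements
MIN_VRAM_GB = 6
MIN_CPU_CORES = 8
MIN_RAM_GB = 16
MIN_DISK_GB = 20
RAM_TOLERANCE = 0.9  # assumption: installed 16 GB usually reports slightly less
GB = 1024 ** 3


def total_ram_gb() -> Optional[float]:
    """Physical RAM via POSIX sysconf; None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        return None


def check_minimums(cpu_cores: int, ram_gb: Optional[float], free_disk_gb: float,
                   gpu_vram_gb: Optional[float] = None) -> List[str]:
    """Compare measured values with the minimums and return any shortfalls."""
    issues = []
    # A 6 GB+ GPU is sufficient; without one, the 8-core CPU requirement applies
    has_gpu = gpu_vram_gb is not None and gpu_vram_gb >= MIN_VRAM_GB
    if not has_gpu and cpu_cores < MIN_CPU_CORES:
        issues.append(f"no {MIN_VRAM_GB} GB GPU and only {cpu_cores} CPU cores "
                      f"(need {MIN_CPU_CORES})")
    if ram_gb is not None and ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        issues.append(f"{ram_gb:.1f} GB RAM (need {MIN_RAM_GB})")
    if free_disk_gb < MIN_DISK_GB:
        issues.append(f"{free_disk_gb:.1f} GB free disk (need {MIN_DISK_GB})")
    return issues


if __name__ == "__main__":
    free = shutil.disk_usage(".").free / GB
    for issue in check_minimums(os.cpu_count() or 1, total_ram_gb(), free):
        print("Below minimum:", issue)
```

GPU memory is not visible to the standard library; once PyTorch is installed, `torch.cuda.get_device_properties(0).total_memory` reports it in bytes and could be passed in as `gpu_vram_gb`.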

How to install:

Windows

First, install all Required Dependencies
Clone the repository to any location: git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git
Run Install.bat, select your version and wait for the installation to finish
After installation, run Start.bat and go through the initial setup
Wait for the application to launch and follow the link from the terminal
Now you can start generating. Enjoy!

To get updates, run Update.bat
To work with the virtual environment through the terminal, run Venv.bat

Linux & MacOS

First, install all Required Dependencies
Clone the repository to any location: git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git
Run ./Install.sh, select your version and wait for the installation to finish
After installation, run ./Start.sh and go through the initial setup
Wait for the application to launch and follow the link from the terminal
Now you can start generating. Enjoy!

To get updates, run ./Update.sh
To work with the virtual environment through the terminal, run ./Venv.sh
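The Windows and Linux & MacOS procedures use the same four scripts and differ only in the `.bat`/`.sh` extension and the `./` prefix. A small hypothetical helper (not part of the repository) that picks the right command for the current platform might look like this:

```python
import sys
from typing import Optional

# Script base names used by both install procedures
ACTIONS = ("Install", "Start", "Update", "Venv")


def script_command(action: str, platform: Optional[str] = None) -> str:
    """Return the command that runs a helper script on the given platform."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    platform = platform or sys.platform  # e.g. "win32", "linux", "darwin"
    if platform.startswith("win"):
        return f"{action}.bat"
    # Linux and MacOS run the shell script from the repository root
    return f"./{action}.sh"
```

For example, `subprocess.run(script_command("Update"), shell=True, cwd=repo_dir)` would run the update script inside a checkout at a hypothetical `repo_dir`.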

Wiki

https://github.com/Dartvauder/NeuroSandboxWebUI/wiki/EN‐Wiki

Acknowledgment to developers

Many thanks to these projects: thanks to their applications and libraries, I was able to create my application:

First of all, I want to thank the developers of PyCharm and GitHub. With the help of their applications, I was able to create and share my code.

gradio - https://github.com/gradio-app/gradio
transformers - https://github.com/huggingface/transformers
auto-gptq - https://github.com/AutoGPTQ/AutoGPTQ
autoawq - https://github.com/casper-hansen/AutoAWQ
exllamav2 - https://github.com/turboderp/exllamav2
coqui-tts - https://github.com/idiap/coqui-ai-TTS
openai-whisper - https://github.com/openai/whisper
torch - https://github.com/pytorch/pytorch
cuda-python - https://github.com/NVIDIA/cuda-python
gitpython - https://github.com/gitpython-developers/GitPython
diffusers - https://github.com/huggingface/diffusers
llama.cpp-python - https://github.com/abetlen/llama-cpp-python
stable-diffusion-cpp-python - https://github.com/william-murray1204/stable-diffusion-cpp-python
audiocraft - https://github.com/facebookresearch/audiocraft
xformers - https://github.com/facebookresearch/xformers
demucs - https://github.com/facebookresearch/demucs
libretranslatepy - https://github.com/argosopentech/LibreTranslate-py
rembg - https://github.com/danielgatis/rembg
suno-bark - https://github.com/suno-ai/bark
IP-Adapter - https://github.com/tencent-ailab/IP-Adapter
PyNanoInstantMeshes - https://github.com/vork/PyNanoInstantMeshes
CLIP - https://github.com/openai/CLIP
rvc-python - https://github.com/daswer123/rvc-python
audio-separator - https://github.com/nomadkaraoke/python-audio-separator
pixeloe - https://github.com/KohakuBlueleaf/PixelOE
k-diffusion - https://github.com/crowsonkb/k-diffusion
open-parse - https://github.com/Filimoa/open-parse
AudioSR - https://github.com/haoheliu/versatile_audio_super_resolution
sd_embed - https://github.com/xhinker/sd_embed
triton - https://github.com/triton-lang/triton/
