Pinned Repositories
AutoStudio
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
bitsandbytes
8-bit CUDA functions for PyTorch
bounded-attention
Bunny
A family of lightweight multimodal models.
cog-video-morpher
Generate a video that morphs between subjects, with an optional style
LISAKaggle
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
MplugOwl
PersonalROS
Personal stuff for robots
Project
Simple repository for personal project
sussy
Code for subgoal synthesis via image editing
johnwick123f's Repositories
johnwick123f/PersonalROS
Personal stuff for robots
johnwick123f/Project
Simple repository for personal project
johnwick123f/AutoStudio
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
johnwick123f/bounded-attention
johnwick123f/Bunny
A family of lightweight multimodal models.
johnwick123f/cog-video-morpher
Generate a video that morphs between subjects, with an optional style
johnwick123f/coqui-ai-TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
johnwick123f/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
johnwick123f/sussy
Code for subgoal synthesis via image editing
johnwick123f/fish-speech
Brand new TTS solution
johnwick123f/GLEE
GLEE: General Object Foundation Model for Images and Videos at Scale
johnwick123f/GPT-SoVITS
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
johnwick123f/gpt_sovits_python
Python wrapper for fast inference with GPT-SoVITS
johnwick123f/Grasp-Anything
Dataset and Code for "Grasp-Anything: Large-scale Grasp Dataset from Foundation Models."
johnwick123f/graspnetAPI
Toolbox for our GraspNet-1Billion dataset.
johnwick123f/GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
johnwick123f/groundingLMM
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
johnwick123f/llama-cpp-python
Python bindings for llama.cpp
johnwick123f/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
johnwick123f/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
johnwick123f/multi_token
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
johnwick123f/piecewise-rectified-flow
PeRFlow (piecewise rectified flow), packaged as a library
johnwick123f/resemble-enhance
AI powered speech denoising and enhancement
johnwick123f/rich-text-to-image
Rich-Text-to-Image Generation
johnwick123f/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
johnwick123f/StreamMultiDiffusion
Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
johnwick123f/text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.
johnwick123f/tokenize-anything
Tokenize Anything via Prompting
johnwick123f/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
johnwick123f/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)