Hyper-Merge: Advanced Weight Merging for Stable-Diffusion

Introduction

Welcome to Hyper-Merge: your toolkit for next-level neural model merging. We introduce an algorithm to distill multiple models into a single, compact "Hyper-Model," supplemented by a set of low-rank approximations (LoRAs). Wondering how it works? Let's dive in.

Imagine you have multiple models and you want to condense them into a single representation — saving storage space without sacrificing performance. That's where Hyper-Merge shines.

In a simplified mathematical framework, Hyper-Merge proposes the following formula to express the unified model:

$$ \text{model} = \text{hyper-model} + \sum_i \lambda_i \text{LoRA}_i $$

Breaking it Down

  • Hyper-Model: This is essentially an average of all the individual models you intend to merge. Think of it as the 'base' on which we build our optimized model.

  • LoRAs (Low-Rank Approximations): Each LoRA serves as a directional vector in a high-dimensional weight space, capturing the deviation of individual models from the Hyper-Model. This way, we can approximate the actual models with fewer parameters.

While traditional approaches may use a single large LoRA (say, rank-256) to approximate the models, Hyper-Merge can intelligently utilize multiple smaller LoRAs (e.g., four rank-64 LoRAs), thus further reducing the space requirement.
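To make the formula concrete, here is a minimal sketch (plain NumPy, with made-up shapes and names) of how a single weight matrix would be reconstructed from the hyper-model plus weighted low-rank deltas. Everything below is illustrative only and is not the actual code in merge.py.

import numpy as np

# Hypothetical shapes for a single weight matrix (illustrative only).
d_out, d_in, rank = 320, 320, 64

hyper_weight = np.random.randn(d_out, d_in)  # the averaged "hyper-model" weight

# Each LoRA i is a low-rank factor pair (up_i, down_i), so delta_i = up_i @ down_i.
loras = [
    (np.random.randn(d_out, rank) * 0.01, np.random.randn(rank, d_in) * 0.01)
    for _ in range(4)
]

# Per-model multipliers lambda_i: how far this particular model lies
# along each shared low-rank direction.
lambdas = [0.8, -0.3, 0.1, 0.05]

# model ≈ hyper-model + sum_i lambda_i * LoRA_i
reconstructed = hyper_weight + sum(
    lam * (up @ down) for lam, (up, down) in zip(lambdas, loras)
)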

The Magic of Hyper-Merge

So, what sets Hyper-Merge apart? The algorithm doesn't just stop at creating one LoRA from a base and a fine-tuned model. Imagine having $M$ different models; Hyper-Merge constructs a multi-dimensional pathway (LoRAs) that collectively approximates these models. The upshot? Instead of juggling a hefty 20GB for ten 2GB models, you end up with just the 2GB Hyper-Model and some smaller LoRAs that together offer a stunning approximation. Talk about efficiency!
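The optimizer itself isn't described here, but one way to picture how shared directions can emerge is a truncated SVD over the stacked deviations of each model from the hyper-model: the top singular vectors act as shared "LoRA-like" directions, and each model gets its own multipliers along them. The sketch below only illustrates that idea with toy tensors; it is not the algorithm implemented in this repository, and in practice each direction would additionally be factored into a low-rank up/down pair per layer.

import numpy as np

n_models, d_out, d_in, n_directions = 10, 320, 320, 4

# Per-model weights for one layer (toy random data standing in for real checkpoints).
weights = np.random.randn(n_models, d_out, d_in)

hyper = weights.mean(axis=0)                          # the averaged "hyper-model"
deviations = (weights - hyper).reshape(n_models, -1)  # each row: one model minus the hyper-model

# The top right-singular vectors are shared directions in weight space;
# U * S gives each model's coordinates (multipliers) along those directions.
U, S, Vt = np.linalg.svd(deviations, full_matrices=False)
directions = Vt[:n_directions]                        # shared directions (flattened)
multipliers = U[:, :n_directions] * S[:n_directions]  # per-model lambdas

# Reconstruct model 0 ≈ hyper-model + sum_i lambda_i * direction_i
approx_0 = hyper + (multipliers[0] @ directions).reshape(d_out, d_in)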

Quick Start: Zero Installation Required! 🚀

Ready to jump in? Awesome, because you won't need to install any extra software to get started. Follow these streamlined steps to launch your project:

Step 1: Clone the Repository

git clone https://github.com/tfernd/hyper-merge
cd hyper-merge

Step 2: Activate Your Virtual Environment

Replace PATH-TO-WEBUI with the path to the folder that contains your venv directory (your AUTOMATIC1111 folder).

source PATH-TO-WEBUI/venv/scripts/activate

On Linux/macOS, the activation script is at venv/bin/activate instead of venv/scripts/activate.

Step 3: Install Dependencies

Rest assured, this won't interfere with the packages already installed in your environment: the requirements are not pinned to specific versions, so nothing gets forcibly up- or downgraded.

pip install -r requirements.txt

Step 4: Create Configuration File

Navigate to the config folder and create your own YAML configuration file. You can use example.yaml as a reference. Here's a sample structure:

name: my-awesome-model # Name your model as you like

models:
  # Specify paths to your SD 1.5 models in SAFETENSORS format
  - C:\path-to-your-SD-1.5-model.safetensors

device: cuda # Use 'cuda' for GPU acceleration, or 'cpu' otherwise (not recommended)
dtype: float16 # Choose 'float16' for performance, 'bfloat16' for RTX 3xxx/4xxx series, or 'float32' for compatibility

iterations: 6 # Set the number of optimization iterations; greater than 2 recommended
ranks: # Define the ranks for the LoRA model
  - 128
  - 64
  - 32
  - 16

Step 5: Execute the Code

Finally, run the code by specifying your configuration file path.

clear; python ./merge.py --config ./config/your-config.yaml

And that's it! You're all set to take your project to new heights. 🎉


Comparative Analysis: Hyper-Model with LoRAs vs. Original Checkpoint

In this section, we provide a visual comparison to demonstrate the capabilities of the hyper-model augmented with low-rank approximations (LoRAs) against the original checkpoint model. The goal is to evaluate how faithfully the LoRA-augmented hyper-model reproduces the original across different tasks. Both models were tested using identical prompts, seeds, and parameters.

Visualization

The top row showcases images generated by the hyper-model with LoRAs, while the bottom row presents those produced by the original checkpoint.

LoRA Configurations

In this example, LoRAs with ranks 128, 64, 64, 32, 32, 16, and 16 were used. Please note that the optimal rank configurations are still under investigation, and further studies will help in determining the most effective ranks for specific applications.

This visual representation aims to give you a clearer understanding of the advantages and potential of integrating LoRAs into your hyper-model. Stay tuned for more in-depth analyses and updates!

Upcoming Enhancements 📝

We've identified key areas to focus on for improving our hyper-model. Here's a streamlined checklist:

  • Evaluate Model Loss: Calculate loss metrics for individual models.
  • LoRA Approximation: Extract LoRA multipliers for models that were not part of the hyper-merge to test generalizability (see the sketch after this list).
  • Concurrent LoRA Optimization: Aim to optimize multiple LoRAs simultaneously for increased efficiency.
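
One way to picture the LoRA Approximation item is a least-squares fit: take a model that was not part of the merge, compute its deviation from the hyper-model, and project that deviation onto the existing LoRA directions to obtain its multipliers. The sketch below is a hypothetical illustration with toy flattened tensors, not code from this repository.

import numpy as np

d = 320 * 320                                   # flattened weight size (toy)
n_directions = 4

hyper = np.random.randn(d)                      # hyper-model weights, flattened
directions = np.random.randn(n_directions, d)   # existing LoRA directions, flattened
new_model = np.random.randn(d)                  # a model that was NOT part of the merge

# Solve min over lambda of || (new_model - hyper) - directions.T @ lambda ||^2
deviation = new_model - hyper
lambdas, *_ = np.linalg.lstsq(directions.T, deviation, rcond=None)

# How much of the deviation do the existing directions explain?
approx = hyper + directions.T @ lambdas
residual_ratio = np.linalg.norm(new_model - approx) / np.linalg.norm(deviation)
print(lambdas, residual_ratio)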

Each task targets a specific aspect of our project, promising significant improvements. Stay tuned!