DELTA - Diffusive Extrapolative Language Text Algorithm

A novel algorithm that integrates a text diffusion LLM as a draft model to speed up decoding in traditional auto-regressive LLMs.

Built for the 2025 Mercor x Cognition x Etched Hackathon.

Memory Optimization for LLaMA-LLaDA Distillation

We've implemented two memory optimization strategies for the LLaMA-LLaDA distillation process to address memory constraints when working with large language models:

  1. Simplified Strategy (train_simple_strat.py): Focuses on the most essential memory optimizations:

    • Gradient checkpointing for the student model
    • No gradient computation for the teacher model
    • Mixed precision training (bfloat16)
    • Periodic CUDA cache clearing
  2. Comprehensive Strategy (train_all_strat.py): Implements all optimizations from the simplified strategy plus:

    • Data pre-tokenization
    • Gradient accumulation
    • Advanced optimizer configuration
    • Learning rate scheduling

For detailed information on these optimizations, see OPTIMIZATION.md.
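
To make the strategies above concrete, the sketch below shows roughly how a single distillation step can combine them. The function name, the KL-divergence loss, and the accumulation and cache-clearing intervals are illustrative assumptions rather than the exact code in train_simple_strat.py or train_all_strat.py; gradient checkpointing would typically be switched on once at setup (e.g. student.gradient_checkpointing_enable() on a Hugging Face model) rather than inside the step.

    import torch
    import torch.nn.functional as F

    def distillation_step(teacher, student, batch, optimizer, step,
                          accum_steps=4, empty_cache_every=50):
        # Illustrative only: names and intervals are assumptions, not the
        # exact implementation in train_simple_strat.py / train_all_strat.py.
        teacher.eval()

        with torch.no_grad():  # the teacher never needs gradients
            teacher_logits = teacher(**batch).logits

        # Mixed precision (bfloat16) for the student forward pass.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            student_logits = student(**batch).logits

        # Distillation loss (KL from teacher to student), computed in float32.
        loss = F.kl_div(
            F.log_softmax(student_logits.float(), dim=-1),
            F.softmax(teacher_logits.float(), dim=-1),
            reduction="batchmean",
        )

        # Gradient accumulation: only step the optimizer every accum_steps batches.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)

        # Periodic CUDA cache clearing to release fragmented memory.
        if (step + 1) % empty_cache_every == 0:
            torch.cuda.empty_cache()
        return loss.detach()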

Project Structure

  • combine_datasets.py: This script loads and combines datasets from different sources, ensuring all columns are present in each dataset. The final dataset is saved as a Parquet file.

  • scripts/: Contains various scripts for dataset handling and model evaluation:

    • custom_dataset.py: Custom dataset handling.
    • download_dataset.py: Script to download datasets.
    • evaluate_direct.py: Evaluation of models with standard (direct) decoding.
    • evaluate_speculative.py: Evaluation of models with speculative decoding.
    • fine_tune.py: Script for fine-tuning models.
    • generate.py: Script to generate outputs from models.
    • speculative_decoding.py: Script for speculative decoding.

Requirements

The project requires Python and several dependencies listed in requirements.txt. To install them, use:

pip install -r requirements.txt

Usage

  1. Combine Datasets: Run combine_datasets.py to load, process, and save a combined dataset (a rough sketch of the combining logic follows this list).

    python combine_datasets.py
  2. Scripts: Use the scripts in the scripts/ directory for specific tasks like downloading datasets, evaluating models, fine-tuning, and generating outputs.
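
The combining step in item 1 works roughly like the sketch below. The source dataset names are placeholders, not the ones the script actually uses; the point is that concatenating on the union of columns leaves every column present in every row (missing values become nulls) before the result is written to Parquet.

    import pandas as pd
    from datasets import load_dataset

    # Placeholder sources; the actual script defines its own dataset list.
    sources = ["squad", "ag_news"]
    frames = [load_dataset(name, split="train").to_pandas() for name in sources]

    # pd.concat aligns rows on the union of all columns, filling columns a
    # source lacks with NaN, so every row ends up with the same schema.
    combined = pd.concat(frames, ignore_index=True, sort=False)
    combined.to_parquet("combined_dataset.parquet", index=False)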

Script Usage

  • custom_dataset.py: Defines a custom dataset class for loading data from a directory where each entry is stored as a JSON file.

    from custom_dataset import get_dataloader
    
    dataloader = get_dataloader('path/to/dataset', batch_size=8, shuffle=True)
  • download_dataset.py: Downloads a dataset from Hugging Face and saves each entry under a directory named after the dataset.

    python download_dataset.py --dataset_name <dataset_name> --split <split> --save_dir <save_directory>
  • evaluate_direct.py: Evaluates model performance using direct decoding.

    python evaluate_direct.py --model_name <model_name> --evaluation_dataset <evaluation_dataset> --max_length <max_length>
  • evaluate_speculative.py: Evaluates model performance using speculative decoding with a teacher and student model.

    python evaluate_speculative.py --teacher_model <teacher_model> --student_model <student_model> --evaluation_dataset <evaluation_dataset> --max_length <max_length> --speculative_steps <speculative_steps>
  • fine_tune.py: Fine-tunes a Hugging Face model on a specified dataset.

    python fine_tune.py --model_name <model_name> --dataset_name <dataset_name> --fine_tuned_model_name <fine_tuned_model_name> --batch_size <batch_size> --learning_rate <learning_rate> --num_train_epochs <num_train_epochs> --max_length <max_length> --checkpoint <checkpoint>
  • generate.py: Generates model outputs based on a specified dataset and configuration.

    python generate.py --model_name <model_name> --dataset_name <dataset_name> --batch_size <batch_size> --config <config> --max_length <max_length>
  • speculative_decoding.py: Performs speculative decoding using a teacher and student model; a simplified sketch of the underlying loop follows this list.

    from speculative_decoding import speculative_generate
    
    output = speculative_generate(teacher_model, student_model, teacher_tokenizer, student_tokenizer, input_text, max_length=50, speculative_steps=3)
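
For intuition, the sketch below shows a heavily simplified version of the draft-and-verify loop behind speculative decoding. It assumes the teacher and student share a single tokenizer and that verification is greedy; the real speculative_generate takes separate teacher and student tokenizers and may accept or reject drafts differently.

    import torch

    def speculative_generate_sketch(teacher, student, tokenizer, prompt,
                                    max_length=50, speculative_steps=3):
        # Simplified, greedy variant for illustration only.
        ids = tokenizer(prompt, return_tensors="pt").input_ids

        while ids.shape[1] < max_length:
            # 1) The draft (student) model proposes a short continuation.
            draft = student.generate(ids, max_new_tokens=speculative_steps,
                                     do_sample=False)
            proposed = draft[0, ids.shape[1]:]

            # 2) The teacher scores prompt + draft in one forward pass;
            #    logits at position i predict the token at position i + 1.
            logits = teacher(draft).logits
            teacher_pick = logits[0, ids.shape[1] - 1:-1].argmax(dim=-1)

            # 3) Accept the longest prefix the teacher agrees with, then
            #    (on a mismatch) substitute the teacher's own token.
            n_accept = 0
            for p, t in zip(proposed.tolist(), teacher_pick.tolist()):
                if p != t:
                    break
                n_accept += 1
            accepted = proposed[:n_accept].unsqueeze(0)
            correction = teacher_pick[n_accept:n_accept + 1].unsqueeze(0)
            if accepted.shape[1] + correction.shape[1] == 0:  # nothing drafted (e.g. early EOS)
                break
            ids = torch.cat([ids, accepted, correction], dim=1)

        return tokenizer.decode(ids[0], skip_special_tokens=True)

Because several drafted tokens can be accepted per teacher forward pass, the teacher runs far fewer sequential decoding steps than it would generating token by token, which is where the speedup comes from.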

Dependencies

The project relies on various Python packages, including but not limited to:

  • datasets
  • pandas
  • torch
  • transformers

For a full list of dependencies, refer to the requirements.txt file.

License

This project is licensed under the MIT License.