
🦀 Mini TensorFlow - Deep Learning Library in Rust

A comprehensive, high-performance deep learning library implemented in Rust, inspired by TensorFlow and PyTorch. Built with safety, speed, and ergonomics in mind.

🚀 Features

  • 🔢 Tensor Operations: Multi-dimensional arrays with broadcasting support
  • 🧠 Neural Networks: Dense, Convolutional, and Activation layers
  • 📊 Computation Graph: Dynamic graph execution with forward pass
  • ⚡ SIMD & Parallel: Vectorized operations and multi-core processing
  • 💾 Model Serialization: Save/load models in JSON and binary formats
  • 📈 Data Loading: CSV support, synthetic datasets, and batch processing
  • 🔧 Optimizers: SGD with momentum and Adam optimizer
  • 🛡️ Memory Safety: Zero-cost abstractions with Rust's ownership system

📋 Table of Contents

  • Architecture
  • Installation
  • Quick Start
  • API Documentation
  • Examples
  • Performance
  • Advanced Features
  • Architecture Decisions
  • Testing
  • Contributing
  • Known Limitations
  • License

πŸ—οΈ Architecture

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Mini TensorFlow Library                     │
├─────────────────────────────────────────────────────────────────┤
│  Examples Layer                                                 │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Sequential  │ CNN Example │ Data Loading│ SIMD Benchmark  │  │
│  │ Models      │ & Training  │ & Batching  │ & Performance   │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  High-Level API                                                 │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Sequential  │ Layer       │ DataLoader  │ Model           │  │
│  │ Container   │ Abstraction │ & Dataset   │ Serialization   │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Core Components                                                │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Layers      │ Convolution │ Optimizers  │ Autograd        │  │
│  │ (Dense,     │ (Conv2D,    │ (SGD,       │ (Variables,     │  │
│  │ Activations)│ MaxPool2D)  │ Adam)       │ Gradients)      │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Computation Engine                                             │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Tensor      │ Graph       │ SIMD        │ Parallel        │  │
│  │ Operations  │ Execution   │ Operations  │ Computing       │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Foundation                                                     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │            Rust Memory Safety & Performance               │  │
│  │     (Zero-cost abstractions, RAII, Ownership model)       │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Data Flow Architecture

Input Data → Tensor → Layer Chain → Output → Loss → Optimizer → Updated Parameters
     │         │         │           │        │         │              │
     │         │         │           │        │         │              │
     ▼         ▼         ▼           ▼        ▼         ▼              ▼
  ┌─────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ ┌──────┐ ┌────────┐ ┌─────────────┐
  │ CSV │ │ Tensor  │ │ Conv2D  │ │ Loss│ │ SGD/ │ │ Params │ │ Serialized  │
  │Files│ │ Ops     │ │ Dense   │ │ Calc│ │ Adam │ │ Update │ │ Model       │
  │ ... │ │ SIMD    │ │ ReLU    │ │ ... │ │ ...  │ │ ...    │ │ JSON/Binary │
  └─────┘ └─────────┘ └─────────┘ └─────┘ └──────┘ └────────┘ └─────────────┘

Tensor Computation Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Tensor Operations                                 │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          Shape Validation                                   │
│                     (Broadcasting, Dimension checks)                        │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────┬─────────────────┬─────────────────┬─────────────────────┐
│ Regular Ops     │ SIMD Optimized  │ Parallel Ops    │ Specialized Ops     │
│ - Element-wise  │ - f64x4 vectors │ - Rayon threads │ - Matrix multiply   │
│ - Single thread │ - AVX/SSE       │ - Multi-core    │ - Convolution       │
│ - Standard loop │ - 4x throughput │ - Work stealing │ - Activation funcs  │
└─────────────────┴─────────────────┴─────────────────┴─────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Result Tensor                                    │
│                         (New shape, data)                                   │
└─────────────────────────────────────────────────────────────────────────────┘

Neural Network Layer Architecture

Sequential Model Container
│
├── Layer 1: Input Processing
│   ├── Conv2D(in_channels=1, out_channels=32, kernel=3x3)
│   ├── ReLU activation
│   └── MaxPool2D(kernel=2x2, stride=2)
│
├── Layer 2: Feature Extraction
│   ├── Conv2D(in_channels=32, out_channels=64, kernel=3x3)
│   ├── ReLU activation
│   └── MaxPool2D(kernel=2x2, stride=2)
│
├── Layer 3: Flattening
│   └── Flatten(4D → 2D conversion)
│
├── Layer 4: Classification Head
│   ├── Dense(features_in=1600, features_out=128)
│   ├── ReLU activation
│   ├── Dense(features_in=128, features_out=10)
│   └── Softmax activation
│
└── Output: Probability Distribution [batch_size, num_classes]

Memory Management Flow

Stack Memory                  Heap Memory                   GPU/SIMD
┌─────────────┐              ┌─────────────┐              ┌─────────────┐
│ References  │              │ Tensor Data │              │ Vectorized  │
│ &Tensor     │─────────────▶│ Vec<f64>    │─────────────▶│ Operations  │
│ &mut Tensor │              │ Shape info  │              │ f64x4 SIMD  │
│ Temporaries │              │ Gradients   │              │ Parallel    │
└─────────────┘              └─────────────┘              └─────────────┘
      │                             │                             │
      ▼                             ▼                             ▼
  Zero-copy                    RAII cleanup               Hardware acceleration
  borrowing                   Automatic drop              AVX2/FMA instructions

📦 Installation

Prerequisites

  • Rust 1.70+ (2021 edition)
  • Cargo package manager

Quick Setup

# Clone the repository
git clone https://github.com/AarambhDevHub/mini-tensorflow.git
cd mini-tensorflow

# Build the project
cargo build --release

# Run tests
cargo test

# Run examples
cargo run --example sequential_model
cargo run --example cnn_example
cargo run --example data_loading
cargo run --example simd_benchmark

Dependencies

[dependencies]
rand = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
csv = "1.2"
rayon = "1.7"
num-traits = "0.2"

[target.'cfg(target_arch = "x86_64")'.dependencies]
wide = "0.7"  # SIMD operations

🚀 Quick Start

Basic Tensor Operations

use mini_tensorflow::Tensor;

// Create tensors
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = Tensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);

// Basic operations
let sum = a.add(&b);                    // Element-wise addition
let product = a.matmul(&b);             // Matrix multiplication
let activated = a.relu();               // ReLU activation

println!("Sum: {}", sum);
println!("Matrix product: {}", product);
println!("ReLU activated: {}", activated);

Building Neural Networks

use mini_tensorflow::{Sequential, Dense, ReLU, Softmax, Tensor};

// Create a multi-layer perceptron
let model = Sequential::new()
    .add(Dense::new(784, 256))    // Input layer
    .add(ReLU::new())
    .add(Dense::new(256, 128))    // Hidden layer
    .add(ReLU::new())
    .add(Dense::new(128, 10))     // Output layer
    .add(Softmax::new());

// Display model architecture
model.summary();

// Forward pass
let input = Tensor::random(vec![1, 784]);
let output = model.forward(input);
println!("Predictions: {}", output);

Convolutional Neural Networks

use mini_tensorflow::{Sequential, Conv2D, MaxPool2D, Flatten, Dense, ReLU, Softmax};

// Create CNN for image classification
let cnn = Sequential::new()
    .add(Conv2D::new(1, 32, 3))      // 1→32 channels, 3×3 kernel
    .add(ReLU::new())
    .add(MaxPool2D::new(2))          // 2×2 pooling
    .add(Conv2D::new(32, 64, 3))     // 32→64 channels
    .add(ReLU::new())
    .add(MaxPool2D::new(2))
    .add(Flatten::new())
    .add(Dense::new(1600, 128))      // spatial 28→26→13→11→5, so 64×5×5 = 1600
    .add(ReLU::new())
    .add(Dense::new(128, 10))        // 10 classes
    .add(Softmax::new());

// Process a 28×28 image
let image = Tensor::random(vec![1, 1, 28, 28]);
let predictions = cnn.forward(image);

Data Loading & Training

use mini_tensorflow::{Dataset, DataLoader, SGD, Optimizer};

// Load data from CSV
let dataset = Dataset::from_csv(
    "data.csv",
    vec![0, 1, 2, 3],  // feature columns
    4,                 // target column
    true               // has header
)?;

// Create data loader with batching
let mut loader = DataLoader::new(dataset, 32)
    .with_shuffle(true);

// Training loop
let mut optimizer = SGD::new(0.01);

for (batch_data, batch_labels) in loader.iter() {
    // Forward pass
    let predictions = model.forward(batch_data[0].clone());

    // Compute loss (simplified; a stand-in MSE sketch follows this snippet)
    let loss = compute_loss(&predictions, &batch_labels[0]);

    // Update parameters; `gradients` is a placeholder until the full
    // backward pass lands (see Known Limitations)
    let mut params = model.parameters_mut();
    optimizer.step(&mut params, &gradients);
}
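
The `compute_loss` helper above is not part of the library. A minimal stand-in sketch is a plain mean squared error computed over the raw buffers, assuming Tensor's `data` field is public as in the in-place ReLU tip later in this README:

fn compute_loss(predictions: &Tensor, targets: &Tensor) -> f64 {
    // Mean squared error straight from the tensors' data buffers
    predictions
        .data
        .iter()
        .zip(targets.data.iter())
        .map(|(p, t)| (p - t).powi(2))
        .sum::<f64>()
        / predictions.data.len() as f64
}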

Model Persistence

use mini_tensorflow::Saveable;

// Save model
model.save("trained_model.json")?;   // Human readable
model.save("trained_model.bin")?;    // Compact binary

// Load model (rebuild the same architecture first, then restore its weights)
let mut new_model = Sequential::new()
    .add(Dense::new(784, 256))
    .add(ReLU::new());

new_model.load("trained_model.json")?;

📚 API Documentation

Core Components

Tensor

impl Tensor {
    // Construction
    pub fn new(data: Vec<f64>, shape: Vec<usize>) -> Self;
    pub fn zeros(shape: Vec<usize>) -> Self;
    pub fn ones(shape: Vec<usize>) -> Self;
    pub fn random(shape: Vec<usize>) -> Self;

    // Operations
    pub fn add(&self, other: &Tensor) -> Tensor;
    pub fn mul(&self, other: &Tensor) -> Tensor;
    pub fn matmul(&self, other: &Tensor) -> Tensor;
    pub fn relu(&self) -> Tensor;
    pub fn sigmoid(&self) -> Tensor;
    pub fn softmax(&self) -> Tensor;

    // Shape manipulation
    pub fn reshape(&self, new_shape: Vec<usize>) -> Self;
    pub fn transpose(&self) -> Self;

    // Utilities
    pub fn sum(&self) -> f64;
    pub fn mean(&self) -> f64;
}
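
A quick sketch exercising the constructors and shape helpers listed above (shapes and values chosen arbitrarily):

use mini_tensorflow::Tensor;

let t = Tensor::ones(vec![2, 3]);     // 2×3 tensor of ones
let wide = t.reshape(vec![3, 2]);     // same six elements, new shape
let flipped = wide.transpose();       // back to a 2×3 layout
println!("sum = {}, mean = {}", flipped.sum(), flipped.mean()); // 6, 1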

Sequential Model

impl Sequential {
    pub fn new() -> Self;
    pub fn add<L: Layer + 'static>(self, layer: L) -> Self;
    pub fn forward(&self, input: Tensor) -> Tensor;
    pub fn parameters(&self) -> Vec<&Tensor>;
    pub fn summary(&self);
}

Layers

// Dense layer
Dense::new(input_size: usize, output_size: usize) -> Dense;

// Convolutional layer
Conv2D::new(in_channels: usize, out_channels: usize, kernel_size: usize) -> Conv2D;

// Pooling layer
MaxPool2D::new(kernel_size: usize) -> MaxPool2D;

// Activation layers
ReLU::new() -> ReLU;
Sigmoid::new() -> Sigmoid;
Softmax::new() -> Softmax;

Optimizers

// Stochastic Gradient Descent
SGD::new(learning_rate: f64) -> SGD;
SGD::with_momentum(learning_rate: f64, momentum: f64) -> SGD;

// Adam optimizer
Adam::new(learning_rate: f64) -> Adam;
Adam::with_params(lr: f64, beta1: f64, beta2: f64, epsilon: f64) -> Adam;

// Training step
optimizer.step(parameters: &mut [&mut Tensor], gradients: &[&Tensor]);
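
Putting these signatures together, a minimal update step might look like the following sketch; it assumes `Adam` and `Optimizer` are exported at the crate root like `SGD` in the training example, and `grad` stands in for a gradient a full backward pass would produce:

use mini_tensorflow::{Tensor, Adam, Optimizer};

let mut weights = Tensor::random(vec![128, 10]);
let grad = Tensor::ones(vec![128, 10]);    // placeholder gradient

let mut adam = Adam::new(0.001);
adam.step(&mut [&mut weights], &[&grad]);  // in-place parameter update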

🎯 Examples

1. Basic Operations

cargo run --example basic_operations

Demonstrates tensor creation, arithmetic, and shape manipulation.

2. Sequential Neural Network

cargo run --example sequential_model

Multi-layer perceptron with dense layers and activations.

3. Convolutional Neural Network

cargo run --example cnn_example

Image classification with Conv2D, pooling, and dense layers.

4. Data Loading Pipeline

cargo run --example data_loading

CSV loading, synthetic datasets, batching, and normalization.

5. Performance Benchmarks

cargo run --example simd_benchmark

SIMD vs regular operations, parallel computing benchmarks.

6. Model Serialization

cargo run --example model_serialization

Saving and loading trained models in JSON and binary formats.

⚡ Performance

Benchmarks (on typical hardware)

Operation               Regular    SIMD      Parallel   Speedup
───────────────────────────────────────────────────────────────
Vector Addition (1M)    18ms       12ms      8ms        2.2x
Matrix Multiply (500²)  5.0s       N/A       1.1s       4.5x
Element-wise Multiply   20ms       15ms      8ms        2.5x
ReLU Activation         15ms       9ms       N/A        1.7x
Memory Bandwidth        1.6GB/s    5.7GB/s   N/A        3.6x

Optimization Features

  • SIMD Vectorization: 4-element f64 operations using AVX2
  • Parallel Computing: Multi-threaded operations via Rayon
  • Memory Efficiency: Zero-copy operations where possible
  • Cache Optimization: Contiguous memory layout

Platform Support

  • x86_64: Full SIMD optimization enabled (compile-time dispatch sketched below)
  • ARM64: Parallel operations (SIMD fallback)
  • Other: Regular operations with parallel support
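
The per-platform split above is resolved at compile time. A hedged sketch of how such dispatch can be written with cfg attributes, using the `simd_add` method from the optimization tips below; the `fast_add` wrapper itself is illustrative, not part of the API:

use mini_tensorflow::Tensor;
#[cfg(target_arch = "x86_64")]
use mini_tensorflow::SIMDOps;

fn fast_add(a: &Tensor, b: &Tensor) -> Tensor {
    #[cfg(target_arch = "x86_64")]
    let result = a.simd_add(b); // wide-backed f64x4 fast path
    #[cfg(not(target_arch = "x86_64"))]
    let result = a.add(b);      // portable scalar fallback
    result
}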

🔧 Advanced Features

Custom Layer Implementation

use mini_tensorflow::{Tensor, Layer};

#[derive(Debug, Clone)]
struct BatchNorm {
    gamma: Tensor,
    beta: Tensor,
    running_mean: Tensor,
    running_var: Tensor,
    epsilon: f64,
}

impl Layer for BatchNorm {
    fn forward(&self, input: &Tensor) -> Tensor {
        // Inference-mode batch normalization over the running statistics:
        // normalized = (input - mean) / sqrt(var + epsilon)
        // output = gamma * normalized + beta
        // (Assumes public `data`/`shape` fields, as in the in-place ReLU
        // tip below, with per-feature parameters on the last dimension.)
        let features = self.gamma.data.len();
        let data = input.data.iter().enumerate().map(|(i, &x)| {
            let f = i % features;
            let norm = (x - self.running_mean.data[f])
                / (self.running_var.data[f] + self.epsilon).sqrt();
            self.gamma.data[f] * norm + self.beta.data[f]
        }).collect();
        Tensor::new(data, input.shape.clone())
    }

    // Implement other required methods...
}

Performance Optimization Tips

use mini_tensorflow::{SIMDOps, ParallelOps};

// Use SIMD for element-wise operations on large tensors
let result = tensor_a.simd_add(&tensor_b);

// Use parallel operations for matrix multiplication
let matmul = matrix_a.parallel_matmul(&matrix_b);

// Prefer in-place operations when possible
tensor.data.iter_mut().for_each(|x| *x = x.max(0.0)); // In-place ReLU

πŸ—οΈ Architecture Decisions

Memory Management

  • Ownership Model: Rust's ownership system prevents data races
  • RAII: Automatic resource cleanup via Drop trait
  • Zero-Copy: References and borrowing minimize allocations (illustrated below)
  • Contiguous Layout: Vec for cache-friendly access patterns
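
A small illustration of the zero-copy point: helpers that borrow a tensor read through the reference without cloning the underlying buffer. This sketch assumes the public `data` field used elsewhere in this README:

use mini_tensorflow::Tensor;

fn describe(t: &Tensor) -> (usize, f64) {
    (t.data.len(), t.mean()) // reads via the borrow; no allocation or copy
}

let big = Tensor::random(vec![1024, 1024]);
let (elements, avg) = describe(&big); // `big` remains owned and usable
println!("{} elements, mean {}", elements, avg);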

Numerical Stability

  • f64 Precision: Double precision throughout for accuracy
  • Overflow Protection: Checked operations in debug builds
  • Numerical Algorithms: Stable implementations of softmax, etc. (see the sketch below)
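
The usual trick behind a numerically stable softmax is subtracting the maximum before exponentiating, so exp() cannot overflow; a standalone sketch over a raw slice, not the library's internal code:

fn stable_softmax(xs: &[f64]) -> Vec<f64> {
    // Shifting by the max changes nothing mathematically:
    // exp(x - m) / Σ exp(x - m) == exp(x) / Σ exp(x)
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}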

Extensibility

  • Trait System: Layer trait enables custom implementations
  • Generic Design: Template-like functionality without runtime cost
  • Module System: Clean separation of concerns

🧪 Testing

# Run all tests
cargo test

# Run with full output
cargo test -- --nocapture

# Run specific test
cargo test tensor_operations

# Benchmark tests
cargo bench
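
A unit test along the lines of `tensor_operations` could look like this; the module and test names here are illustrative:

#[cfg(test)]
mod tests {
    use mini_tensorflow::Tensor;

    #[test]
    fn add_is_elementwise() {
        let a = Tensor::new(vec![1.0, 2.0], vec![1, 2]);
        let b = Tensor::new(vec![3.0, 4.0], vec![1, 2]);
        let c = a.add(&b);
        assert_eq!(c.data, vec![4.0, 6.0]);
    }
}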

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes and add tests
  4. Run tests: cargo test
  5. Submit a pull request

Code Style

  • Follow Rust standard formatting: cargo fmt
  • Run clippy lints: cargo clippy
  • Add documentation for public APIs
  • Include examples in documentation

Areas for Contribution

  • Gradient Computation: Implement full automatic differentiation
  • More Optimizers: RMSprop, AdaGrad, etc.
  • Advanced Layers: LSTM, Transformer, BatchNorm
  • GPU Support: CUDA or OpenCL backends
  • Model Formats: ONNX import/export
  • Distributed Training: Multi-node support

πŸ› Known Limitations

  • Gradients: Currently simplified backward pass implementation
  • GPU: CPU-only, no GPU acceleration yet
  • Dynamic Shapes: Limited dynamic graph support
  • Memory: Large models may hit memory constraints
  • Precision: Single precision (f32) not yet supported

📄 License

MIT License - see LICENSE file for details.

☕ Support & Community

If you find Mini TensorFlow helpful, consider supporting the project:

Buy Me A Coffee

🙏 Acknowledgments

  • PyTorch: API design inspiration
  • Candle: Rust ML framework reference
  • Rayon: Parallel computing library
  • Serde: Serialization framework
  • The Rust Community: Excellent tooling and libraries

📞 Support

  • Issues: GitHub Issues for bugs and feature requests
  • Discussions: GitHub Discussions for questions

Made with ❤️ and 🦀 Rust ❤️ by Aarambh Dev Hub

Mini TensorFlow demonstrates that systems programming languages can be both safe AND fast for machine learning workloads.