
🦀 Mini TensorFlow - Deep Learning Library in Rust

A comprehensive, high-performance deep learning library implemented in Rust, inspired by TensorFlow and PyTorch. Built with safety, speed, and ergonomics in mind.

🚀 Features

  • 🔢 Tensor Operations: Multi-dimensional arrays with broadcasting support
  • 🧠 Neural Networks: Dense, Convolutional, and Activation layers
  • 📊 Computation Graph: Dynamic graph execution with forward pass
  • ⚡ SIMD & Parallel: Vectorized operations and multi-core processing
  • 💾 Model Serialization: Save/load models in JSON and binary formats
  • 📈 Data Loading: CSV support, synthetic datasets, and batch processing
  • 🔧 Optimizers: SGD with momentum and Adam optimizer
  • 🛡️ Memory Safety: Zero-cost abstractions with Rust's ownership system

📋 Table of Contents

  • Architecture
  • Installation
  • Quick Start
  • API Documentation
  • Examples
  • Performance
  • Advanced Features
  • Architecture Decisions
  • Testing
  • Contributing
  • Known Limitations
  • License

πŸ—οΈ Architecture

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Mini TensorFlow Library                     │
├─────────────────────────────────────────────────────────────────┤
│  Examples Layer                                                 │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Sequential  │ CNN Example │ Data Loading│ SIMD Benchmark  │  │
│  │ Models      │ & Training  │ & Batching  │ & Performance   │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  High-Level API                                                 │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Sequential  │ Layer       │ DataLoader  │ Model           │  │
│  │ Container   │ Abstraction │ & Dataset   │ Serialization   │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Core Components                                                │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Layers      │ Convolution │ Optimizers  │ Autograd        │  │
│  │ (Dense,     │ (Conv2D,    │ (SGD,       │ (Variables,     │  │
│  │ Activations)│ MaxPool2D)  │ Adam)       │ Gradients)      │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Computation Engine                                             │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Tensor      │ Graph       │ SIMD        │ Parallel        │  │
│  │ Operations  │ Execution   │ Operations  │ Computing       │  │
│  └─────────────┴─────────────┴─────────────┴─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  Foundation                                                     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │            Rust Memory Safety & Performance               │  │
│  │     (Zero-cost abstractions, RAII, Ownership model)       │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Data Flow Architecture

Input Data → Tensor → Layer Chain → Output → Loss → Optimizer → Updated Parameters
     │         │         │           │        │         │              │
     │         │         │           │        │         │              │
     ▼         ▼         ▼           ▼        ▼         ▼              ▼
  ┌─────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ ┌──────┐ ┌────────┐ ┌─────────────┐
  │ CSV │ │ Tensor  │ │ Conv2D  │ │ Loss│ │ SGD/ │ │ Params │ │ Serialized  │
  │Files│ │ Ops     │ │ Dense   │ │ Calc│ │ Adam │ │ Update │ │ Model       │
  │ ... │ │ SIMD    │ │ ReLU    │ │ ... │ │ ...  │ │ ...    │ │ JSON/Binary │
  └─────┘ └─────────┘ └─────────┘ └─────┘ └──────┘ └────────┘ └─────────────┘

Tensor Computation Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Tensor Operations                                 │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          Shape Validation                                   │
│                     (Broadcasting, Dimension checks)                        │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────┬─────────────────┬─────────────────┬─────────────────────┐
│ Regular Ops     │ SIMD Optimized  │ Parallel Ops    │ Specialized Ops     │
│ - Element-wise  │ - f64x4 vectors │ - Rayon threads │ - Matrix multiply   │
│ - Single thread │ - AVX/SSE       │ - Multi-core    │ - Convolution       │
│ - Standard loop │ - 4x throughput │ - Work stealing │ - Activation funcs  │
└─────────────────┴─────────────────┴─────────────────┴─────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Result Tensor                                    │
│                         (New shape, data)                                   │
└─────────────────────────────────────────────────────────────────────────────┘

Neural Network Layer Architecture

Sequential Model Container
│
├── Layer 1: Input Processing
│   ├── Conv2D(in_channels=1, out_channels=32, kernel=3x3)
│   ├── ReLU activation
│   └── MaxPool2D(kernel=2x2, stride=2)
│
├── Layer 2: Feature Extraction
│   ├── Conv2D(in_channels=32, out_channels=64, kernel=3x3)
│   ├── ReLU activation
│   └── MaxPool2D(kernel=2x2, stride=2)
│
├── Layer 3: Flattening
│   └── Flatten(4D → 2D conversion)
│
├── Layer 4: Classification Head
│   ├── Dense(features_in=1600, features_out=128)
│   ├── ReLU activation
│   ├── Dense(features_in=128, features_out=10)
│   └── Softmax activation
│
└── Output: Probability Distribution [batch_size, num_classes]

Memory Management Flow

Stack Memory                  Heap Memory                   GPU/SIMD
┌─────────────┐              ┌─────────────┐              ┌─────────────┐
│ References  │              │ Tensor Data │              │ Vectorized  │
│ &Tensor     │─────────────▶│ Vec<f64>    │─────────────▶│ Operations  │
│ &mut Tensor │              │ Shape info  │              │ f64x4 SIMD  │
│ Temporaries │              │ Gradients   │              │ Parallel    │
└─────────────┘              └─────────────┘              └─────────────┘
      │                             │                             │
      ▼                             ▼                             ▼
  Zero-copy                    RAII cleanup               Hardware acceleration
  borrowing                   Automatic drop              AVX2/FMA instructions

📦 Installation

Prerequisites

  • Rust 1.70+ (2021 edition)
  • Cargo package manager

Quick Setup

# Clone the repository
git clone https://github.com/AarambhDevHub/mini-tensorflow.git
cd mini-tensorflow

# Build the project
cargo build --release

# Run tests
cargo test

# Run examples
cargo run --example sequential_model
cargo run --example cnn_example
cargo run --example data_loading
cargo run --example simd_benchmark

Dependencies

[dependencies]
rand = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
csv = "1.2"
rayon = "1.7"
num-traits = "0.2"

[target.'cfg(target_arch = "x86_64")'.dependencies]
wide = "0.7"  # SIMD operations

🚀 Quick Start

Basic Tensor Operations

use mini_tensorflow::Tensor;

// Create tensors
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = Tensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);

// Basic operations
let sum = a.add(&b);                    // Element-wise addition
let product = a.matmul(&b);             // Matrix multiplication
let activated = a.relu();               // ReLU activation

println!("Sum: {}", sum);
println!("Matrix product: {}", product);
println!("ReLU activated: {}", activated);

Building Neural Networks

use mini_tensorflow::{Sequential, Dense, ReLU, Softmax, Tensor};

// Create a multi-layer perceptron
let model = Sequential::new()
    .add(Dense::new(784, 256))    // Input layer
    .add(ReLU::new())
    .add(Dense::new(256, 128))    // Hidden layer
    .add(ReLU::new())
    .add(Dense::new(128, 10))     // Output layer
    .add(Softmax::new());

// Display model architecture
model.summary();

// Forward pass
let input = Tensor::random(vec![1, 784]);
let output = model.forward(input);
println!("Predictions: {}", output);

Convolutional Neural Networks

use mini_tensorflow::{Sequential, Conv2D, MaxPool2D, Flatten, Dense, ReLU, Softmax};

// Create CNN for image classification
let cnn = Sequential::new()
    .add(Conv2D::new(1, 32, 3))      // 1→32 channels, 3×3 kernel
    .add(ReLU::new())
    .add(MaxPool2D::new(2))          // 2×2 pooling
    .add(Conv2D::new(32, 64, 3))     // 32→64 channels
    .add(ReLU::new())
    .add(MaxPool2D::new(2))
    .add(Flatten::new())
    .add(Dense::new(1600, 128))      // spatial 28→26→13→11→5, so 64×5×5 = 1600
    .add(ReLU::new())
    .add(Dense::new(128, 10))        // 10 classes
    .add(Softmax::new());

// Process a 28×28 image
let image = Tensor::random(vec![1, 1, 28, 28]);
let predictions = cnn.forward(image);

Data Loading & Training

use mini_tensorflow::{Dataset, DataLoader, SGD, Optimizer};

// Load data from CSV
let dataset = Dataset::from_csv(
    "data.csv",
    vec![0, 1, 2, 3],  // feature columns
    4,                 // target column
    true               // has header
)?;

// Create data loader with batching
let mut loader = DataLoader::new(dataset, 32)
    .with_shuffle(true);

// Training loop
let mut optimizer = SGD::new(0.01);

for (batch_data, batch_labels) in loader.iter() {
    // Forward pass
    let predictions = model.forward(batch_data[0].clone());

    // Compute loss (simplified; a stand-in MSE sketch follows this snippet)
    let loss = compute_loss(&predictions, &batch_labels[0]);

    // Update parameters; `gradients` is a placeholder until the full
    // backward pass lands (see Known Limitations)
    let mut params = model.parameters_mut();
    optimizer.step(&mut params, &gradients);
}
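
The `compute_loss` helper above is not part of the library. A minimal stand-in sketch is a plain mean squared error computed over the raw buffers, assuming Tensor's `data` field is public as in the in-place ReLU tip later in this README:

fn compute_loss(predictions: &Tensor, targets: &Tensor) -> f64 {
    // Mean squared error straight from the tensors' data buffers
    predictions
        .data
        .iter()
        .zip(targets.data.iter())
        .map(|(p, t)| (p - t).powi(2))
        .sum::<f64>()
        / predictions.data.len() as f64
}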

Model Persistence

use mini_tensorflow::Saveable;

// Save model
model.save("trained_model.json")?;   // Human readable
model.save("trained_model.bin")?;    // Compact binary

// Load model (rebuild the same architecture first, then restore its weights)
let mut new_model = Sequential::new()
    .add(Dense::new(784, 256))
    .add(ReLU::new());

new_model.load("trained_model.json")?;

📚 API Documentation

Core Components

Tensor

impl Tensor {
    // Construction
    pub fn new(data: Vec<f64>, shape: Vec<usize>) -> Self;
    pub fn zeros(shape: Vec<usize>) -> Self;
    pub fn ones(shape: Vec<usize>) -> Self;
    pub fn random(shape: Vec<usize>) -> Self;

    // Operations
    pub fn add(&self, other: &Tensor) -> Tensor;
    pub fn mul(&self, other: &Tensor) -> Tensor;
    pub fn matmul(&self, other: &Tensor) -> Tensor;
    pub fn relu(&self) -> Tensor;
    pub fn sigmoid(&self) -> Tensor;
    pub fn softmax(&self) -> Tensor;

    // Shape manipulation
    pub fn reshape(&self, new_shape: Vec<usize>) -> Self;
    pub fn transpose(&self) -> Self;

    // Utilities
    pub fn sum(&self) -> f64;
    pub fn mean(&self) -> f64;
}
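
A quick sketch exercising the constructors and shape helpers listed above (shapes and values chosen arbitrarily):

use mini_tensorflow::Tensor;

let t = Tensor::ones(vec![2, 3]);     // 2×3 tensor of ones
let wide = t.reshape(vec![3, 2]);     // same six elements, new shape
let flipped = wide.transpose();       // back to a 2×3 layout
println!("sum = {}, mean = {}", flipped.sum(), flipped.mean()); // 6, 1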

Sequential Model

impl Sequential {
    pub fn new() -> Self;
    pub fn add<L: Layer + 'static>(self, layer: L) -> Self;
    pub fn forward(&self, input: Tensor) -> Tensor;
    pub fn parameters(&self) -> Vec<&Tensor>;
    pub fn summary(&self);
}

Layers

// Dense layer
Dense::new(input_size: usize, output_size: usize) -> Dense;

// Convolutional layer
Conv2D::new(in_channels: usize, out_channels: usize, kernel_size: usize) -> Conv2D;

// Pooling layer
MaxPool2D::new(kernel_size: usize) -> MaxPool2D;

// Activation layers
ReLU::new() -> ReLU;
Sigmoid::new() -> Sigmoid;
Softmax::new() -> Softmax;

Optimizers

// Stochastic Gradient Descent
SGD::new(learning_rate: f64) -> SGD;
SGD::with_momentum(learning_rate: f64, momentum: f64) -> SGD;

// Adam optimizer
Adam::new(learning_rate: f64) -> Adam;
Adam::with_params(lr: f64, beta1: f64, beta2: f64, epsilon: f64) -> Adam;

// Training step
optimizer.step(parameters: &mut [&mut Tensor], gradients: &[&Tensor]);
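
Putting these signatures together, a minimal update step might look like the following sketch; it assumes `Adam` and `Optimizer` are exported at the crate root like `SGD` in the training example, and `grad` stands in for a gradient a full backward pass would produce:

use mini_tensorflow::{Tensor, Adam, Optimizer};

let mut weights = Tensor::random(vec![128, 10]);
let grad = Tensor::ones(vec![128, 10]);    // placeholder gradient

let mut adam = Adam::new(0.001);
adam.step(&mut [&mut weights], &[&grad]);  // in-place parameter update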

🎯 Examples

1. Basic Operations

cargo run --example basic_operations

Demonstrates tensor creation, arithmetic, and shape manipulation.

2. Sequential Neural Network

cargo run --example sequential_model

Multi-layer perceptron with dense layers and activations.

3. Convolutional Neural Network

cargo run --example cnn_example

Image classification with Conv2D, pooling, and dense layers.

4. Data Loading Pipeline

cargo run --example data_loading

CSV loading, synthetic datasets, batching, and normalization.

5. Performance Benchmarks

cargo run --example simd_benchmark

SIMD vs regular operations, parallel computing benchmarks.

6. Model Serialization

cargo run --example model_serialization

Saving and loading trained models in JSON and binary formats.

⚡ Performance

Benchmarks (on typical hardware)

Operation               Regular    SIMD      Parallel   Speedup
───────────────────────────────────────────────────────────────
Vector Addition (1M)    18ms       12ms      8ms        2.2x
Matrix Multiply (500²)  5.0s       N/A       1.1s       4.5x
Element-wise Multiply   20ms       15ms      8ms        2.5x
ReLU Activation         15ms       9ms       N/A        1.7x
Memory Bandwidth        1.6GB/s    5.7GB/s   N/A        3.6x

Optimization Features

  • SIMD Vectorization: 4-element f64 operations using AVX2
  • Parallel Computing: Multi-threaded operations via Rayon
  • Memory Efficiency: Zero-copy operations where possible
  • Cache Optimization: Contiguous memory layout

Platform Support

  • x86_64: Full SIMD optimization enabled (compile-time dispatch sketched below)
  • ARM64: Parallel operations (SIMD fallback)
  • Other: Regular operations with parallel support
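
The per-platform split above is resolved at compile time. A hedged sketch of how such dispatch can be written with cfg attributes, using the `simd_add` method from the optimization tips below; the `fast_add` wrapper itself is illustrative, not part of the API:

use mini_tensorflow::Tensor;
#[cfg(target_arch = "x86_64")]
use mini_tensorflow::SIMDOps;

fn fast_add(a: &Tensor, b: &Tensor) -> Tensor {
    #[cfg(target_arch = "x86_64")]
    let result = a.simd_add(b); // wide-backed f64x4 fast path
    #[cfg(not(target_arch = "x86_64"))]
    let result = a.add(b);      // portable scalar fallback
    result
}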

🔧 Advanced Features

Custom Layer Implementation

use mini_tensorflow::{Tensor, Layer};

#[derive(Debug, Clone)]
struct BatchNorm {
    gamma: Tensor,
    beta: Tensor,
    running_mean: Tensor,
    running_var: Tensor,
    epsilon: f64,
}

impl Layer for BatchNorm {
    fn forward(&self, input: &Tensor) -> Tensor {
        // Inference-mode batch normalization over the running statistics:
        // normalized = (input - mean) / sqrt(var + epsilon)
        // output = gamma * normalized + beta
        // (Assumes public `data`/`shape` fields, as in the in-place ReLU
        // tip below, with per-feature parameters on the last dimension.)
        let features = self.gamma.data.len();
        let data = input.data.iter().enumerate().map(|(i, &x)| {
            let f = i % features;
            let norm = (x - self.running_mean.data[f])
                / (self.running_var.data[f] + self.epsilon).sqrt();
            self.gamma.data[f] * norm + self.beta.data[f]
        }).collect();
        Tensor::new(data, input.shape.clone())
    }

    // Implement other required methods...
}

Performance Optimization Tips

use mini_tensorflow::{SIMDOps, ParallelOps};

// Use SIMD for element-wise operations on large tensors
let result = tensor_a.simd_add(&tensor_b);

// Use parallel operations for matrix multiplication
let matmul = matrix_a.parallel_matmul(&matrix_b);

// Prefer in-place operations when possible
tensor.data.iter_mut().for_each(|x| *x = x.max(0.0)); // In-place ReLU

πŸ—οΈ Architecture Decisions

Memory Management

  • Ownership Model: Rust's ownership system prevents data races
  • RAII: Automatic resource cleanup via Drop trait
  • Zero-Copy: References and borrowing minimize allocations (illustrated below)
  • Contiguous Layout: Vec for cache-friendly access patterns
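
A small illustration of the zero-copy point: helpers that borrow a tensor read through the reference without cloning the underlying buffer. This sketch assumes the public `data` field used elsewhere in this README:

use mini_tensorflow::Tensor;

fn describe(t: &Tensor) -> (usize, f64) {
    (t.data.len(), t.mean()) // reads via the borrow; no allocation or copy
}

let big = Tensor::random(vec![1024, 1024]);
let (elements, avg) = describe(&big); // `big` remains owned and usable
println!("{} elements, mean {}", elements, avg);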

Numerical Stability

  • f64 Precision: Double precision throughout for accuracy
  • Overflow Protection: Checked operations in debug builds
  • Numerical Algorithms: Stable implementations of softmax, etc. (see the sketch below)
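
The usual trick behind a numerically stable softmax is subtracting the maximum before exponentiating, so exp() cannot overflow; a standalone sketch over a raw slice, not the library's internal code:

fn stable_softmax(xs: &[f64]) -> Vec<f64> {
    // Shifting by the max changes nothing mathematically:
    // exp(x - m) / Σ exp(x - m) == exp(x) / Σ exp(x)
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}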

Extensibility

  • Trait System: Layer trait enables custom implementations
  • Generic Design: Template-like functionality without runtime cost
  • Module System: Clean separation of concerns

🧪 Testing

# Run all tests
cargo test

# Run with full output
cargo test -- --nocapture

# Run specific test
cargo test tensor_operations

# Benchmark tests
cargo bench
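
A unit test along the lines of `tensor_operations` could look like this; the module and test names here are illustrative:

#[cfg(test)]
mod tests {
    use mini_tensorflow::Tensor;

    #[test]
    fn add_is_elementwise() {
        let a = Tensor::new(vec![1.0, 2.0], vec![1, 2]);
        let b = Tensor::new(vec![3.0, 4.0], vec![1, 2]);
        let c = a.add(&b);
        assert_eq!(c.data, vec![4.0, 6.0]);
    }
}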

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes and add tests
  4. Run tests: cargo test
  5. Submit a pull request

Code Style

  • Follow Rust standard formatting: cargo fmt
  • Run clippy lints: cargo clippy
  • Add documentation for public APIs
  • Include examples in documentation

Areas for Contribution

  • Gradient Computation: Implement full automatic differentiation
  • More Optimizers: RMSprop, AdaGrad, etc.
  • Advanced Layers: LSTM, Transformer, BatchNorm
  • GPU Support: CUDA or OpenCL backends
  • Model Formats: ONNX import/export
  • Distributed Training: Multi-node support

πŸ› Known Limitations

  • Gradients: Currently simplified backward pass implementation
  • GPU: CPU-only, no GPU acceleration yet
  • Dynamic Shapes: Limited dynamic graph support
  • Memory: Large models may hit memory constraints
  • Precision: Single precision (f32) not yet supported

📄 License

MIT License - see LICENSE file for details.

☕ Support & Community

If you find Mini TensorFlow helpful, consider supporting the project:

Buy Me A Coffee

🙏 Acknowledgments

  • PyTorch: API design inspiration
  • Candle: Rust ML framework reference
  • Rayon: Parallel computing library
  • Serde: Serialization framework
  • The Rust Community: Excellent tooling and libraries

📞 Support

  • Issues: GitHub Issues for bugs and feature requests
  • Discussions: GitHub Discussions for questions

Made with ❤️ and 🦀 Rust ❤️ by Aarambh Dev Hub

Mini TensorFlow demonstrates that systems programming languages can be both safe AND fast for machine learning workloads.