A comprehensive, high-performance deep learning library implemented in Rust, inspired by TensorFlow and PyTorch. Built with safety, speed, and ergonomics in mind.
- Tensor Operations: Multi-dimensional arrays with broadcasting support
- Neural Networks: Dense, Convolutional, and Activation layers
- Computation Graph: Dynamic graph execution with forward pass
- SIMD & Parallel: Vectorized operations and multi-core processing
- Model Serialization: Save/load models in JSON and binary formats
- Data Loading: CSV support, synthetic datasets, and batch processing
- Optimizers: SGD with momentum and Adam optimizer
- Memory Safety: Zero-cost abstractions with Rust's ownership system
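The feature list mentions broadcasting but does not spell out the rule. Assuming it follows the NumPy-style convention that TensorFlow and PyTorch also use (an assumption, not confirmed by this README), the result shape of a binary operation can be computed as below: align the shapes from the trailing dimension, and treat two sizes as compatible when they are equal or one of them is 1. `broadcast_shape` is a hypothetical helper, not part of mini_tensorflow.

```rust
/// Result shape of broadcasting `a` against `b` under the NumPy-style rule
/// (assumed, not taken from mini_tensorflow). Returns `None` if incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Walk both shapes from the trailing dimension; missing dims count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // e.g. 3 vs 4 cannot be broadcast
        };
    }
    Some(out)
}

fn main() {
    // A [2, 3] matrix plus a [3] row vector broadcasts to [2, 3].
    println!("{:?}", broadcast_shape(&[2, 3], &[3])); // Some([2, 3])
    // A column [4, 1] against a row [1, 5] broadcasts to [4, 5].
    println!("{:?}", broadcast_shape(&[4, 1], &[1, 5])); // Some([4, 5])
    // Trailing sizes 3 and 4 are incompatible.
    println!("{:?}", broadcast_shape(&[2, 3], &[4])); // None
}
```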
Mini TensorFlow Library (layers, top to bottom)
├── Examples Layer
│   ├── Sequential Models
│   ├── CNN Example & Training
│   ├── Data Loading & Batching
│   └── SIMD Benchmark & Performance
├── High-Level API
│   ├── Sequential Container
│   ├── Layer Abstraction
│   ├── DataLoader & Dataset
│   └── Model Serialization
├── Core Components
│   ├── Layers (Dense, Activations)
│   ├── Convolution (Conv2D, MaxPool2D)
│   ├── Optimizers (SGD, Adam)
│   └── Autograd (Variables, Gradients)
├── Computation Engine
│   ├── Tensor Operations
│   ├── Graph Execution
│   ├── SIMD Operations
│   └── Parallel Computing
└── Foundation
    └── Rust Memory Safety & Performance (zero-cost abstractions, RAII, ownership model)
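The High-Level API layer pairs a Sequential container with a layer abstraction. A minimal sketch of how such a container can chain layers through trait objects, mirroring the builder-style `add<L: Layer + 'static>(self, layer: L) -> Self` signature documented in this README. The `Layer` trait, `Scale`, and `ReLU` types here are hypothetical stand-ins that operate on a flat `Vec<f64>` instead of the library's `Tensor`.

```rust
/// Hypothetical layer abstraction: each layer maps an input buffer to an output buffer.
trait Layer {
    fn forward(&self, input: Vec<f64>) -> Vec<f64>;
    fn name(&self) -> &str;
}

/// Multiplies every element by a constant (stand-in for a layer with parameters).
struct Scale(f64);

impl Layer for Scale {
    fn forward(&self, input: Vec<f64>) -> Vec<f64> {
        input.into_iter().map(|x| x * self.0).collect()
    }
    fn name(&self) -> &str {
        "Scale"
    }
}

/// Element-wise ReLU activation.
struct ReLU;

impl Layer for ReLU {
    fn forward(&self, input: Vec<f64>) -> Vec<f64> {
        input.into_iter().map(|x| x.max(0.0)).collect()
    }
    fn name(&self) -> &str {
        "ReLU"
    }
}

/// Owns its layers as boxed trait objects and applies them in insertion order.
struct Sequential {
    layers: Vec<Box<dyn Layer>>,
}

impl Sequential {
    fn new() -> Self {
        Sequential { layers: Vec::new() }
    }

    // Consumes and returns `self` so calls can be chained.
    fn add<L: Layer + 'static>(mut self, layer: L) -> Self {
        self.layers.push(Box::new(layer));
        self
    }

    fn forward(&self, input: Vec<f64>) -> Vec<f64> {
        // Each layer's output becomes the next layer's input.
        self.layers.iter().fold(input, |x, layer| layer.forward(x))
    }
}

fn main() {
    let model = Sequential::new().add(Scale(2.0)).add(ReLU);
    let names: Vec<&str> = model.layers.iter().map(|l| l.name()).collect();
    println!("{:?}", names); // ["Scale", "ReLU"]
    println!("{:?}", model.forward(vec![-1.0, 0.5, 3.0])); // [0.0, 1.0, 6.0]
}
```

Boxing each layer lets layers of different concrete types live in one `Vec`, at the cost of one dynamic dispatch per layer call.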
Input Data → Tensor → Layer Chain → Output → Loss → Optimizer → Updated Parameters
Components along the pipeline: CSV files → tensor ops (SIMD) → Conv2D / Dense / ReLU → loss calculation → SGD / Adam → parameter update → serialized model (JSON/binary)
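The Optimizer → Updated Parameters step of this flow applies one of the two optimizers named in the feature list. A minimal sketch of the textbook SGD-with-momentum and Adam update rules over flat `f64` slices; the struct names, fields, and Adam defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8, the commonly used values) are assumptions rather than the library's internals.

```rust
/// SGD with classical momentum: v <- momentum * v + g, then p <- p - lr * v.
struct SgdMomentum {
    lr: f64,
    momentum: f64,
    velocity: Vec<f64>, // one velocity entry per parameter
}

impl SgdMomentum {
    fn new(lr: f64, momentum: f64, n_params: usize) -> Self {
        Self { lr, momentum, velocity: vec![0.0; n_params] }
    }

    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        for ((p, &g), v) in params.iter_mut().zip(grads).zip(self.velocity.iter_mut()) {
            *v = self.momentum * *v + g;
            *p -= self.lr * *v;
        }
    }
}

/// Adam: bias-corrected first (m) and second (v) moment estimates per parameter.
struct Adam {
    lr: f64,
    beta1: f64,
    beta2: f64,
    epsilon: f64,
    m: Vec<f64>,
    v: Vec<f64>,
    t: i32, // number of steps taken so far
}

impl Adam {
    fn new(lr: f64, n_params: usize) -> Self {
        // Commonly used defaults (assumed, not read from the library).
        Self { lr, beta1: 0.9, beta2: 0.999, epsilon: 1e-8, m: vec![0.0; n_params], v: vec![0.0; n_params], t: 0 }
    }

    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        self.t += 1;
        // Bias corrections offset m and v being initialized at zero.
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.epsilon);
        }
    }
}

fn main() {
    // Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    let grad = |x: f64| 2.0 * (x - 3.0);

    let mut sgd = SgdMomentum::new(0.05, 0.9, 1);
    let mut x = [0.0];
    for _ in 0..200 {
        let g = [grad(x[0])];
        sgd.step(&mut x, &g);
    }
    println!("SGD+momentum: x = {:.4}", x[0]);

    let mut adam = Adam::new(0.1, 1);
    let mut y = [0.0];
    for _ in 0..1000 {
        let g = [grad(y[0])];
        adam.step(&mut y, &g);
    }
    println!("Adam: x = {:.4}", y[0]);
}
```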
Tensor Operations
        │
        ▼
Shape Validation (broadcasting, dimension checks)
        │
        ▼
Execution path, one of:
├── Regular Ops: element-wise, single thread, standard loop
├── SIMD Optimized: f64x4 vectors, AVX/SSE, 4 lanes per instruction (up to 4x in theory)
├── Parallel Ops: Rayon threads, multi-core, work stealing
└── Specialized Ops: matrix multiply, convolution, activation functions
        │
        ▼
Result Tensor (new shape, data)
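A minimal sketch of the validate-then-execute flow above for element-wise addition in the equal-shape case (broadcasting is left out). Shapes are checked first; the data is then processed in 4-element steps that mirror one f64x4 vector add, followed by a scalar loop for the remainder. It uses plain indexing rather than the `wide` crate or intrinsics, so whether AVX instructions are actually emitted is up to the compiler's auto-vectorizer. `add_checked` and its `String` error type are hypothetical.

```rust
/// Element-wise addition of two row-major buffers with identical shapes.
/// Hypothetical helper: broadcasting is deliberately left out.
fn add_checked(a: &[f64], a_shape: &[usize], b: &[f64], b_shape: &[usize]) -> Result<Vec<f64>, String> {
    // 1. Shape validation: identical shapes, and buffers that match them.
    if a_shape != b_shape {
        return Err(format!("shape mismatch: {:?} vs {:?}", a_shape, b_shape));
    }
    let n: usize = a_shape.iter().product();
    if a.len() != n || b.len() != n {
        return Err(format!("buffer length does not match shape {:?}", a_shape));
    }

    // 2. Execution: 4 lanes per step, mirroring one f64x4 vector add.
    let mut out = vec![0.0; n];
    let split = n - n % 4;
    for i in (0..split).step_by(4) {
        for lane in 0..4 {
            out[i + lane] = a[i + lane] + b[i + lane];
        }
    }
    // Scalar tail for the final n % 4 elements.
    for i in split..n {
        out[i] = a[i] + b[i];
    }

    // 3. Result: a new buffer with the same shape as the inputs.
    Ok(out)
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0; 6];
    println!("{:?}", add_checked(&a, &[2, 3], &b, &[2, 3]));
    // Ok([11.0, 12.0, 13.0, 14.0, 15.0, 16.0])
    println!("{:?}", add_checked(&a, &[2, 3], &b, &[3, 2]).is_err()); // true
}
```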
Sequential Model Container
│
├── Layer 1: Input Processing
│   ├── Conv2D(in_channels=1, out_channels=32, kernel=3x3)
│   ├── ReLU activation
│   └── MaxPool2D(kernel=2x2, stride=2)
│
├── Layer 2: Feature Extraction
│   ├── Conv2D(in_channels=32, out_channels=64, kernel=3x3)
│   ├── ReLU activation
│   └── MaxPool2D(kernel=2x2, stride=2)
│
├── Layer 3: Flattening
│   └── Flatten(4D → 2D conversion)
│
├── Layer 4: Classification Head
│   ├── Dense(features_in=1600, features_out=128)
│   ├── ReLU activation
│   ├── Dense(features_in=128, features_out=10)
│   └── Softmax activation
│
└── Output: Probability Distribution [batch_size, num_classes]
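The 1600 input features of the first Dense layer follow from the spatial arithmetic of this stack on a 28×28 input (the size used in the CNN example), assuming unpadded, stride-1 convolutions and floor division in pooling. The README does not state Conv2D's padding or stride, so that part is an assumption. A small sketch that reproduces the count; `out_size` and `flattened_features` are hypothetical helpers.

```rust
/// Output length along one spatial axis for an unpadded window of size
/// `kernel` moved with step `stride` (covers both convolution and pooling).
fn out_size(input: usize, kernel: usize, stride: usize) -> usize {
    (input - kernel) / stride + 1
}

/// Features produced by Flatten for a square `side` x `side` input image.
/// The inline numbers show the 28x28 case.
fn flattened_features(side: usize) -> usize {
    let s = out_size(side, 3, 1); // Conv2D(1 -> 32, 3x3): 28 -> 26
    let s = out_size(s, 2, 2); // MaxPool2D(2x2, stride 2): 26 -> 13
    let s = out_size(s, 3, 1); // Conv2D(32 -> 64, 3x3): 13 -> 11
    let s = out_size(s, 2, 2); // MaxPool2D(2x2, stride 2): 11 -> 5 (floor)
    64 * s * s // Flatten: channels * height * width
}

fn main() {
    println!("{}", flattened_features(28)); // 1600
}
```

A different input size changes this number, so the first Dense layer's input width has to be updated to match.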
Memory flow: Stack → Heap → SIMD units (CPU)
├── Stack memory: references (&Tensor, &mut Tensor), temporaries
│   └── Zero-copy borrowing
├── Heap memory: tensor data (Vec<f64>), shape info, gradients
│   └── RAII cleanup, automatic drop
└── SIMD units (CPU only, no GPU backend yet): vectorized f64x4 operations, parallel execution
    └── Hardware acceleration via AVX2/FMA instructions
- Rust 1.70+ (2021 edition)
- Cargo package manager
# Clone the repository
git clone https://github.com/AarambhDevHub/mini-tensorflow.git
cd mini-tensorflow
# Build the project
cargo build --release
# Run tests
cargo test
# Run examples
cargo run --example sequential_model
cargo run --example cnn_example
cargo run --example data_loading
cargo run --example simd_benchmark
[dependencies]
rand = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
csv = "1.2"
rayon = "1.7"
num-traits = "0.2"
[target.'cfg(target_arch = "x86_64")'.dependencies]
wide = "0.7" # SIMD operations
use mini_tensorflow::Tensor;
// Create tensors
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = Tensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);
// Basic operations
let sum = a.add(&b); // Element-wise addition
let product = a.matmul(&b); // Matrix multiplication
let activated = a.relu(); // ReLU activation
println!("Sum: {}", sum);
println!("Matrix product: {}", product);
println!("ReLU activated: {}", activated);
use mini_tensorflow::{Sequential, Dense, ReLU, Softmax, Tensor};
// Create a multi-layer perceptron
let model = Sequential::new()
.add(Dense::new(784, 256)) // Input layer
.add(ReLU::new())
.add(Dense::new(256, 128)) // Hidden layer
.add(ReLU::new())
.add(Dense::new(128, 10)) // Output layer
.add(Softmax::new());
// Display model architecture
model.summary();
// Forward pass
let input = Tensor::random(vec![1, 784]);
let output = model.forward(input);
println!("Predictions: {}", output);
use mini_tensorflow::{Sequential, Conv2D, MaxPool2D, Flatten, Dense, ReLU, Softmax, Tensor};
// Create CNN for image classification
let cnn = Sequential::new()
.add(Conv2D::new(1, 32, 3)) // 1→32 channels, 3×3 kernel
.add(ReLU::new())
.add(MaxPool2D::new(2)) // 2×2 pooling
.add(Conv2D::new(32, 64, 3)) // 32→64 channels
.add(ReLU::new())
.add(MaxPool2D::new(2))
.add(Flatten::new())
.add(Dense::new(1600, 128)) // Flattened features → 128
.add(ReLU::new())
.add(Dense::new(128, 10)) // 10 classes
.add(Softmax::new());
// Process a 28×28 single-channel image, shape [batch, channels, height, width]
let image = Tensor::random(vec![1, 1, 28, 28]);
let predictions = cnn.forward(image);
use mini_tensorflow::{Dataset, DataLoader, SGD, Optimizer};
// Load data from CSV
let dataset = Dataset::from_csv(
"data.csv",
vec![0, 1, 2, 3], // feature columns
4, // target column
true // has header
)?;
// Create data loader with batching
let mut loader = DataLoader::new(dataset, 32)
.with_shuffle(true);
// Training loop
let mut optimizer = SGD::new(0.01);
for (batch_data, batch_labels) in loader.iter() {
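// Forward pass (`model` is a Sequential model built as in the examples above)
let predictions = model.forward(batch_data[0].clone());
// Compute loss (simplified; `compute_loss` is a placeholder you supply)
let loss = compute_loss(&predictions, &batch_labels[0]);
// Update parameters. `gradients` is a placeholder: the backward pass is still
// simplified, so real gradients have to be computed and passed in here.
let mut params = model.parameters_mut();
optimizer.step(&mut params, &gradients);
}
use mini_tensorflow::Saveable;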
// Save model
model.save("trained_model.json")?; // Human readable
model.save("trained_model.bin")?; // Compact binary
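// Load model: rebuild the same layer stack as the saved model, then load its weights
let mut new_model = Sequential::new()
.add(Dense::new(784, 256))
.add(ReLU::new())
.add(Dense::new(256, 128))
.add(ReLU::new())
.add(Dense::new(128, 10))
.add(Softmax::new());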
new_model.load("trained_model.json")?;
impl Tensor {
// Construction
pub fn new(data: Vec<f64>, shape: Vec<usize>) -> Self;
pub fn zeros(shape: Vec<usize>) -> Self;
pub fn ones(shape: Vec<usize>) -> Self;
pub fn random(shape: Vec<usize>) -> Self;
// Operations
pub fn add(&self, other: &Tensor) -> Tensor;
pub fn mul(&self, other: &Tensor) -> Tensor;
pub fn matmul(&self, other: &Tensor) -> Tensor;
pub fn relu(&self) -> Tensor;
pub fn sigmoid(&self) -> Tensor;
pub fn softmax(&self) -> Tensor;
// Shape manipulation
pub fn reshape(&self, new_shape: Vec<usize>) -> Self;
pub fn transpose(&self) -> Self;
// Utilities
pub fn sum(&self) -> f64;
pub fn mean(&self) -> f64;
}
impl Sequential {
pub fn new() -> Self;
pub fn add<L: Layer + 'static>(self, layer: L) -> Self;
pub fn forward(&self, input: Tensor) -> Tensor;
pub fn parameters(&self) -> Vec<&Tensor>;
pub fn summary(&self);
}
// Dense layer
Dense::new(input_size: usize, output_size: usize) -> Dense;
// Convolutional layer
Conv2D::new(in_channels: usize, out_channels: usize, kernel_size: usize) -> Conv2D;
// Pooling layer
MaxPool2D::new(kernel_size: usize) -> MaxPool2D;
// Activation layers
ReLU::new() -> ReLU;
Sigmoid::new() -> Sigmoid;
Softmax::new() -> Softmax;
// Stochastic Gradient Descent
SGD::new(learning_rate: f64) -> SGD;
SGD::with_momentum(learning_rate: f64, momentum: f64) -> SGD;
// Adam optimizer
Adam::new(learning_rate: f64) -> Adam;
Adam::with_params(lr: f64, beta1: f64, beta2: f64, epsilon: f64) -> Adam;
// Training step
optimizer.step(parameters: &mut [&mut Tensor], gradients: &[&Tensor]);
cargo run --example basic_operations
Demonstrates tensor creation, arithmetic, and shape manipulation.
cargo run --example sequential_model
Multi-layer perceptron with dense layers and activations.
cargo run --example cnn_example
Image classification with Conv2D, pooling, and dense layers.
cargo run --example data_loading
CSV loading, synthetic datasets, batching, and normalization.
cargo run --example simd_benchmark
SIMD vs. regular operations, plus parallel computing benchmarks.
cargo run --example model_serialization
Saving and loading trained models in JSON and binary formats.
Operation                 Regular   SIMD      Parallel  Speedup*
────────────────────────────────────────────────────────────────
Vector Addition (1M)      18 ms     12 ms     8 ms      2.2x
Matrix Multiply (500²)    5.0 s     N/A       1.1 s     4.5x
Element-wise Multiply     20 ms     15 ms     8 ms      2.5x
ReLU Activation           15 ms     9 ms      N/A       1.7x
Memory Bandwidth          1.6 GB/s  5.7 GB/s  N/A       3.6x
* Speedup: best SIMD or Parallel result relative to Regular (for bandwidth, higher is better).
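The Parallel column for matrix multiplication comes from splitting the work across cores, which the library does with Rayon. To stay dependency-free, this sketch swaps in `std::thread::scope` (stable since Rust 1.63) instead; it is not the library's code. Each thread fills a contiguous band of output rows of a row-major product, so no two threads ever write the same memory. `parallel_matmul` and its parameters are hypothetical.

```rust
use std::thread;

/// Row-major (m x k) * (k x n) matrix product. Output rows are split into
/// contiguous bands, and each band is filled by its own scoped thread.
fn parallel_matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize, threads: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k, "a must be m x k");
    assert_eq!(b.len(), k * n, "b must be k x n");
    let mut c = vec![0.0; m * n];
    if m == 0 || n == 0 {
        return c; // nothing to compute
    }
    let t = threads.max(1);
    let rows_per_band = (m + t - 1) / t; // ceiling division
    thread::scope(|s| {
        // chunks_mut hands out disjoint &mut bands, so threads never share output memory.
        for (band, c_band) in c.chunks_mut(rows_per_band * n).enumerate() {
            s.spawn(move || {
                let first_row = band * rows_per_band;
                for (r, c_row) in c_band.chunks_mut(n).enumerate() {
                    let i = first_row + r;
                    let a_row = &a[i * k..(i + 1) * k];
                    // i-k-j loop order walks B row by row, which is cache-friendly.
                    for (p, &a_ip) in a_row.iter().enumerate() {
                        let b_row = &b[p * n..(p + 1) * n];
                        for (c_ij, &b_pj) in c_row.iter_mut().zip(b_row) {
                            *c_ij += a_ip * b_pj;
                        }
                    }
                }
            });
        }
    });
    c
}

fn main() {
    // A is 3x2, B is 2x2, so C is 3x2.
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [7.0, 8.0, 9.0, 10.0];
    let c = parallel_matmul(&a, &b, 3, 2, 2, 2);
    println!("{:?}", c); // [25.0, 28.0, 57.0, 64.0, 89.0, 100.0]
}
```

Rayon's work stealing balances uneven workloads automatically; the fixed bands here are a simpler static split that works well when every row costs the same.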
- SIMD Vectorization: 4-element f64 operations using AVX2
- Parallel Computing: Multi-threaded operations via Rayon
- Memory Efficiency: Zero-copy operations where possible
- Cache Optimization: Contiguous memory layout
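The AVX2 path above can only run on CPUs that support it, and not every x86_64 machine does. One common way to choose a code path is compile-time `cfg` on the target architecture plus a runtime check with the standard `is_x86_feature_detected!` macro. The sketch shows that pattern with a hypothetical `Backend` enum; it does not describe how mini_tensorflow actually selects its kernels.

```rust
/// Hypothetical kernel families a tensor library might dispatch between.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Avx2,
    Scalar,
}

/// On x86_64, check at runtime whether this particular CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
fn detect_backend() -> Backend {
    if is_x86_feature_detected!("avx2") {
        Backend::Avx2
    } else {
        Backend::Scalar
    }
}

/// Other targets (ARM64 included) take the portable path in this sketch.
#[cfg(not(target_arch = "x86_64"))]
fn detect_backend() -> Backend {
    Backend::Scalar
}

fn main() {
    // Detect once at startup and reuse the result instead of re-checking in hot loops.
    let backend = detect_backend();
    println!("selected backend: {:?}", backend);
}
```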
- x86_64: Full SIMD optimization enabled
- ARM64: Parallel operations; SIMD paths fall back to regular (scalar) operations
- Other: Regular operations with parallel support
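The custom `BatchNorm` layer example that follows leaves `forward` as `todo!()`. As a hedged sketch of the math its comments describe, here is inference-mode batch normalization over a row-major `[batch, features]` buffer using running statistics. It works on plain slices because the field layout of `mini_tensorflow::Tensor` is not documented here; `batch_norm_inference` is a hypothetical helper.

```rust
/// Inference-mode batch normalization over a row-major [batch, features] buffer.
/// Feature column j is normalized with its running mean/variance, then scaled
/// by gamma[j] and shifted by beta[j]. Hypothetical helper, not the library's API.
fn batch_norm_inference(
    input: &[f64],
    features: usize,
    gamma: &[f64],
    beta: &[f64],
    running_mean: &[f64],
    running_var: &[f64],
    epsilon: f64,
) -> Vec<f64> {
    assert!(features > 0 && input.len() % features == 0, "input must be [batch, features]");
    for param in [gamma, beta, running_mean, running_var] {
        assert_eq!(param.len(), features, "one statistic per feature");
    }
    let mut out = Vec::with_capacity(input.len());
    for row in input.chunks(features) {
        for (j, &x) in row.iter().enumerate() {
            // normalized = (input - mean) / sqrt(var + epsilon)
            let normalized = (x - running_mean[j]) / (running_var[j] + epsilon).sqrt();
            // output = gamma * normalized + beta
            out.push(gamma[j] * normalized + beta[j]);
        }
    }
    out
}

fn main() {
    // Two samples, two features; epsilon = 0 keeps the arithmetic exact here.
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = batch_norm_inference(&x, 2, &[1.0, 1.0], &[0.0, 0.0], &[2.0, 3.0], &[1.0, 1.0], 0.0);
    println!("{:?}", y); // [-1.0, -1.0, 1.0, 1.0]
}
```

During training, batch normalization would instead use the current batch's mean and variance and update the running statistics; that path is omitted here.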
use mini_tensorflow::{Tensor, Layer};
#[derive(Debug, Clone)]
struct BatchNorm {
gamma: Tensor,
beta: Tensor,
running_mean: Tensor,
running_var: Tensor,
epsilon: f64,
}
impl Layer for BatchNorm {
fn forward(&self, input: &Tensor) -> Tensor {
// Implement batch normalization
// normalized = (input - mean) / sqrt(var + epsilon)
// output = gamma * normalized + beta
todo!()
}
// Implement other required methods...
}
use mini_tensorflow::{SIMDOps, ParallelOps};
// Use SIMD for element-wise operations on large tensors
let result = tensor_a.simd_add(&tensor_b);
// Use parallel operations for matrix multiplication
let matmul = matrix_a.parallel_matmul(&matrix_b);
// Prefer in-place operations when possible
tensor.data.iter_mut().for_each(|x| *x = x.max(0.0)); // In-place ReLU
- Ownership Model: Rust's ownership system prevents data races
- RAII: Automatic resource cleanup via Drop trait
- Zero-Copy: References and borrowing minimize allocations
- Contiguous Layout: Vec for cache-friendly access patterns
- f64 Precision: Double precision throughout for accuracy
- Overflow Protection: Integer overflow checks in debug builds
- Numerical Algorithms: Stable implementations of softmax, etc.
- Trait System: Layer trait enables custom implementations
- Generic Design: Template-like functionality without runtime cost
- Module System: Clean separation of concerns
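The stable softmax mentioned under Numerical Algorithms is usually implemented by subtracting the row maximum before exponentiating. The result is mathematically unchanged, because the common factor e^(-max) cancels between numerator and denominator, but `exp` can no longer overflow. A minimal sketch over a single row; the library's own implementation is not shown in this README, and `stable_softmax` is a hypothetical name.

```rust
/// Numerically stable softmax of one row:
/// softmax(x)_i = exp(x_i - m) / sum_j exp(x_j - m), with m = max_j x_j.
/// Subtracting m keeps every exponent <= 0, so exp() cannot overflow.
fn stable_softmax(row: &[f64]) -> Vec<f64> {
    let m = row.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|&x| (x - m).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / total).collect()
}

fn main() {
    // A naive exp(1000.0) overflows to infinity, and inf / inf gives NaN.
    let probs = stable_softmax(&[1000.0, 1001.0, 1002.0]);
    println!("{:?}", probs);
    println!("sum = {}", probs.iter().sum::<f64>());
}
```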
# Run all tests
cargo test
# Run with full output
cargo test -- --nocapture
# Run specific test
cargo test tensor_operations
# Benchmark tests
cargo bench
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make changes and add tests
- Run tests: cargo test
- Submit a pull request
- Follow Rust standard formatting: cargo fmt
- Run clippy lints: cargo clippy
- Add documentation for public APIs
- Include examples in documentation
- Gradient Computation: Implement full automatic differentiation
- More Optimizers: RMSprop, AdaGrad, etc.
- Advanced Layers: LSTM, Transformer, BatchNorm
- GPU Support: CUDA or OpenCL backends
- Model Formats: ONNX import/export
- Distributed Training: Multi-node support
- Gradients: Currently simplified backward pass implementation
- GPU: CPU-only, no GPU acceleration yet
- Dynamic Shapes: Limited dynamic graph support
- Memory: Large models may hit memory constraints
- Precision: Single precision (f32) not yet supported
MIT License - see LICENSE file for details.
If you find Mini TensorFlow helpful, consider supporting the project.
Acknowledgments:
- PyTorch: API design inspiration
- Candle: Rust ML framework reference
- Rayon: Parallel computing library
- Serde: Serialization framework
- The Rust Community: Excellent tooling and libraries
- Issues: GitHub Issues for bugs and feature requests
- Discussions: GitHub Discussions for questions
Made with ❤️ and 🦀 Rust ❤️ by Aarambh Dev Hub
Mini TensorFlow demonstrates that systems programming languages can be both safe AND fast for machine learning workloads.