/KAN

Kolmogorov-Arnold Networks simple implementation using pytorch

Primary LanguageJupyter NotebookMIT LicenseMIT

KAN (Kolmogorov-Arnold Networks) Implementation

This repository contains an implementation of Kolmogorov-Arnold Networks (KAN) for digit classification using the MNIST dataset. The implementation includes the network architecture, data loading, training, and evaluation processes.

Overview

KAN is a type of neural network that utilizes B-spline bases for its layers. This repository provides a complete example of how to define, train, and evaluate a KAN model.

Project Structure

  • layer.py: Contains the implementation of the Layer class, which is a key component of the KAN network.
  • kan.py: Defines the KAN class, which builds the Kolmogorov-Arnold Network using multiple Layer instances.
  • train.py: Script to train the KAN model using MNIST data.
  • eval.py: Script to evaluate the trained KAN model on the validation dataset.

Dependencies

  • PyTorch
  • torchvision
  • numpy
  • tqdm

Install the dependencies using:

pip install torch torchvision numpy tqdm

Implementation Details

Layer Class

The Layer class is a fundamental part of the KAN network. It implements a neural network layer using B-spline bases with the following features:

  • Input and Output Features: Configurable number of input and output features.
  • Grid Size and Spline Order: Parameters for B-spline basis functions.
  • Scaling Factors: For noise, base weights, and spline weights.
  • Activation Function: Default is SiLU.
  • Functions: Includes methods for parameter resetting, spline weight scaling, and forward pass operations.

KAN Class

The KAN class defines the Kolmogorov-Arnold Network with the following features:

  • Network Architecture: Includes hidden layers with sizes [784, 64, 10].
  • Training and Validation: Uses MNIST dataset for training and validation.
  • Optimizer: AdamW with a learning rate of 1e-3 and weight decay of 1e-4.
  • Learning Rate Scheduler: ExponentialLR with a decay factor of 0.8.
  • Loss Function: CrossEntropyLoss for classification.

Data Loading and Training

Data Loading

  • Data Preparation: Data is transformed and normalized using transforms.Compose. Normalization scales pixel values to the range [-1, 1], which is suitable for training neural networks.
  • Training and Validation Datasets: MNIST datasets for training (train=True) and validation (train=False) are downloaded and loaded.
  • Data Loaders: DataLoader objects are used to load data in batches of 64 for training and validation.

Training Process

  • Model Training Mode: The model is set to training mode (train()).
  • Training Loop: Training data is loaded from trainloader and the following operations are performed:
    • Image Preprocessing: Images are converted to 784-dimensional vectors and transferred to the computation device.
    • Gradient Zeroing: Gradients of the optimizer are zeroed.
    • Prediction and Error Calculation: Model output is computed, and error is calculated using the loss function.
    • Gradient Calculation and Update: Gradients are calculated and model weights are updated.
    • Accuracy Calculation: Model accuracy is computed based on outputs and true labels.
    • Progress Display: Training progress is displayed using tqdm.

Evaluation Process

  • Model Evaluation Mode: The model is set to evaluation mode (eval()).
  • Evaluation Loop: Validation data is loaded from valloader and the following operations are performed:
    • Image Preprocessing: Images are converted to 784-dimensional vectors and transferred to the computation device.
    • Prediction and Error Calculation: Model output is computed, and error is calculated using the loss function.
    • Accuracy Calculation: Model accuracy is computed based on outputs and true labels.
    • Result Collection: Error and accuracy results are collected and averaged.

Training Script

Run the training script with:

python train.py

Evaluation Script

Run the evaluation script with:

python eval.py

Learning Rate Update and Results Display

  • Learning Rate Update: The learning rate is updated using the scheduler.
  • Results Display: Model error and accuracy on the validation dataset are printed for each epoch.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

  • The MNIST dataset is provided by Yann LeCun and colleagues.
  • PyTorch and torchvision libraries are used for deep learning and data handling.