In the rapidly evolving landscape of AI research, groundbreaking techniques are often scattered across countless repositories, disparate implementations, and dense academic papers. dl_techniques emerges as a unified, curated, and production-ready arsenal for advanced deep learning. Pioneered and sponsored by Electi Consulting, this library is more than a collection of components—it is the definitive toolkit for researchers and engineers to design, train, and dissect state-of-the-art neural networks.
We bridge the chasm between theoretical innovation and practical application, providing faithful, efficient, and enterprise-validated components. From next-generation attention mechanisms and graph neural networks to information-theoretic loss functions and an unparalleled model analysis suite, dl_techniques is your strategic advantage for pushing the boundaries of deep learning.
- Key Features: A Glimpse into the Arsenal.
- Why
dl_techniques?: Your Unfair Advantage in AI R&D. - Installation: Get Up and Running in Minutes.
- Quick Start: See the Library in Action.
- In-Depth Documentation: From Theory to Tensors.
- Project Structure: A Tour of the Repository.
- Contributing: Join Our Research & Development Efforts.
- License: Understanding the GPL-3.0 License.
- Acknowledgments: Recognition and Support.
- Citations & References: The Research That Inspired This Library.
This library is a comprehensive suite of tools organized into five key pillars, developed through rigorous research and validated in real-world enterprise applications:
1. State-of-the-Art Architectures & Models (150+ Implementations)
- Next-Generation Language Models: Production-ready implementations of
Gemma 3,Qwen3(including MEGA & SOM variants),Mamba, and modernBERTvariants featuring Block-wise Latent Transformers (BLT) and Hierarchical Reasoning Models (HRM). - Vision & Multimodal Powerhouses: A comprehensive suite including
CLIP,FastVLM,NanoVLM(including a score-based World Model),DINOv1/v2/v3,SigLIP-ViT, object detection withDETRandYOLOv12, and theSegment Anything Model (SAM). - Advanced CNNs: A rich collection of advanced convolutional architectures such as
ConvNeXtV1/V2,MobileNetV1-V4, the recursively-definedFractalNet, the complex shearlet-basedCoShNet, and the ultra-efficientSqueezeNetfamily. - Time Series & Forecasting: State-of-the-art forecasting models including the probabilistic
TiRexwith quantile prediction, an enhanced implementation ofN-BEATS, and the novelxLSTM. - Generative & Specialized Models: Task-specific models like
DepthAnythingfor monocular depth estimation, a completeVariational Autoencoder (VAE)framework, fullCapsule Networks(CapsNet) with dynamic routing, and unsupervised aligners likeMini-Vec2Vec. - Experimental Frontiers: Explore novel concepts including a wide range of
Graph Neural Networks(GNNs) inRELGT, Kolmogorov-Arnold Networks (KAN), and bio-mimetic models likeMothNet.
2. A Modular Arsenal of Advanced Layers (290+ Components)
- Pioneering Attention Mechanisms: Go beyond standard attention with
DifferentialMultiHeadAttention, modernHopfieldAttention,GroupQueryAttention,CapsuleRoutingAttention, and efficient alternatives likeFNetFourierTransformandRingAttention. - Unified Factory Architecture: A consistent, powerful factory system for creating and validating over 15+ attention mechanisms, 15+ normalization variants (including
BandRMS,LogitNorm), and 10+ Feed-Forward Network (FFN) types (SwiGLU,GeGLU,OrthoGLU) with a single line of code. - Graph & Structural Primitives: Configurable GNN layers with multiple aggregation strategies (
GCN,GAT,GraphSAGE),Relational Graph Transformer(RELGT) blocks, andEntity-Graph Refinementfor learning hierarchical relationships. - Mixture of Experts (MoE) System: A complete MoE implementation with configurable FFN experts, multiple gating strategies (including SoftMoE and Cosine Gating), and integrated training utilities.
- Probabilistic & Statistical Layers: Build models that reason about uncertainty with
Mixture Density Networks(MDN),Normalizing Flows, and time series analysis layers for residual autocorrelation (ResidualACFLayer). - Advanced Embeddings: A full suite of positional and semantic embeddings including standard, sinusoidal,
Rotary Position Embedding (RoPE), and its modern variants likeDualRotaryPositionEmbedding.
3. A Unified Command Center for Model Analysis & Introspection
- Holistic Model Analysis: A powerful
ModelAnalyzerto benchmark models across six critical dimensions: training dynamics, weight health, prediction calibration, information flow, and advanced spectral analysis. Its modular design includes specialized analyzers likeCalibrationAnalyzer,WeightAnalyzer, andInformationFlowAnalyzer. - Publication-Ready Visualizations: Automatically generate insightful visualizations, interactive summary dashboards, and comparative analysis plots with integrated statistical significance testing.
- Predictive Generalization with Spectral Analysis: Integrate the power of WeightWatcher through our
SpectralAnalyzerto assess generalization potential by analyzing the spectral properties (eigenvalues) of weight matrices—often without needing test data. - Deep Diagnostic Toolkit: Move beyond accuracy to diagnose overconfidence (
CalibrationAnalyzer), information bottlenecks (InformationFlowAnalyzer), weight decay and similarity (WeightAnalyzer), and learning efficiency (TrainingDynamicsAnalyzer).
4. Next-Generation Loss Functions & Optimization (20+ Specialized Losses)
- Optimize What Matters with
AnyLoss: A groundbreaking framework that transforms any confusion-matrix-based metric (e.g., F1-score, Balanced Accuracy, Matthews Correlation Coefficient) into a differentiable loss function for direct optimization on imbalanced data. - Information-Theoretic & Robust Losses: Train more robust models with
GoodhartAwareLossto combat spurious correlations, calibration-focused losses likeBrierScoreLoss, the uncertainty-awareFocalUncertaintyLoss, andDINO's self-distillation loss. - Domain-Specific Loss Functions: Specialized losses for vision-language (
CLIPContrastiveLoss,SigLIPLoss), segmentation (Dice,Focal,Tversky), time series (MASELoss,SMAPELoss), and generative modeling (WassersteinLosswith gradient penalty). - Advanced Optimization Suite: Leverage smart learning rate schedulers like
WarmupSchedule, utilities forDeepSupervisionin multi-scale architectures, and a suite of advanced regularizers (SoftOrthogonal,SRIP).
5. Enterprise-Grade Training & Deployment Infrastructure
- Accelerated Development with Training Pipelines: Over 25+ ready-to-use training scripts (
src/train/) for all major architectures, establishing standardized and reproducible workflows for training, validation, and testing across domains like NLP, Vision, and Time Series. - Production-Ready Utilities: A suite of tools including advanced data loaders, augmentation pipelines, a structured visualization and logging manager (
VisualizationManager), and enhanced model serialization with custom object support. - Assured Reliability: An extensive suite of over 600+ unit and integration tests (
tests/) ensures the correctness and stability of every component. - Validated Performance: Rigorous benchmarks against reference implementations and established academic results to guarantee numerical accuracy and performance.
- From Theory to Tensors, Instantly: We consolidate the fragmented landscape of AI research. Instead of hunting down dozens of disparate GitHub repos, you get a single, cohesive framework with faithful implementations of cutting-edge research.
- Built for Battle, Validated in the Enterprise: This is not a toy library. Every component has been hardened and validated in demanding enterprise applications, ensuring robustness, efficiency, and production-readiness.
- Unprecedented Introspection: Move beyond accuracy scores. Our first-class analysis toolkit is designed to answer the why behind your model's behavior, providing deep insights that are critical for building trustworthy and reliable AI.
- Engineered for Experimentation: Our innovative factory patterns and modular design are built for rapid prototyping. Swap attention mechanisms, normalization layers, or even entire architectural blocks with a single line of code.
- Modern, Maintainable, and Future-Proof: Built from the ground up for Keras 3 and modern Python,
dl_techniquesadheres to the highest standards of software engineering, ensuring it's easy to use, extend, and maintain.
Note: This library requires Python 3.11+ and Keras 3.8.0 with the TensorFlow 2.18.0 backend.
-
Clone the repository:
git clone https://github.com/nikolasmarkou/dl_techniques.git cd dl_techniques -
Install dependencies: For standard usage, install the library and its core dependencies:
pip install . -
Editable Install (for developers): If you plan to contribute or modify the library's source code, install it in editable mode with development tools:
pip install -e ".[dev]"This installs additional tools such as
pytest,pylint, andblack. -
Verify Installation:
python -c "import dl_techniques; print('Installation successful!')"
Effortlessly construct and experiment with modern transformer components using our unified factory system.
import keras
from dl_techniques.layers.attention.factory import create_attention_layer
from dl_techniques.layers.norms.factory import create_normalization_layer
from dl_techniques.layers.ffn.factory import create_ffn_layer
inputs = keras.Input(shape=(1024, 512))
# Use factories for consistent, validated component creation
attention = create_attention_layer(
'differential_mha',
dim=512,
num_heads=8,
head_dim=64
)
norm = create_normalization_layer('rms_norm', epsilon=1e-6)
ffn = create_ffn_layer('swiglu_ffn', hidden_dim=2048)
# Build a modern transformer block
x = attention(inputs)
x = norm(x)
x = ffn(x)
model = keras.Model(inputs, x)
model.summary()Instantiate a state-of-the-art time series model capable of generating robust, uncertainty-aware forecasts.
from dl_techniques.models.tirex import create_tirex_model
# Create a TiRex model for multivariate probabilistic forecasting
model = create_tirex_model(
input_shape=(100, 10), # 100 timesteps, 10 features
forecast_horizon=24,
quantiles=[0.1, 0.5, 0.9], # Generate 80% prediction intervals
variant='base'
)
# Compile with a quantile loss to train for uncertainty estimation
model.compile(
optimizer='adamw',
loss='quantile_loss',
metrics=['mae', 'mse']
)Construct a powerful, efficient vision-language model for complex multimodal tasks in just a few lines.
from dl_techniques.models.fastvlm import FastVLM
# Create a FastVLM instance from a predefined, optimized variant
vlm = FastVLM.from_variant(
'base',
vocab_size=32000,
max_length=512,
image_size=224
)
# The model seamlessly handles both image and text inputs
image_input = keras.Input(shape=(224, 224, 3))
text_input = keras.Input(shape=(512,), dtype='int32')
outputs = vlm([image_input, text_input])
model = keras.Model([image_input, text_input], outputs)Go beyond surface-level metrics and gain deep, actionable insights into your models' performance and behavior.
from dl_techniques.analyzer import ModelAnalyzer, AnalysisConfig, DataInput
# Compare multiple models across a suite of deep diagnostics
models = {'TiRex_Model': tirex_model, 'Baseline_LSTM': lstm_model}
histories = {'TiRex_Model': tirex_history, 'Baseline_LSTM': lstm_history}
test_data = DataInput(x_test, y_test)
# Configure a comprehensive analysis run
config = AnalysisConfig(
analyze_training_dynamics=True,
analyze_calibration=True,
analyze_weight_health=True,
analyze_spectral=True, # Unleash spectral analysis for generalization insights
save_plots=True,
plot_style='publication'
)
analyzer = ModelAnalyzer(models, config=config, training_history=histories)
# Execute the complete analysis and generate a full suite of visualizations
results = analyzer.analyze(test_data)
# Access detailed, structured metrics programmatically for automated reporting
print(f"Best calibrated model (by ECE): {min(results.calibration_metrics.items(), key=lambda x: x[1]['ece'])}")
print(f"Training efficiency ranking (epochs to converge): {results.training_metrics.convergence_epochs}")Stop tuning class weights and start optimizing your target metric directly with the AnyLoss framework.
from dl_techniques.losses.any_loss import F1Loss, BalancedAccuracyLoss
# For imbalanced datasets, optimize F1-score directly
model.compile(
optimizer='adamw',
loss=F1Loss(from_logits=True),
metrics=['accuracy', 'precision', 'recall']
)
# Alternatively, optimize for balanced accuracy
model.compile(
optimizer='adamw',
loss=BalancedAccuracyLoss(from_logits=True),
metrics=['accuracy']
)Unlock insights from graph-structured data with our powerful and configurable GNN implementations.
from dl_techniques.layers.graphs import GraphNeuralNetworkLayer
# Create a Graph Attention Network (GAT) to process relational data
gnn = GraphNeuralNetworkLayer(
concept_dim=256,
num_layers=3,
message_passing='gat', # Use attention for message passing
aggregation='attention',
dropout_rate=0.1
)
# Apply the layer to graph-structured inputs
node_features = keras.Input(shape=(None, 256)) # Variable number of nodes
adjacency_matrix = keras.Input(shape=(None, None))
node_embeddings = gnn([node_features, adjacency_matrix])This library is engineered to be a living knowledge base, bridging the gap between academia and industry. Our documentation is split into two primary resources: in-depth theoretical guides and a comprehensive API reference.
Our research/ directory contains over 50 articles providing the theoretical foundations, implementation details, and best practices behind key components. Highlights include:
- Complete Transformer Guide (2025): A production-focused guide to implementing state-of-the-art Transformer architectures.
- Model Analyzer Guide: A comprehensive tutorial for the advanced model analysis toolkit.
- AnyLoss Framework: A deep dive into the theory and practice of direct metric optimization.
- Chronological Neural Architectures: An extensive chronological guide to influential architectures with implementation notes.
- Band-Constrained Normalization: A novel normalization technique that preserves magnitude information within bounded constraints.
- Mixture Density Networks: The theory and best practices for implementing probabilistic models.
For detailed, auto-generated documentation on every module, class, and function, please refer to the docs/ directory or the online documentation.
The repository is organized for clarity, maintainability, and ease of contribution.
dl_techniques/
├── src/dl_techniques/
│ ├── models/ # 150+ complete, ready-to-train model implementations
│ │ ├── qwen/ # Qwen3 family (Base, MEGA, SOM)
│ │ ├── dino/ # DINO v1, v2, v3
│ │ ├── fastvlm/ # Fast Vision-Language Models
│ │ └── ...
│ ├── layers/ # 290+ specialized layer implementations
│ │ ├── attention/ # 15+ modern attention mechanisms with factory
│ │ ├── norms/ # 15+ advanced normalization layers with factory
│ │ ├── ffn/ # 10+ feed-forward networks with factory
│ │ ├── graphs/ # Graph neural network components
│ │ ├── moe/ # Mixture of Experts (MoE) system
│ │ └── statistics/ # Statistical and probabilistic layers (MDN, Flows)
│ ├── losses/ # 20+ specialized loss functions (AnyLoss, Goodhart, etc.)
│ ├── analyzer/ # Comprehensive model analysis toolkit and visualizers
│ ├── regularizers/ # Advanced regularization techniques (SRIP, Orthogonal)
│ └── utils/ # Core utilities, loggers, and data handlers
├── research/ # 50+ in-depth articles and theoretical guides
├── src/train/ # 25+ ready-to-use training pipelines and scripts
├── tests/ # 600+ unit and integration tests ensuring component reliability
└── docs/ # Auto-generated API documentation
We welcome contributions from the research community. Whether you are implementing a new technique, improving documentation, or fixing bugs, your input is valuable.
- Fork & Clone the repository.
- Set up the development environment:
pip install -e ".[dev]". - Create a new branch for your feature:
git checkout -b feature/new-technique. - Adhere to our development standards: Use type hints, write comprehensive tests, and document your code thoroughly.
- Code Quality: Follow PEP 8 guidelines. Use
blackfor formatting andisortfor import sorting. - Testing: Write comprehensive tests using
pytest. Aim for coverage greater than 90%. - Documentation: Provide Sphinx-compliant docstrings and update relevant guides in the
research/directory. - Validation: Include benchmarks or comparisons against reference implementations where applicable.
- New Architectures: Implementations of recent, impactful research papers.
- Performance Improvements: Optimizations that maintain numerical accuracy.
- New Analysis Tools: Additional analyzers or visualizations for the
ModelAnalyzertoolkit. - Enhanced Documentation: Tutorials, guides, or improved API documentation.
This project is licensed under the GNU General Public License v3.0.
Important Considerations:
- Copyleft License: Any derivative works must also be licensed under GPL-3.0.
- Enterprise Use: Please contact us for commercial licensing options that may be better suited for enterprise environments.
- Research Use: The library is fully open for academic and non-commercial research applications.
See the LICENSE file for complete details.
This library is proudly sponsored and pioneered by Electi Consulting, a premier AI consultancy specializing in enterprise artificial intelligence, blockchain technology, and cryptographic solutions. The practical validation and enterprise-ready nature of these components has been made possible through Electi's extensive experience deploying state-of-the-art AI solutions across diverse industries:
- Financial Services: High-frequency trading, risk assessment, and fraud detection.
- Maritime Industry: Route optimization, predictive maintenance, and cargo management.
- Healthcare: Diagnostic assistance, treatment optimization, and clinical decision support.
- Manufacturing: Predictive maintenance, quality control, and supply chain optimization.
Special recognition is extended to the open-source community and the many researchers whose groundbreaking work forms the foundation of this library.
This library stands on the shoulders of giants. Our implementations are grounded in a rigorous study of the source papers that have defined the field of modern deep learning.
Transformers & Language Models
- Attention Is All You Need (Transformer): Vaswani, A., et al. (2017). NeurIPS.
- BERT: Pre-training of Deep Bidirectional Transformers: Devlin, J., et al. (2018). NAACL.
- RoFormer: Enhanced Transformer with Rotary Position Embedding: Su, J., et al. (2021). ACL.
- DIFFERENTIAL TRANSFORMER: Zhu, J., et al. (2025). CVPR.
- Mamba: Linear-Time Sequence Modeling: Gu, A., & Dao, T. (2023).
- Gemma 3 Technical Report: Google (2024).
- Qwen Technical Report: Bai, J., et al. (2023).
- Byte Latent Transformer: Patches Scale Better Than Tokens: Pagnoni, A., et al. (2024).
Vision & Multimodal Models
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT): Dosovitskiy, A., et al. (2020). ICLR.
- A ConvNet for the 2020s (ConvNeXt): Liu, Z., et al. (2022). CVPR.
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders: Woo, S., et al. (2023). CVPR.
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Sandler, M., et al. (2018). CVPR.
- Searching for MobileNetV3: Howard, A., et al. (2019). ICCV.
- MobileNetV4: Universal Inverted Bottleneck and Mobile MQA: Li, Y., et al. (2024).
- DINOv2: Learning Robust Visual Features without Supervision: Oquab, M., et al. (2023).
- Sigmoid Loss for Language Image Pre-Training (SigLIP): Zhai, X., et al. (2023). ICCV.
- Learning Transferable Visual Models From Natural Language Supervision (CLIP): Radford, A., et al. (2021). ICML.
- End-to-End Object Detection with Transformers (DETR): Carion, N., et al. (2020). ECCV.
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization: Vasu, P. K. A., et al. (2023).
Advanced Attention & FFN Mechanisms
- Modern Hopfield Networks is All You Need: Ramsauer, H., et al. (2020). ICML.
- GQA: Training Generalized Multi-Query Transformer Models: Ainslie, J., et al. (2023).
- Ring Attention with Blockwise Transformers for Near-Infinite Context: Liu, H., et al. (2024).
- Rethinking Attention with Performers: Choromanski, K., et al. (2020).
- FNet: Mixing Tokens with Fourier Transforms: Lee-Thorp, J., et al. (2021).
- GLU Variants Improve Transformer: Shazeer, N. (2020).
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer: Shazeer, N., et al. (2017).
- Pay Attention to MLPs (gMLP): Liu, H., et al. (2021). NeurIPS.
Graph Neural Networks & Special Architectures
- Graph Neural Networks: A Review: Wu, Z., et al. (2020).
- Graph Attention Networks: Veličković, P., et al. (2018). ICLR.
- Semi-Supervised Classification with Graph Convolutional Networks: Kipf, T. N., & Welling, M. (2016).
- Dynamic Routing Between Capsules: Sabour, S., et al. (2017). NeurIPS.
- KAN: Kolmogorov-Arnold Networks: Liu, Z., et al. (2024).
- FractalNet: Ultra-Deep Neural Networks without Residuals: Larsson, G., et al. (2016).
Time Series & Forecasting
- N-BEATS: Neural basis expansion analysis for interpretable time series forecasting: Oreshkin, B. N., et al. (2019). ICLR.
- Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting: Lim, B., et al. (2021).
- DeepAR: Probabilistic forecasting with autoregressive recurrent networks: Salinas, D., et al. (2020).
- Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift: Kim, T., et al. (2021). ICLR.
Loss Functions, Optimization, & Regularization
- AnyLoss: A General and Differentiable Framework for Classification Metric Optimization: Han, D., et al. (2024).
- Focal Loss for Dense Object Detection: Lin, T. Y., et al. (2017). ICCV.
- On calibration of modern neural networks: Guo, C., et al. (2017). ICML.
- Wasserstein GAN: Arjovsky, M., et al. (2017). ICML.
- Root Mean Square Layer Normalization: Zhang, B., & Sennrich, R. (2019). NeurIPS.
- Can We Gain More from Orthogonality Regularizations in Training Deep Networks?: Bansal, N., et al. (2018). NeurIPS.
- Predicting the Generalization Gap in Deep Networks with Margin Distributions: Martin, C., & Mahoney, M. W. (2019). ICLR.
Complete bibliographic information is available in the documentation for individual modules.
