Pinned Repositories
C-Plus-Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
cuda-course
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Data-engineering-projects
Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
icefall
onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
triton
Development repository for the Triton language and compiler
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
manickavela29's Repositories
manickavela29/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
manickavela29/icefall
manickavela29/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
manickavela29/sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
manickavela29/triton
Development repository for the Triton language and compiler
manickavela29/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
manickavela29/C-Plus-Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
manickavela29/cuda-course
manickavela29/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
manickavela29/Data-engineering-projects
manickavela29/EmoTwitter
ONNX Runtime-based inference optimization of a RoBERTa model trained for sentiment analysis on a Twitter dataset
manickavela29/flash-attention
Fast and memory-efficient exact attention
manickavela29/IBM-Hackchalllenge-Winner
Won IBM Hackchallenge 2020 Jury's Choice Award
manickavela29/Masters-Course-R-Python-Machine-Learning-Stats
Master's assignments and coursework
manickavela29/sequitur-g2p
This is a GitHub repository of the abandonware Sequitur G2P by Bisani & Ney
manickavela29/deploy-learn
Learnings and nuances for Docker and Kubernetes deployments
manickavela29/GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
manickavela29/hpc-learn
manickavela29/lectures
Material for cuda-mode lectures
manickavela29/llama.cpp
LLM inference in C/C++
manickavela29/llm-merging.github.io
manickavela29/optimum-nvidia
manickavela29/perftime_tools
Comparing tools used for performance metrics and validating their consistency
manickavela29/QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily.
manickavela29/tensorrt-cpp-api
TensorRT C++ API Tutorial
manickavela29/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
manickavela29/zip-optim
Optimizing zipformer, Transducer model for inference