manickavela29

MLOps at Uniphore | Performant ML Systems

Uniphore, India

Pinned Repositories

C-Plus-Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
Language: C++
cuda-course
Language: Cuda
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Language: C
Data-engineering-projects
Language: Jupyter Notebook
Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
Language: Jupyter Notebook
icefall
Language: Python
onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Language: C++
sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
Language: C++
triton
Development repository for the Triton language and compiler
Language: C++
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python

manickavela29's Repositories

manickavela29/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
Language: Jupyter Notebook
manickavela29/icefall
Language: Python
manickavela29/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Language: C++
manickavela29/sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
Language: C++
manickavela29/triton
Development repository for the Triton language and compiler
Language: C++
manickavela29/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
manickavela29/C-Plus-Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
Language: C++
manickavela29/cuda-course
Language: Cuda
manickavela29/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Language: C
manickavela29/Data-engineering-projects
Language: Jupyter Notebook
manickavela29/EmoTwitter
ONNX Runtime-based inference optimization of a RoBERTa model trained for sentiment analysis on a Twitter dataset
Language: Jupyter Notebook
manickavela29/flash-attention
Fast and memory-efficient exact attention
Language: Python
manickavela29/IBM-Hackchalllenge-Winner
Won IBM Hackchallenge 2020 Jury's Choice Award
manickavela29/Masters-Course-R-Python-Machine-Learning-Stats
Master's assignments and coursework
Language: Java
manickavela29/sequitur-g2p
This is a GitHub repository of the abandonware Sequitur G2P by Bisani & Ney
Language: Python
manickavela29/deploy-learn
Learnings and nuances for Docker and Kubernetes deployments
Language: Dockerfile
manickavela29/GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
manickavela29/hpc-learn
Language: Cuda
manickavela29/lectures
Material for cuda-mode lectures
manickavela29/llama.cpp
LLM inference in C/C++
Language: C++
manickavela29/llm-merging.github.io
manickavela29/optimum-nvidia
Language: Python
manickavela29/perftime_tools
Comparing tools used for performance metrics and validating their consistency
Language: C++
manickavela29/QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily.
manickavela29/tensorrt-cpp-api
TensorRT C++ API Tutorial
manickavela29/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++
manickavela29/zip-optim
Optimizing the Zipformer transducer model for inference
Language: Python
