Pinned Repositories
exllama
A more memory-efficient rewrite of the Hugging Face Transformers implementation of Llama for use with quantized weights.
fastmoe
A fast Mixture-of-Experts (MoE) implementation for PyTorch
FlexFlow
A distributed deep learning framework that supports flexible parallelization strategies.
MI-for-NAS
Mutual information for fine-grained network analysis in neural architecture search
MMST-spelling-correction
A novel, context-sensitive spelling corrector that uses clusterings in GloVe embeddings as a learned notion of context.
video-etl
Supplementary material for the paper "Extract-Transform-Load for Video Streams"
FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
lab0
Go tutorial for the MIT DB class
brad
A virtualization layer for cloud data infrastructures.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
ferdiko's Repositories
ferdiko/video-etl
Supplementary material for the paper "Extract-Transform-Load for Video Streams"
ferdiko/exllama
A more memory-efficient rewrite of the Hugging Face Transformers implementation of Llama for use with quantized weights.
ferdiko/fastmoe
A fast Mixture-of-Experts (MoE) implementation for PyTorch
ferdiko/FlexFlow
A distributed deep learning framework that supports flexible parallelization strategies.
ferdiko/MI-for-NAS
Mutual information for fine-grained network analysis in neural architecture search
ferdiko/MMST-spelling-correction
A novel, context-sensitive spelling corrector that uses clusterings in GloVe embeddings as a learned notion of context.