Pinned Repositories
CNC_Machining
Dataset for process monitoring on CNC machines
bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
candle-flash-attn-v3
cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
commonroad_motionplaner_michaelf
Winning 2020 solution for the commonroad.io contest
embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
hf-hub-ctranslate2
Connecting Transformers on HuggingFace Hub with CTranslate2
infinity
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
iot_gateway_modbus
An MQTT gateway connecting Modbus RTU and Google IoT Core
skyjo_rl
Multi-agent reinforcement learning environment for the card game SkyJo, compatible with PettingZoo and RLlib
michaelfeil's Repositories
michaelfeil/infinity
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
michaelfeil/embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
michaelfeil/hf-hub-ctranslate2
Connecting Transformers on HuggingFace Hub with CTranslate2
michaelfeil/skyjo_rl
Multi-agent reinforcement learning environment for the card game SkyJo, compatible with PettingZoo and RLlib
michaelfeil/candle-flash-attn-v3
michaelfeil/bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
michaelfeil/cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
michaelfeil/flash-deberta
DeBERTa, but Flash
michaelfeil/llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP and PEFT methods, covering single-node and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A. Supports a number of inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp and Messenger.
michaelfeil/academicpages
My personal website
michaelfeil/BentoInfinity
michaelfeil/candle
Minimalist ML framework for Rust
michaelfeil/candle-cublaslt
michaelfeil/datachain
DataChain 🔗 Process and curate unstructured data using local ML models and LLM calls
michaelfeil/fastembed
Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
michaelfeil/hf_transfer
michaelfeil/JamAIBase
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
michaelfeil/kubeai
Private Open AI on Kubernetes
michaelfeil/pylabrobot
An interactive & hardware agnostic interface for lab automation
michaelfeil/qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
michaelfeil/qdrant-client
Python client for Qdrant vector search engine
michaelfeil/samba-qa
Production RAG Based on API Controllers
michaelfeil/sglang
SGLang is a fast serving framework for large language models and vision language models.
michaelfeil/start-rag
michaelfeil/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
michaelfeil/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
michaelfeil/text-embeddings-inference
A blazing fast inference solution for text embeddings models
michaelfeil/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
michaelfeil/triton
Development repository for the Triton language and compiler
michaelfeil/zerox
Zero-shot PDF OCR with gpt-4o-mini