michaelfeil

@basetenlabs | building infinity | M.Sc. ML/Robotics@TU-Munich

@basetenlabs | San Francisco

Pinned Repositories

CNC_Machining
Dataset for process monitoring on CNC machines
Language: Jupyter Notebook | 92 6 330
bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Language: Python | 1 0 00
candle-flash-attn-v3
Language: C++ | 10 1 22
cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Language: Python | 1 0 00
commonroad_motionplaner_michaelf
Winning 2020 solution for the commonroad.io contest
Language: Python | 13 2 21
embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
45 2 12
hf-hub-ctranslate2
Connecting Transformers on HuggingFace Hub with CTranslate2
Language: Python | 36 1 122
infinity
Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
Language: Python | 2k 20 231129
iot_gateway_modbus
A MQTT Gateway connecting Modbus RTU and Google IoT Core
Language: Python | 5 1 05
skyjo_rl
Multi-Agent Reinforcement Learning Environment for the card game SkyJo, compatible with PettingZoo and RLLIB
Language: Jupyter Notebook | 13 1 10

michaelfeil's Repositories

michaelfeil/infinity
Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
Language: Python | 2k 20 231129
michaelfeil/embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
45 2 12
michaelfeil/hf-hub-ctranslate2
Connecting Transformers on HuggingFace Hub with CTranslate2
Language: Python | 36 1 122
michaelfeil/skyjo_rl
Multi-Agent Reinforcement Learning Environment for the card game SkyJo, compatible with PettingZoo and RLLIB
Language: Jupyter Notebook | 13 1 10
michaelfeil/candle-flash-attn-v3
Language: C++ | 10 1 22
michaelfeil/bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Language: Python | 1 0 00
michaelfeil/cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Language: Python | 1 0 00
michaelfeil/flash-deberta
Deberta, but Flash
Language: Python | 1 1 00
michaelfeil/llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
Language: Jupyter Notebook | 1 0 00
michaelfeil/academicpages
my personal website
Language: JavaScript | 1 0
michaelfeil/BentoInfinity
michaelfeil/candle
Minimalist ML framework for Rust
Language: Rust | 0 0
michaelfeil/candle-cublaslt
Language: Rust | 0 0
michaelfeil/datachain
DataChain 🔗 Process and curate unstructured data using local ML models and LLM calls
michaelfeil/fastembed
Fast, Accurate, Lightweight Python library to make State of the Art Embedding
Language: Python | 0 0
michaelfeil/hf_transfer
michaelfeil/JamAIBase
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
Language: Python | 0 0
michaelfeil/kubeai
Private Open AI on Kubernetes
Language: Go | 0 0
michaelfeil/pylabrobot
An interactive & hardware agnostic interface for lab automation
Language: Python | 0 0
michaelfeil/qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Language: Rust | 0 0
michaelfeil/qdrant-client
Python client for Qdrant vector search engine
michaelfeil/samba-qa
Production RAG Based on API Controllers
Language: Python | 0 0
michaelfeil/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python | 0 0
michaelfeil/start-rag
michaelfeil/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
michaelfeil/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
michaelfeil/text-embeddings-inference
A blazing fast inference solution for text embeddings models
michaelfeil/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
michaelfeil/triton
Development repository for the Triton language and compiler
Language: C++ | 0 0
michaelfeil/zerox
Zero shot pdf OCR with gpt-4o-mini
Language: Python | 0 0