EmbeddedLLM
EmbeddedLLM is the creator of JamAI Base, a platform for orchestrating AI with spreadsheet-like simplicity.
Singapore
Pinned Repositories
embeddedllm
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA, OpenVINO, IPEX-LLM, DirectML, and CPU backends.
flash-attention-docker
CI/CD workflows that build Docker images with Flash Attention pre-compiled, enabling quicker development and deployment of other frameworks.
JamAIBase
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
jamaibase-cookbook
JamAI Base cookbook repo
jamaibase-nextjs-vercel
LLaVA-Plus-Serve
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
mamba-rocm
unstructured-executable
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
xformers-rocm
Stripped-down fork supporting Flash Attention v2 on ROCm.
EmbeddedLLM's Repositories
EmbeddedLLM/JamAIBase
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
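For illustration, a minimal sketch of creating a table through the Python client; the class and method names here are assumptions, not verified against the jamaibase SDK (see jamaibase-cookbook for real usage).

```python
# Hypothetical sketch -- names are assumptions, check the SDK docs before use.
from jamaibase import JamAI, protocol as p  # assumed import layout

client = JamAI(project_id="proj_xxx", token="jamai_sk_xxx")  # placeholder credentials

# Create an Action Table: one input column, one output column to be LLM-generated.
client.create_action_table(
    p.ActionTableSchemaCreate(
        id="summaries",
        cols=[
            p.ColumnSchemaCreate(id="text", dtype="str"),
            p.ColumnSchemaCreate(id="summary", dtype="str"),
        ],
    )
)
```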
EmbeddedLLM/vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
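A minimal offline-inference example with vLLM's Python API (the model id is arbitrary):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model id
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```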
EmbeddedLLM/embeddedllm
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA, OpenVINO, IPEX-LLM, DirectML, and CPU backends.
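A hedged sketch of a client call, assuming the server exposes an OpenAI-compatible endpoint; the host, port, and model name below are placeholders.

```python
from openai import OpenAI

# Placeholder base URL and model id -- adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="phi-3-mini",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```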
EmbeddedLLM/jamaibase-cookbook
JamAI Base cookbook repo
EmbeddedLLM/flash-attention-docker
CI/CD workflows that build Docker images with Flash Attention pre-compiled, enabling quicker development and deployment of other frameworks.
EmbeddedLLM/unstructured-executable
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
EmbeddedLLM/jamaibase-ts-docs
TypeScript documentation for the JamAI SDK
EmbeddedLLM/workshop
EmbeddedLLM/ai-town
An MIT-licensed, deployable starter kit for building and customizing your own version of AI Town, a virtual town where AI characters live, chat, and socialize.
EmbeddedLLM/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
EmbeddedLLM/axolotl-amd
Go ahead and axolotl questions
EmbeddedLLM/etalon
LLM Serving Performance Evaluation Harness
EmbeddedLLM/flash-attention-rocm
ROCm fork of Flash Attention (fast and memory-efficient exact attention). The goal of this branch is to produce a flash-attention PyPI package that can be readily installed and used.
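Once the wheel is installed, usage matches upstream flash-attn; a small sanity check (on ROCm builds of PyTorch, HIP devices still show up as "cuda"):

```python
import torch
from flash_attn import flash_attn_func

# Tensors are (batch, seqlen, nheads, headdim) in fp16 on the GPU.
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```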
EmbeddedLLM/github-bot
EmbeddedLLM/infinity-executable
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
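Since the API is OpenAI-compatible for embeddings, the openai client can talk to it; the port and route below are assumptions about a default local deployment.

```python
from openai import OpenAI

# Assumes an Infinity server on localhost:7997 serving /embeddings at the root path.
client = OpenAI(base_url="http://localhost:7997", api_key="dummy")
resp = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # whichever model the server was started with
    input=["hello world", "embedded llm"],
)
print(len(resp.data), len(resp.data[0].embedding))
```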
EmbeddedLLM/jamaibase-expressjs-vercel
EmbeddedLLM/Liger-Kernel
Efficient Triton Kernels for LLM Training
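The kernels are applied by patching the corresponding Hugging Face model classes before instantiation, per the upstream README:

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Swap HF Llama ops (RMSNorm, RoPE, SwiGLU, fused cross-entropy) for Triton
# kernels, then load the model as usual.
apply_liger_kernel_to_llama()
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```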
EmbeddedLLM/LLM_Sizing_Guide
A calculator to estimate LLM memory footprint, capacity, and latency on NVIDIA, AMD, and Intel hardware
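The core arithmetic is simple; a back-of-the-envelope version for weights and KV cache (standard formulas, not taken from the repo):

```python
def weight_memory_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Model weights: parameter count x bytes per parameter (fp16/bf16 = 2)."""
    return n_params_billions * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per / 1e9

# Llama-3-8B-like shape: 32 layers, 8 KV heads, head_dim 128.
print(weight_memory_gb(8))               # ~16 GB of weights in fp16
print(kv_cache_gb(32, 8, 128, 8192, 1))  # ~1.07 GB of KV cache per 8k-token sequence
```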
EmbeddedLLM/LMCache
ROCm support for LMCache: ultra-fast and cheaper long-context LLM inference
EmbeddedLLM/lmcache-tests
EmbeddedLLM/lmcache-vllm
The driver for LMCache core to run in vLLM
EmbeddedLLM/PowerToys
Windows system utilities to maximize productivity
EmbeddedLLM/SageAttention-rocm
ROCm port of quantized attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
EmbeddedLLM/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
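A task can also be defined from Python; a minimal sketch with SkyPilot's Python API (the resource spec and cluster name are illustrative):

```python
import sky

# Define a task, pin its resources, and launch it on whichever infra is available.
task = sky.Task(run="python train.py")
task.set_resources(sky.Resources(accelerators="A100:1"))
sky.launch(task, cluster_name="dev")
```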
EmbeddedLLM/Star-Attention
Efficient LLM Inference over Long Sequences
EmbeddedLLM/torchac_rocm
ROCm Implementation of torchac_cuda from LMCache
EmbeddedLLM/unstructured-api-executable
EmbeddedLLM/unstructured-inference-executable
EmbeddedLLM/unstructured-python-client
A Python client for the Unstructured hosted API
EmbeddedLLM/vllm-rocmfork
A high-throughput and memory-efficient inference and serving engine for LLMs