EmbeddedLLM
EmbeddedLLM is the creator of JamAI Base, a platform for orchestrating AI with spreadsheet-like simplicity.
Singapore
Pinned Repositories
embeddedllm
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA, OpenVINO, IPEX-LLM, DirectML, and CPU backends.
flash-attention-docker
CI/CD workflows that build Docker images with Flash Attention pre-compiled, enabling quicker development and deployment of other frameworks.
JamAIBase
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
jamaibase-cookbook
JamAI Base cookbook repo
jamaibase-nextjs-vercel
LLaVA-Plus-Serve
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
mamba-rocm
unstructured-executable
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
xformers-rocm
Stripped-down fork supporting Flash Attention v2 on ROCm.
EmbeddedLLM's Repositories
EmbeddedLLM/JamAIBase
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
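For illustration, a minimal sketch of creating a table through the Python client; the class and method names here are assumptions, not verified against the jamaibase SDK (see jamaibase-cookbook for real usage).

```python
# Hypothetical sketch -- names are assumptions, check the SDK docs before use.
from jamaibase import JamAI, protocol as p  # assumed import layout

client = JamAI(project_id="proj_xxx", token="jamai_sk_xxx")  # placeholder credentials

# Create an Action Table: one input column, one output column to be LLM-generated.
client.create_action_table(
    p.ActionTableSchemaCreate(
        id="summaries",
        cols=[
            p.ColumnSchemaCreate(id="text", dtype="str"),
            p.ColumnSchemaCreate(id="summary", dtype="str"),
        ],
    )
)
```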
EmbeddedLLM/vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
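A minimal offline-inference example with vLLM's Python API (the model id is arbitrary):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model id
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```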
EmbeddedLLM/embeddedllm
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA, OpenVINO, IPEX-LLM, DirectML, and CPU backends.
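A hedged sketch of a client call, assuming the server exposes an OpenAI-compatible endpoint; the host, port, and model name below are placeholders.

```python
from openai import OpenAI

# Placeholder base URL and model id -- adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="phi-3-mini",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```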
EmbeddedLLM/jamaibase-cookbook
JamAI Base cookbook repo
EmbeddedLLM/flash-attention-docker
CI/CD workflows that build Docker images with Flash Attention pre-compiled, enabling quicker development and deployment of other frameworks.
EmbeddedLLM/unstructured-executable
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
EmbeddedLLM/jamaibase-ts-docs
TypeScript documentation for the JamAI SDK
EmbeddedLLM/workshop
EmbeddedLLM/ai-town
An MIT-licensed, deployable starter kit for building and customizing your own version of AI Town, a virtual town where AI characters live, chat, and socialize.
EmbeddedLLM/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
EmbeddedLLM/axolotl-amd
Go ahead and axolotl questions
EmbeddedLLM/etalon
LLM Serving Performance Evaluation Harness
EmbeddedLLM/flash-attention-rocm
ROCm fork of Flash Attention (fast and memory-efficient exact attention). The goal of this branch is to produce a flash-attention PyPI package that can be readily installed and used.
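Once the wheel is installed, usage matches upstream flash-attn; a small sanity check (on ROCm builds of PyTorch, HIP devices still show up as "cuda"):

```python
import torch
from flash_attn import flash_attn_func

# Tensors are (batch, seqlen, nheads, headdim) in fp16 on the GPU.
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```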
EmbeddedLLM/github-bot
EmbeddedLLM/infinity-executable
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
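Since the API is OpenAI-compatible for embeddings, the openai client can talk to it; the port and route below are assumptions about a default local deployment.

```python
from openai import OpenAI

# Assumes an Infinity server on localhost:7997 serving /embeddings at the root path.
client = OpenAI(base_url="http://localhost:7997", api_key="dummy")
resp = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # whichever model the server was started with
    input=["hello world", "embedded llm"],
)
print(len(resp.data), len(resp.data[0].embedding))
```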
EmbeddedLLM/jamaibase-expressjs-vercel
EmbeddedLLM/Liger-Kernel
Efficient Triton Kernels for LLM Training
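The kernels are applied by patching the corresponding Hugging Face model classes before instantiation, per the upstream README:

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Swap HF Llama ops (RMSNorm, RoPE, SwiGLU, fused cross-entropy) for Triton
# kernels, then load the model as usual.
apply_liger_kernel_to_llama()
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```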
EmbeddedLLM/LLM_Sizing_Guide
A calculator to estimate LLM memory footprint, capacity, and latency on NVIDIA, AMD, and Intel hardware
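The core arithmetic is simple; a back-of-the-envelope version for weights and KV cache (standard formulas, not taken from the repo):

```python
def weight_memory_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Model weights: parameter count x bytes per parameter (fp16/bf16 = 2)."""
    return n_params_billions * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per / 1e9

# Llama-3-8B-like shape: 32 layers, 8 KV heads, head_dim 128.
print(weight_memory_gb(8))               # ~16 GB of weights in fp16
print(kv_cache_gb(32, 8, 128, 8192, 1))  # ~1.07 GB of KV cache per 8k-token sequence
```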
EmbeddedLLM/LMCache
ROCm support for LMCache: ultra-fast and cheaper long-context LLM inference
EmbeddedLLM/lmcache-tests
EmbeddedLLM/lmcache-vllm
The driver for LMCache core to run in vLLM
EmbeddedLLM/PowerToys
Windows system utilities to maximize productivity
EmbeddedLLM/SageAttention-rocm
ROCm port of quantized attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
EmbeddedLLM/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
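A task can also be defined from Python; a minimal sketch with SkyPilot's Python API (the resource spec and cluster name are illustrative):

```python
import sky

# Define a task, pin its resources, and launch it on whichever infra is available.
task = sky.Task(run="python train.py")
task.set_resources(sky.Resources(accelerators="A100:1"))
sky.launch(task, cluster_name="dev")
```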
EmbeddedLLM/Star-Attention
Efficient LLM Inference over Long Sequences
EmbeddedLLM/torchac_rocm
ROCm Implementation of torchac_cuda from LMCache
EmbeddedLLM/unstructured-api-executable
EmbeddedLLM/unstructured-inference-executable
EmbeddedLLM/unstructured-python-client
A Python client for the Unstructured hosted API
EmbeddedLLM/vllm-rocmfork
A high-throughput and memory-efficient inference and serving engine for LLMs