Pinned Repositories
Apple-M1-BERT
3X speedup over Apple’s TensorFlow plugin by using Apache TVM on M1
deformable-attention-kernel
TVMScript kernel for deformable attention
macho-dyld
Custom dyld version inherited from original Apple dyld implementation
octoai-textgen-cookbook
Simple getting-started code examples for LLM applications powered by OctoAI
octocloud-templates
octoml-llm-qa
A code sample that shows how to use 🦜️🔗langchain, 🦙llama_index, and a hosted LLM endpoint for standard chat or Q&A over a PDF document
octoml-profile
Home for OctoML PyTorch Profiler
synr
A library for syntactically rewriting Python programs, pronounced "sinner".
triton-client-rs
A client library in Rust for Nvidia Triton.
tvm2onnx
An open-source tool created by OctoML that converts TVM-optimized models to code runnable in ONNX Runtime.
OctoAI's Repositories
octoml/octoai-textgen-cookbook
Simple getting-started code examples for LLM applications powered by OctoAI
octoml/macho-dyld
Custom dyld version inherited from original Apple dyld implementation
octoml/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
octoml/octoai-apps
A collection of OctoAI-based demos.
octoml/fern-config
Configuration for generating SDKs and Documentation.
octoml/flashinfer
FlashInfer: Kernel Library for LLM Serving
octoml/pre-commit-kustomize
A pre-commit hook that runs the kustomize Docker image (use with https://github.com/pre-commit/pre-commit)
octoml/.github
octoml/EAGLE
OctoML Implementation of EAGLE-1 and EAGLE-2
octoml/llama-recipes
Examples and recipes for Llama 2 model
octoml/vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
octoml/langchain
⚡ Building applications with LLMs through composability ⚡
octoml/demo-design-system
octoml/docker_auth
Authentication server for Docker Registry 2
octoml/go-jose
An implementation of JOSE standards (JWE, JWS, JWT) in Go
octoml/go-oidc
A Go OpenID Connect client.
octoml/homebrew-tap
Homebrew Tap of OctoML products and tools.
octoml/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
octoml/msi-fe
octoml/multicloud-asset-code-review-example
Multicloud Asset Code Review Public Repo example.
octoml/octo-bots-mirror
octoml/octoai-model-examples
A set of models you can build and deploy on OctoAI
octoml/octoai-solutions
A collection of reference solutions built on top of OctoAI SaaS
octoml/photobooth-bg-gen
octoml/pinecone-rag-demo
Pinecone + Vercel RAG application, showcasing a comparison between chat with no context and using a Pinecone index for context
octoml/RULER
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
octoml/sagemaker-examples
octoml/TensorRT-LLM-release
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
octoml/tflint-ruleset-google
TFLint ruleset for terraform-provider-google
octoml/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.