Pinned Repositories
ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
.tmux
🇫🇷 Oh my tmux! My self-contained, pretty & versatile tmux configuration made with ❤️
6.828
Homework for https://pdos.csail.mit.edu/6.828
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
clucene
my fork of clucene
codebase
my code base
db-readings
Readings in Databases
dmv
Greasemonkey script running in Firefox that automatically makes a behind-the-wheel test appointment
interview
ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
scv119's Repositories
scv119/ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
scv119/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
scv119/openmlsys-zh
《Machine Learning Systems: Design and Implementation》- Chinese Version
scv119/punica
scv119/CUDA-PPT
scv119/cutlass-kernels
scv119/FasterTransformer
Transformer related optimization, including BERT, GPT
scv119/flash-attention
Fast and memory-efficient exact attention
scv119/flashinfer
FlashInfer: Kernel Library for LLM Serving
scv119/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
scv119/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
scv119/learn-rust
scv119/learning-nn
scv119/learning-triton
scv119/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
scv119/Lightrails
Yet another distributed training/inferencing framework.
scv119/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
scv119/megablocks
scv119/Megatron-LM
Ongoing research training transformer models at scale
scv119/mini-redis
Incomplete Redis client and server implementation using Tokio - for learning purposes only
scv119/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
scv119/og-equity-compensation
Stock options, RSUs, taxes — read the latest edition: www.holloway.com/ec
scv119/open_llama
scv119/orbit
A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
scv119/r4cppp
Rust for C++ programmers
scv119/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
scv119/scv119
scv119/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
scv119/The-Art-of-Linear-Algebra
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
scv119/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs