Pinned Repositories
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed precision (a usage sketch follows this list)
accelerate-deepspeed-test
Testing DeepSpeed integration in 🤗 Accelerate
annotated-transformer
An annotated implementation of the Transformer paper.
autotvm_tutorial
A demonstration of using autoTVM to search for and optimize neural-network inference code: the open-source CenterFace model is compiled with TVM, autoTVM searches for the best inference code, and the result is deployed by compiling to C++. The demo targets CUDA, but other platforms work as well, e.g. Raspberry Pi, Android phones, and iPhones.
bayesian-optimization-in-action
Source code for Bayesian Optimization in Action, published by Manning
bitsandbytes
8-bit CUDA functions for PyTorch
builder
Continuous builder and binary build scripts for PyTorch
ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
ChatGLM-RLHF-LoRA-RM-PPO
ChatGLM-6B with an RLHF implementation added, plus a line-by-line walkthrough of parts of the core code; the examples cover short news-headline generation and RLHF for recommendation with a specified context
ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM
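As referenced in the accelerate entry above, here is a minimal sketch of what an Accelerate-wrapped training loop looks like. The model, data, and hyperparameters are toy placeholders, not taken from any of these repos:

```python
# Minimal sketch of a training loop with Hugging Face Accelerate.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up devices and mixed-precision settings

# Toy model and data, purely illustrative
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,
)

# prepare() moves everything to the right device(s) and wraps for distributed runs
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward() for scaling/distributed
    optimizer.step()
```

The same script then runs unchanged on a single GPU, multiple GPUs, or a TPU, depending on how it is launched (e.g. via `accelerate launch`).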
SeekPoint's Repositories
SeekPoint/ColossalAI
Making large AI models cheaper, faster and more accessible
SeekPoint/cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
SeekPoint/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective (see the sketch at the end of this list).
SeekPoint/DeepSpeedExamples
Example models using DeepSpeed
SeekPoint/Chinese-LLaMA-Alpaca
中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)
SeekPoint/cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
SeekPoint/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
SeekPoint/FasterTransformer
Transformer related optimization, including BERT, GPT
SeekPoint/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
SeekPoint/HugeCTR
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
SeekPoint/Liger-Kernel
Efficient Triton Kernels for LLM Training
SeekPoint/llama.cpp
Port of Facebook's LLaMA model in C/C++
SeekPoint/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
SeekPoint/Megatron-LM
Ongoing research training transformer models at scale
SeekPoint/mistral-src
Reference implementation of Mistral AI 7B v0.1 model.
SeekPoint/nccl
Optimized primitives for collective multi-GPU communication
SeekPoint/numpy-ml
Machine learning, in numpy
SeekPoint/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
SeekPoint/Pai-Megatron-Patch
SeekPoint/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
SeekPoint/rfcs
PyTorch RFCs (experimental)
SeekPoint/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
SeekPoint/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
SeekPoint/TensorRT_Tutorial
SeekPoint/text_classfication-with-bert-pytorch
NLP text classification with BERT and PyTorch on the IMDB dataset
SeekPoint/transformer-debugger
SeekPoint/triton
Development repository for the Triton language and compiler
SeekPoint/trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
SeekPoint/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (see the sketch at the end of this list)
SeekPoint/wxhelper
Hook WeChat / WeChat reverse engineering
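As referenced in the SeekPoint/DeepSpeed entry above, a minimal sketch of wrapping a model with DeepSpeed's engine. The model and JSON config are illustrative placeholders; real runs are normally launched with the `deepspeed` CLI so distributed state is set up for you:

```python
# Minimal sketch of wrapping a toy model with DeepSpeed's engine.
import torch
import deepspeed

model = torch.nn.Linear(10, 2)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},  # ZeRO stages, offload, etc. also go in this dict
}

# Returns an engine that manages the optimizer, precision, and
# (when launched distributed) ZeRO partitioning
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

inputs = torch.randn(8, 10).to(model_engine.device)
targets = torch.randint(0, 2, (8,)).to(model_engine.device)

loss = torch.nn.functional.cross_entropy(model_engine(inputs), targets)
model_engine.backward(loss)  # engine-managed backward (loss scaling, ZeRO)
model_engine.step()
```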
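And for the SeekPoint/vllm entry, a minimal sketch of offline batch generation with vLLM; the checkpoint name is only an example, and any supported Hugging Face model works:

```python
# Minimal sketch of offline batch generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # loads weights and builds the paged-KV engine
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is distributed training?"], params)
for out in outputs:
    print(out.outputs[0].text)  # first sampled completion for the prompt
```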