jack-pan-ai's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
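The core operation FAISS accelerates is nearest-neighbor search over dense vectors. A minimal NumPy sketch of the brute-force L2 search that such libraries speed up (a conceptual illustration, not FAISS's actual API):

```python
import numpy as np

def knn_l2(database, queries, k):
    """Brute-force k-nearest-neighbor search under squared L2 distance.

    database: (n, d) array of stored vectors
    queries:  (m, d) array of query vectors
    Returns (distances, indices), each of shape (m, k).
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for all pairs at once
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)
    )
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx

rng = np.random.default_rng(0)
xb = rng.standard_normal((100, 8))
# query with the first 3 database vectors: each is its own nearest neighbor
dist, idx = knn_l2(xb, xb[:3], k=1)
```

FAISS replaces this O(n·m) scan with indexes (inverted lists, product quantization, HNSW) that trade a little recall for large speedups.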
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
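For reference, the exact attention that FlashAttention computes can be sketched in NumPy as below (the naive O(n²)-memory form; FlashAttention produces the same result but tiles the computation so the full n×n score matrix never materializes in GPU HBM):

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention over (n, d) arrays.

    Returns an (n, d) array. This materializes the full n x n score
    matrix, which is exactly the memory cost FlashAttention avoids.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # numerically stable softmax over each row
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
q = rng.standard_normal((4, 8))
out = attention(q, q, q)
```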
NVIDIA/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Stability-AI/StableCascade
Official Code for Stable Cascade
NeoVertex1/SuperPrompt
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
kokkos/kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
BBuf/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
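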
openai/blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
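The underlying idea of block-sparse matmul is to store and compute only the nonzero blocks, skipping whole zero blocks of work. A NumPy sketch of the concept (my illustration; the repo implements this as fused GPU kernels):

```python
import numpy as np

def block_sparse_matmul(a_blocks, mask, b, bs):
    """Multiply a block-sparse matrix by a dense matrix.

    mask:     (R, C) boolean array, True where a bs x bs block is nonzero
    a_blocks: dict mapping (r, c) -> (bs, bs) block for each True mask entry
    b:        (C * bs, n) dense matrix
    Returns the (R * bs, n) product, touching only the nonzero blocks.
    """
    R, C = mask.shape
    out = np.zeros((R * bs, b.shape[1]))
    for r in range(R):
        for c in range(C):
            if mask[r, c]:  # zero blocks contribute nothing and are skipped
                out[r * bs:(r + 1) * bs] += a_blocks[(r, c)] @ b[c * bs:(c + 1) * bs]
    return out

rng = np.random.default_rng(2)
bs, R, C = 4, 3, 3
mask = np.eye(R, C, dtype=bool)  # block-diagonal sparsity pattern
a_blocks = {(i, i): rng.standard_normal((bs, bs)) for i in range(R)}
b = rng.standard_normal((C * bs, 5))
out = block_sparse_matmul(a_blocks, mask, b, bs)

# equivalent dense matrix, for cross-checking
dense = np.zeros((R * bs, C * bs))
for (r, c), blk in a_blocks.items():
    dense[r * bs:(r + 1) * bs, c * bs:(c + 1) * bs] = blk
```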
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics introducing CUDA kernel optimization in detail, covering several basic kernels: elementwise, reduce, SGEMV, SGEMM, etc. The performance of these kernels is at or near the theoretical limit.
src-d/kmcuda
Large-scale K-means and k-NN implementation on NVIDIA GPUs / CUDA
guoshnBJTU/ASTGCN-2019-pytorch
Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting, AAAI 2019, pytorch version
kakao/n2
TOROS N2: a lightweight approximate nearest neighbor library that runs fast even on large datasets
js05212/BayesianDeepLearning-Survey
Bayesian Deep Learning: A Survey
NVIDIA/modulus-makani
Massively parallel training of machine-learning based weather and climate models
ecrc/kblas-gpu
Subset of BLAS routines optimized for NVIDIA GPUs
suco-gt/HPC-Internships
Supercomputing @ GT has compiled a list of organizations that offer internships and experiences in HPC and applications of HPC.
tulerfeng/Awesome-Embodied-Multimodal-LLMs
Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models).
davidruegamer/FDA_tutorial
TheCoreTeam/core_scheduler
CoreScheduler: A High-Performance Scheduler for Large Model Training
zuochunwei/hpc
ecrc/ExaGeoStatCPP
DragosTana/kmeans
K-means algorithm parallelized with OpenMP
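The parallelizable hot loop in K-means is the assignment step: each point's nearest centroid is computed independently, which is what an OpenMP `parallel for` exploits. A minimal NumPy sketch of one Lloyd iteration (my illustration, not this repo's code):

```python
import numpy as np

def lloyd_step(points, centroids):
    """One K-means (Lloyd) iteration: assign points, then update centroids.

    The assignment is embarrassingly parallel over points; it is the loop
    an OpenMP implementation would wrap in a `parallel for`.
    """
    # assignment: index of the nearest centroid for each point
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # update: mean of the points assigned to each centroid
    new_centroids = np.array([
        points[labels == j].mean(axis=0) if (labels == j).any() else centroids[j]
        for j in range(len(centroids))
    ])
    return labels, new_centroids

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, cents = lloyd_step(pts, np.array([[0.0, 0.0], [5.0, 5.0]]))
```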
hpc-io/aiio
paper-code1/BV-Gaussian
suco-gt/HPC-Student-Resources
Student resources and opportunities in HPC!