Pinned Repositories
ai-science-training-series
ai-training-series
AI Training Series Material
Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
booklet_wamta_2025
casting_on_device
ClimaAtmos.jl
ClimaAtmos.jl is a library for building atmospheric circulation models that is designed from the outset to leverage data assimilation and machine learning tools. We welcome contributions!
CUDALibrarySamples
CUDA Library Samples
dplasma
DPLASMA is the fastest implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance for distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.
FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
free-programming-books
Freely available programming books
QingleiCao's Repositories
QingleiCao/ai-science-training-series
QingleiCao/ai-training-series
AI Training Series Material
QingleiCao/Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
QingleiCao/booklet_wamta_2025
QingleiCao/casting_on_device
QingleiCao/ClimaAtmos.jl
ClimaAtmos.jl is a library for building atmospheric circulation models that is designed from the outset to leverage data assimilation and machine learning tools. We welcome contributions!
QingleiCao/CUDALibrarySamples
CUDA Library Samples
QingleiCao/dplasma
DPLASMA is the fastest implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance for distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.
QingleiCao/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
QingleiCao/free-programming-books
Freely available programming books
QingleiCao/GPTune
QingleiCao/h5bench
A benchmark suite for measuring HDF5 performance.
QingleiCao/hip-training-series
Repository with examples and exercises for OLCF and AMD's HIP training series
QingleiCao/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
QingleiCao/learning_note_qinglei
QingleiCao/llm_tutorial
llm_tutorials
QingleiCao/LLMsPracticalGuide
A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)
QingleiCao/mit-deep-learning-book-pdf
MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville
QingleiCao/mpi4py
Python bindings for MPI
QingleiCao/nsf-proposal-latex-samples
LaTeX samples for NSF Research.gov Proposal Submission. For more information about Research.gov Proposal Submission, visit https://www.research.gov/research-web/content/aboutpsm. Feedback: syee@nsf.gov
QingleiCao/parsec
PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores and GPU accelerators, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.
QingleiCao/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
QingleiCao/pytorch-deep-learning
Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.
QingleiCao/QingleiCao.github.io
QingleiCao/roop_face
one-click deepfake (face swap)
QingleiCao/sort-google-scholar
Sorting Google Scholar search results based on the number of citations
QingleiCao/stable-diffusion
A latent text-to-image diffusion model
QingleiCao/task-bench
A task benchmark
QingleiCao/ttg
TTG: Template Task Graph C++ API
QingleiCao/wamta25.github.io