Real-bojack's Stars
Marker-Inc-Korea/AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
percent4/embedding_rerank_retrieval
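This project is an evaluation experiment on the recall techniques and algorithm performance of the Retrieve stage in RAG. The main framework used is LlamaIndex.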
AnswerDotAI/rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese
This is a Chinese translation of the CUDA programming guide
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
xgqdut2016/cuda_code
Easy CUDA code
xlite-dev/CUDA-Learn-Notes
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
nvixnu/pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (Third Edition)
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
gpu-mode/lectures
Material for gpu-mode lectures
facebookresearch/mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
abetlen/llama-cpp-python
Python bindings for llama.cpp
kwai/Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism
deepspeedai/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
deepspeedai/DeepSpeedExamples
Example models using DeepSpeed
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
labring/FastGPT
FastGPT is a knowledge-based platform built on LLMs. It offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
ollama/ollama
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
NVIDIA/cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
tingshua-yts/BetterDL
pytorch/benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
run-llama/llama_index
LlamaIndex is the leading framework for building LLM-powered agents over your data.
microsoft/LLMLingua
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
gomate-community/TrustRAG
TrustRAG: The RAG Framework within Reliable Input, Trusted Output