0324wy

Boston, USA

Pinned Repositories

0324wy.github.io
Language:HTML0 1 00
annotated-GPT
A simple GPT model with 300 lines of code
Language:Jupyter Notebook0 1 00
ChatGPT-Web
A web server for ChatGPT. You can use it to build your web application.
Language:Python1 1 00
chatroom
Three different versions of the multiplayer chatroom based on BIO/NIO/AIO
Language:Java0 1 00
cudaProgramming
Language:Cuda00
JavaNotes
Java interview questions and answers
6 1 00
lox
An interpreter
Language:Java00
mdxParser
A Dictionary Parser for mdx Format
Language:HTML1 1 00
needle
A deep learning library, comparable to a very minimal version of PyTorch or TensorFlow
Language:Python0 1 00
zhihu-spider
A web crawler script for crawling all answers to one question on Zhihu
Language:Python5 1 00

0324wy's Repositories

0324wy/zhihu-spider
A web crawler script for crawling all answers to one question on Zhihu
Language:Python5 1 00
0324wy/ChatGPT-Web
A web server for ChatGPT. You can use it to build your web application.
Language:Python1 1 00
0324wy/mdxParser
A Dictionary Parser for mdx Format
Language:HTML1 1 00
0324wy/0324wy.github.io
Language:HTML0 1 00
0324wy/annotated-GPT
A simple GPT model with 300 lines of code
Language:Jupyter Notebook0 1 00
0324wy/ChatGPT-Server
Lightweight package for interacting with OpenAI's ChatGPT API. Uses the reverse-engineered official API.
Language:Python0 0 00
0324wy/CPlusPlus4OtherLanguage
If you are familiar with Python or Java but not C++, these notes help you learn C++.
Language:Makefile0 1 00
0324wy/cprs-solution
00
0324wy/cudaProgramming
Language:Cuda00
0324wy/lox
An interpreter
Language:Java00
0324wy/needle
A deep learning library, comparable to a very minimal version of PyTorch or TensorFlow
Language:Python0 1 00
0324wy/AgentCoder
This repo is the official implementation of AgentCoder and AgentCoder+.
Language:Python
0324wy/babyTriton
Language:Python
0324wy/CrypTen
A framework for Privacy Preserving Machine Learning
Language:Python
0324wy/cutlass
CUDA Templates for Linear Algebra Subroutines
0324wy/FlashAttention20Triton
Triton implementation of Flash Attention 2.0
0324wy/flashinfer
FlashInfer: Kernel Library for LLM Serving
0324wy/llama
Inference code for Llama models
0324wy/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
0324wy/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
Language:Python
0324wy/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
0324wy/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
0324wy/LVM
0324wy/MapCoder
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving
0324wy/newBlock
Language:HTML1 0
0324wy/piranha
Piranha: A GPU Platform for Secure Computation
0324wy/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
0324wy/sglang
SGLang is a fast serving framework for large language models and vision language models.
0324wy/tenset
0324wy/tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS