lcy-seso
MSR Asia. Previously worked at Baidu IDL (Institute of Deep Learning) and contributed as a member of the Paddle team.
MSRA, System Research Group
China
Pinned Repositories
DLFrameworkTest
My tests and experiments with some popular DL frameworks.
EfficientAttention-Notes
FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a FractalTensor.
lcy-seso.github.io
Ying's blog posts.
LearnHaskell
So I decided to learn a functional programming language.
LearningNotes
Ying's notes
models
Model configurations
paddle_confs_v1
Paddle configuration files written with the old API.
TileFusion
TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.
VPTQ
VPTQ: a flexible and extreme low-bit quantization algorithm
lcy-seso's Repositories
lcy-seso/LearnHaskell
So I decided to learn a functional programming language.
lcy-seso/JuliaMachineLearning
Small exercises implementing some machine learning algorithms in the Julia programming language.
lcy-seso/pypoly
Extract polyhedral representation from PyTorch programs.
lcy-seso/JuliaLearningNotes
My learning notes on the Julia programming language.
lcy-seso/awesome-fast-attention
A list of efficient attention modules
lcy-seso/batched_gemm
lcy-seso/coding-interview-university
A complete computer science study plan to become a software engineer.
lcy-seso/CUDAMemoryPool
lcy-seso/DeepBench
Benchmarking Deep Learning operations on different hardware
lcy-seso/experiments
lcy-seso/instaparse
lcy-seso/isl
Integer Set Library (source repository: http://repo.or.cz/w/isl.git)
lcy-seso/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
lcy-seso/myia
Myia prototyping
lcy-seso/onnx-simplifier
Simplify your onnx model
lcy-seso/Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
lcy-seso/pbatch
lcy-seso/pet
Polyhedral Extraction Tool (source repository: http://repo.or.cz/w/pet.git)
lcy-seso/play-with-torch-script
Play with TorchScript.
lcy-seso/ppcg
Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git)
lcy-seso/reformer-pytorch
Reformer, the efficient Transformer, in PyTorch
lcy-seso/rmm
RAPIDS Memory Manager
lcy-seso/sofp
A free book: "The Science of Functional Programming"
lcy-seso/tensor_ops
lcy-seso/tensorflow
Computation using data flow graphs for scalable machine learning
lcy-seso/tiramisu
A polyhedral compiler for expressing fast and portable data parallel algorithms
lcy-seso/torchscript-to-tvm
lcy-seso/tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
lcy-seso/tvm_examples
lcy-seso/utvm_staticrt_codegen
This project contains a code generator that produces static C NN inference deployment code targeting tiny microcontrollers (TinyML) as a replacement for other µTVM runtimes. The tool generates a runtime that statically executes the compiled model, which reduces code size and execution-time overhead compared with a dynamic on-device runtime.