FCLC
TL;DR: Lots of #FP16 & #BLAS these days. Interests: #AVX512 #SYCL #F1 #HPC & making code FAST. Born below 365 PPM.
Pinned Repositories
A-Phenominal-benchmark
Small set of low-level benchmarks for testing hardware speed against a Phenom II 810 Quad Core
AdvancedCiderXtensions
Measure Accelerate BLAS performance
avx512_fp16_examples
Hosts simple examples of FP16 code
clusterplex
ClusterPlex is basically an extended version of Plex, which supports distributed Workers across a cluster to handle transcoding requests.
FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on any GPU via OpenCL.
Multi-Plexer
Goal: a low-power cluster capable of serving 24+ streams of 4K HDR60 source transcodes while consuming no more than 100 W at peak and idling at less than 10 W
Talks
Holds slides and recordings of any talks I've done since 2023
FCLC's Repositories
FCLC/Multi-Plexer
Goal: a low-power cluster capable of serving 24+ streams of 4K HDR60 source transcodes while consuming no more than 100 W at peak and idling at less than 10 W
FCLC/AdvancedCiderXtensions
Measure Accelerate BLAS performance
FCLC/A-Phenominal-benchmark
Small set of low-level benchmarks for testing hardware speed against a Phenom II 810 Quad Core
FCLC/avx512_fp16_examples
Hosts simple examples of FP16 code
FCLC/clusterplex
ClusterPlex is basically an extended version of Plex, which supports distributed Workers across a cluster to handle transcoding requests.
FCLC/FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on any GPU via OpenCL.
FCLC/Talks
Holds slides and recordings of any talks I've done since 2023
FCLC/adaptive-mesh-refinement
OpenFOAM® motorBike case with adaptive volume & surface mesh refinement based on curl(U) or grad(p)
FCLC/amx
Apple AMX Instruction Set
FCLC/anbox
Anbox is a container-based approach to boot a full Android system on a regular GNU/Linux system
FCLC/Choosing-a-compiler-performance-testing-GCC_ICC_ICPX_NVCC_CLANG_HIP
Used to host various test files and relevant launch scripts. Uses a basic parallelism example of adding the elements of two arrays
FCLC/Dav1d_benchmarking
FCLC/blis_apple
BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support.
FCLC/FFmpeg
Mirror of https://git.ffmpeg.org/ffmpeg.git
FCLC/FP8
FCLC/Linux-hybrid-ISA-scheduler
Used as a staging ground for a hybrid scheduler capable of dealing with multiple different ISAs on the same host processor.
FCLC/media-driver
FCLC/Microbenchmarks
Trying to figure various CPU things out
FCLC/ml-stable-diffusion
Stable Diffusion with Core ML on Apple Silicon
FCLC/NVDLA
RTL, Cmodel, and testbench for NVDLA
FCLC/OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
FCLC/PDPU
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
FCLC/PhysX-3.4
NVIDIA PhysX SDK 3.4
FCLC/relax-intel-rmrr
FCLC/rocm-build
build scripts for ROCm
FCLC/sandsifter
The x86 processor fuzzer
FCLC/standup5x5
Solutions to Stand Up Maths 5x5 Unique 25 letter problem
FCLC/tt03-submission-template
Submission template for Tiny Tapeout 03
FCLC/UnicornTranscoder
Remote transcoder for Plex
FCLC/zero
If Google Drive says that 1 is under copyright, 0 must be under copyleft