chris-ha458
Korean name: Seungsoo Ha
Anesthesiologist working on foundational multilingual generative language models.
Independent research with EleutherAI, DuckAI
Seoul
Pinned Repositories
dolma
Data and tools for generating and inspecting OLMo pre-training data.
.github
aHash
aHash is a non-cryptographic hashing algorithm that uses the AES hardware instruction
antialiased-cnns
Antialiasing CNNs to improve stability and accuracy. In ICML 2019.
assembled-cnn
Official implementation of "Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network"
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2/V1, MNASNet, Single-Path NAS, FBNet, and more
tokenizer_manipulations
txlsh
tlsh with pyo3 and soon xxhash
wyhash-rs
wyhash fast portable non-cryptographic hashing algorithm and random number generator in Rust
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
chris-ha458's Repositories
chris-ha458/aHash
aHash is a non-cryptographic hashing algorithm that uses the AES hardware instruction
chris-ha458/awesome-data-deduplication
An awesome list of data deduplication use cases, papers, tools, and methods.
chris-ha458/bff
chris-ha458/candle
Minimalist ML framework for Rust
chris-ha458/charset-normalizer-rs_original
Truly universal encoding detector in pure Rust - port of Python version
chris-ha458/chash
Consistent HashRing
chris-ha458/CLRS-rs
CLRS pseudocode in rust
chris-ha458/counter-rs
Simple object to count Rust iterables
chris-ha458/CU_MSCS_Projects
Project portfolios as part of final and assignment projects from Colorado University MSCS classes
chris-ha458/dfdx_cifar
chris-ha458/dolma
Data and tools for generating and inspecting OLMo pre-training data.
chris-ha458/elara
Work-in-progress educational programming game
chris-ha458/esaxx-rs
Bindings to copy of SentencePiece esaxx library (fast suffix array and frequent substrings).
chris-ha458/highway-rs
Native Rust port of Google's HighwayHash, which makes use of SIMD instructions for a fast and strong hash function
chris-ha458/itertools
Extra iterator adaptors, iterator methods, free functions, and macros.
chris-ha458/llm
An ecosystem of Rust libraries for working with large language models
chris-ha458/npp-linux-01-intro
chris-ha458/ocrs
A modern OCR engine, written in Rust
chris-ha458/ogg
Ogg container decoder and encoder written in pure Rust
chris-ha458/regex
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
chris-ha458/rfcs
RFCs for changes to Rust
chris-ha458/rspdf
PDF library in Rust
chris-ha458/sonic-rs
A fast Rust JSON library based on SIMD.
chris-ha458/subset_sum
Solves subset sum problem and returns a set of decomposed integers.
chris-ha458/suffix
Fast suffix arrays for Rust (with Unicode support).
chris-ha458/Symphonia
Pure Rust multimedia format demuxing, tag reading, and audio decoding library
chris-ha458/text-dedup
All-in-one text de-duplication
chris-ha458/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
chris-ha458/unicode_names2
char <-> Unicode character name (maintained fork of huonw/unicode_names)
chris-ha458/vers
very efficient rank and select