dnbaker
I write fast, memory-efficient software for scientific applications.
@langmead-lab · Baltimore, MD
Pinned Repositories
BMFtools
Barcoded Molecular Families
aesctr
C++ implementation of an AES-CTR PRNG using SIMD, based on Samuel Neves' implementation
bioseq
Tokenizers and machine learning models for biological sequence data
bonsai
Bonsai: Fast, flexible taxonomic analysis and classification
dashing
Fast and accurate genomic distances using HyperLogLog
dashing2
Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
frp
FRP: Fast Random Projections
minicore
Fast and memory-efficient clustering and coreset construction, including fast distance kernels for Bregman divergences and f-divergences.
sketch
C++ implementations of sketch data structures with SIMD parallelism, including Python bindings
vec
Type-generic SIMD library for optimized generic code generation
dnbaker's Repositories
dnbaker/dashing
Fast and accurate genomic distances using HyperLogLog
dnbaker/sketch
C++ implementations of sketch data structures with SIMD parallelism, including Python bindings
dnbaker/bonsai
Bonsai: Fast, flexible taxonomic analysis and classification
dnbaker/dashing2
Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
dnbaker/minicore
Fast and memory-efficient clustering and coreset construction, including fast distance kernels for Bregman divergences and f-divergences.
dnbaker/bioseq
Tokenizers and machine learning models for biological sequence data
dnbaker/vec
Type-generic SIMD library for optimized generic code generation
dnbaker/aesctr
C++ implementation of an AES-CTR PRNG using SIMD, based on Samuel Neves' implementation
dnbaker/wmh
Weighted MinHash code
dnbaker/fastiota
Fast std::iota for contiguous memory using SIMD operations
dnbaker/libsimdsampling
Data- and processor-parallelism for fast weighted sampling
dnbaker/fgc
dnbaker/libkl
Kernels for fast vectorized KL divergence and related functions
dnbaker/dashing2-experiments
dnbaker/libtorch-kseq-demo
Demo using LibTorch and one-hot encoding for FASTX files
dnbaker/dashing2-binaries
Release binaries for Dashing2
dnbaker/minicore-experiments
Experiments for minicore: fast scRNA-seq clustering with various distances
dnbaker/scavenger
Spatial and single-cell genomics in Rust
dnbaker/tilt
Biased dataloaders for PyTorch and related utilities
dnbaker/bioconda-recipes
Conda recipes for the bioconda channel.
dnbaker/dashing-binaries
dnbaker/distmat
2-dimensional distance matrix for holding distances of arbitrary types.
dnbaker/dnbaker
dnbaker/einops
Simplistic API for deep learning tensor operations
dnbaker/FFHT
Fast Fast Hadamard Transform
dnbaker/megadepth
BigWig and BAM utilities
dnbaker/minilsh
Python bindings for Locality-Sensitive Hashers, built on the minicore C++ library.
dnbaker/pathml
Tools for computational pathology
dnbaker/ProtTrans
ProtTrans provides state-of-the-art pretrained language models for proteins. It was trained on thousands of GPUs on Summit and hundreds of Google TPUs using Transformer models.
dnbaker/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE), and Unigram (SentencePiece) models.