/sequence-embedding

Representation learning of prokaryotic marker genes. Code for training a byte-pair encoding (BPE) tokenizer, a DNA sequence language model, and vector embeddings of 16S rRNA gene sequences.
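
To give a sense of what the tokenizer step involves, here is a minimal, dependency-free sketch of byte-pair encoding applied to DNA strings. It is not the repository's implementation: the function names `train_bpe`, `apply_merge`, and `tokenize` are hypothetical, the toy sequences are made up for illustration, and the lexicographic tie-breaking rule is an assumption chosen only to make the result deterministic.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(sequences, num_merges):
    """Learn up to `num_merges` merge rules from a list of DNA strings."""
    # Start from single-nucleotide tokens.
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(tokens, best) for tokens in corpus]
    return merges


def tokenize(seq, merges):
    """Tokenize a DNA string by replaying the learned merges in order."""
    tokens = list(seq)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


# Toy data for illustration only.
merges = train_bpe(["ACGTACGT", "ACGACG"], num_merges=2)
tokens = tokenize("ACGTTACG", merges)
print(merges)
print(tokens)
```

Replaying the merges in the order they were learned keeps tokenization consistent with training, and concatenating the resulting tokens always reproduces the input sequence.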

Primary Language: Python