Genomic MSA Transformer

This repository contains the code for the Genomic MSA Transformer project. The project trains a multiple sequence alignment (MSA) transformer on DNA sequences in an unsupervised fashion. A classifier then uses the transformer's embeddings to classify operons in the genomes of various organisms.
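
As a rough illustration of the input side of this pipeline, the sketch below turns a small DNA alignment into a grid of integer tokens, with one row per aligned sequence and one column per alignment position. The vocabulary, the `encode_msa` function name, and the example sequences are assumptions made for illustration only; they are not taken from this repository's code.

```python
# Hypothetical vocabulary: four nucleotides, a gap symbol, and an unknown token.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "-": 4, "N": 5}


def encode_msa(aligned_seqs):
    """Convert equal-length aligned DNA strings into rows of integer tokens."""
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences in an alignment must have the same length")
    # Characters outside the vocabulary map to the unknown token "N".
    return [[VOCAB.get(ch, VOCAB["N"]) for ch in s.upper()] for s in aligned_seqs]


# Made-up toy alignment of three sequences over five columns.
msa = ["ACG-T", "ACGGT", "A-GGT"]
tokens = encode_msa(msa)
print(len(tokens), len(tokens[0]))  # number of sequences, alignment length
```

A token grid of this shape is what would typically be embedded and passed through the transformer; the actual preprocessing in this project may differ.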

Paper

For more details on the project, please refer to our paper: Learning Genome Architecture Using MSA Transformers

Installation

To install the dependencies, run:

pip install -r requirements.txt

Usage

To train the model, run:

python train.py

To test the model, run:

python test.py

Results

The model achieved an accuracy of 90% on the test set.
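
The README does not define the metric further. Assuming accuracy means the fraction of test examples whose predicted label matches the true label, it could be computed as in the sketch below. The `accuracy` helper and the labels are hypothetical and do not reproduce the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration: 1 = gene pair in the same operon, 0 = not.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.2f}")
```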

