This repository contains the code for the accompanying blog post.
In the tutorial we saw how to use k-mer features to compute the similarity between two genome sequences. Here I describe how to use the scripts to generate embeddings that can be visualized with TensorBoard. Besides generating embeddings, the script can also find the animals most similar to a given animal.
- tensorflow 2.x
- numpy
- torch
- tensorboard
Use the following command to download the sample dataset:
curl -L -o data/sb008.fastz "https://drive.google.com/uc?export=download&id=1mJpltSs1negIBkzFSWYHcF8MyvnnAbrx"
The following command generates an embedding for each animal genome in the given file:
python -m scripts.part1 -p data/sb008.fastz -t embd
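To make the idea of a k-mer embedding concrete, here is a minimal sketch that turns a DNA sequence into a normalized k-mer frequency vector. It is an illustration under stated assumptions, not the code in `scripts.part1`: the function name `kmer_embedding`, the default `k=3`, and the `ACGT` alphabet are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product


def kmer_embedding(sequence, k=3, alphabet="ACGT"):
    """Return a normalized k-mer frequency vector for a DNA sequence.

    Hypothetical helper for illustration; the repository's script may
    build its features differently.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fixed vocabulary so every sequence maps to a vector of the same length.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing characters outside the alphabet (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


vector = kmer_embedding("ACGTACGT", k=2)
print(len(vector))  # one entry per possible 2-mer over ACGT
```

Using a fixed vocabulary keeps the vectors comparable across genomes of different lengths, which is what makes a similarity computation between them meaningful.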
Once the embeddings have been generated, run the following command to view them in TensorBoard:
tensorboard --logdir=runs
The same script can also be used to list the animals most similar to a given animal:
python -m scripts.part1 -p data/sb008.fastz -t sim -a rat -k 5
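As a rough sketch of what a similarity lookup over embeddings can look like, the example below ranks animals by cosine similarity to a query animal. Cosine similarity is an assumption here; the script may use a different measure. The names `cosine_similarity`, `most_similar`, and `top_k` are hypothetical, and the toy vectors are made-up values, not results from the dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(embeddings, query, top_k=5):
    """Return the top_k (name, score) pairs most similar to `query`."""
    query_vec = embeddings[query]
    scores = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query  # skip the query animal itself
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Toy embeddings with invented values, for illustration only.
toy = {
    "rat": [0.5, 0.3, 0.2],
    "mouse": [0.48, 0.32, 0.2],
    "chicken": [0.1, 0.2, 0.7],
}
print(most_similar(toy, "rat", top_k=2))
```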