PAD2020_GenomicSequence_to_Species_Cluster

Student project. Parses a "sequence file" and generates a binary tree in the form of a parenthesized cluster string. DD 18.12.2020

Primary language: Python

This project is my very first GitHub upload :-D The project uses a dynamic programming approach to get the job done. It will ultimately have four functions, one per file, according to the following description:

P1.py reads a file and extracts the labels and DNA sequences from it as a list of tuples, where the first element of each tuple is a label and the second is a DNA sequence.
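
A minimal sketch of what this parsing step could look like. The input format is not described here, so the sketch assumes a hypothetical layout with one record per line, written as a label followed by whitespace and the sequence. The name `parse_sequences` is made up for illustration and is not taken from P1.py.

```python
from typing import Iterable, List, Tuple


def parse_sequences(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (label, sequence) tuples from lines of the form 'label SEQUENCE'.

    Assumed format (hypothetical): one record per line, label first.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once: everything after the first whitespace run is the sequence.
        label, sequence = line.split(maxsplit=1)
        # Drop inner spaces and normalise to upper case.
        records.append((label, sequence.replace(" ", "").upper()))
    return records


example_lines = ["Human ACGT", "", "Mouse acga"]
records = parse_sequences(example_lines)
print(records)
```

Because an open file object yields its lines when iterated, the same function could be called as `parse_sequences(open(path))` on a real file in that assumed format.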

P2.py aligns the sequences in pairs, introducing gaps where the scoring calculation calls for them. The aligned sequences are saved in a dictionary whose keys are tuples of integers identifying the sequence pair and whose values are tuples of strings containing the two aligned DNA sequences.
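
The scoring method is not spelled out here, so the following is only a sketch of one common dynamic programming approach, Needleman-Wunsch global alignment. The scores (match +1, mismatch -1, gap -2) and the names `align_pair` and `align_all` are assumptions for illustration, not the values or functions used in P2.py.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def align_pair(a: str, b: str, match: int = 1, mismatch: int = -1,
               gap: int = -2) -> Tuple[str, str]:
    """Globally align two sequences (Needleman-Wunsch, assumed scoring)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to rebuild the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")  # gap in b
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])  # gap in a
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def align_all(records: List[Tuple[str, str]]) -> Dict[Tuple[int, int], Tuple[str, str]]:
    """Align every pair (i, j) with i < j; keys are index pairs."""
    return {(i, j): align_pair(records[i][1], records[j][1])
            for i, j in combinations(range(len(records)), 2)}


alignments = align_all([("Human", "ACGT"), ("Mouse", "AGT"), ("Rat", "ACGA")])
print(alignments[(0, 1)])
```

Keying the dictionary by index pairs with `i < j` stores each pair once, which matches the description of integer tuples as keys.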

P3.py calculates a distance matrix in which each aligned DNA pair yields one entry. The matrix is stored as a list of lists of floats.
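
As an illustration only, the sketch below fills such a matrix with the p-distance, i.e. the fraction of aligned columns in which the two sequences differ, counting a gap against a base as a difference. The distance measure actually used in P3.py is not stated, and the names `p_distance` and `distance_matrix` are hypothetical.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions that differ (assumed distance measure)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def distance_matrix(alignments: Dict[Tuple[int, int], Tuple[str, str]],
                    n: int) -> List[List[float]]:
    """Build a symmetric n x n matrix with zeros on the diagonal."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), (a, b) in alignments.items():
        d = p_distance(a, b)
        matrix[i][j] = d
        matrix[j][i] = d  # distance is symmetric
    return matrix


example_alignments = {
    (0, 1): ("ACGT", "A-GT"),
    (0, 2): ("ACGT", "ACGA"),
    (1, 2): ("A-GT", "ACGA"),
}
matrix = distance_matrix(example_alignments, 3)
for row in matrix:
    print(row)
```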

P4.py takes the distance matrix and a list of labels and forms a binary tree, which is returned as a string with nested parentheses.
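
The clustering rule is not named here, so this sketch swaps in UPGMA (average linkage): repeatedly merge the two closest clusters and average their distances to the rest, weighted by cluster size. The function name `cluster_to_string` and the output style `((A,B),C)` are assumptions for illustration.

```python
from typing import List


def cluster_to_string(matrix: List[List[float]], labels: List[str]) -> str:
    """Merge the closest clusters until one remains (UPGMA, assumed method)."""
    if not labels:
        raise ValueError("at least one label is required")
    dist = [row[:] for row in matrix]  # work on a copy
    names = list(labels)
    sizes = [1] * len(names)           # number of leaves per cluster
    while len(names) > 1:
        count = len(names)
        # Closest pair (i, j) with i < j; ties go to the first pair found.
        i, j = min(((a, b) for a in range(count) for b in range(a + 1, count)),
                   key=lambda pair: dist[pair[0]][pair[1]])
        # Size-weighted average distance from the merged cluster to each other one.
        total = sizes[i] + sizes[j]
        merged = [(dist[i][k] * sizes[i] + dist[j][k] * sizes[j]) / total
                  for k in range(count)]
        # Cluster i becomes the merged cluster.
        names[i] = f"({names[i]},{names[j]})"
        sizes[i] = total
        for k in range(count):
            dist[i][k] = dist[k][i] = merged[k]
        dist[i][i] = 0.0
        # Cluster j is removed from all bookkeeping.
        del names[j], sizes[j], dist[j]
        for row in dist:
            del row[j]
    return names[0]


example = [[0.0, 2.0, 6.0, 10.0],
           [2.0, 0.0, 6.0, 10.0],
           [6.0, 6.0, 0.0, 10.0],
           [10.0, 10.0, 10.0, 0.0]]
tree = cluster_to_string(example, ["A", "B", "C", "D"])
print(tree)
```

Since every merge joins exactly two clusters, the resulting string always describes a binary tree.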

This project was really fun to work on and continued to give valuable insights into the world of programming.