MOTIF-DISCOVERY_GIBBS-SAMPLER-using-python

Gibbs sampling (also called alternating conditional sampling) is a Markov Chain Monte Carlo algorithm for high-dimensional data. GibbsSampler is a motif finding algorithm that finds one common motif and returns a list of bestMotifs containing the closest motif match from each string in dna.

Problem Statement

Given: Integers k, t, and N, followed by a collection of strings Dna.
Return: The strings BestMotifs resulting from running GibbsSampler(Dna, k, t, N) with 20 random starts.

We have to code Gibbs Sampler for the purpose of motif discovery. Here,
        Dna -- A collection of DNA strings that are of the same length.
       "t" -- Is an integer indicating how many times to read the genetic algorithm.
        "k" -- An integer indicating the motif length being searched for.
        "N" -- The number of iterations before returning the best motif.

Algorithm

GibbsSampler (Dna, k , t , N)
       randomly select k−Mers Motifs = ( Motif 1 , . . . , Motift ) in each string from Dna
       BestMotifs ←Motifs
       for j ←1 to N
              i ← Random ( t )
              Profile ←profile matrix constructed from all strings in Motifs except for Motifi
              Motifi ←Profile − randomly generated k-Mer in the i-th sequence
              if Score ( Motifs ) < Score ( BestMotifs )
                     BestMotifs ←Motifs
       return BestMotifs