/grnn

genome sequencing using LSTM

Primary Language: Jupyter Notebook

grnn

comparing genome sequences using an LSTM recurrent neural network

encoding format

  • format 1: xxxxxx6yyyyyy1
  • format 2: xyxyxyxyxyxy1
  • format 3: xxxxxxyyyyyy1
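
The three layouts above can be sketched in Python as follows. This is only an illustration under assumptions: that `x` and `y` stand for the symbols of the two sequences being compared, that the `6` in format 1 is a separator token, and that the trailing `1` is copied literally (the README does not say whether it is a label or a terminator). The function name `encode_pair` and the base-to-integer mapping are hypothetical and not taken from the notebooks.

```python
# Hypothetical mapping: any integers other than 1 and 6 would do,
# since those two values are reserved by the format strings.
BASE_TO_INT = {"A": 2, "C": 3, "G": 4, "T": 5}


def encode_pair(x, y, fmt, sep=6, tail=1):
    """Encode two equal-length sequences in format 1, 2, or 3.

    sep and tail mirror the literal 6 and 1 in the format strings;
    their exact meaning is an assumption.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    xs = [BASE_TO_INT[b] for b in x]
    ys = [BASE_TO_INT[b] for b in y]
    if fmt == 1:      # xxxxxx6yyyyyy1: x, separator, y
        body = xs + [sep] + ys
    elif fmt == 2:    # xyxyxyxyxyxy1: x and y interleaved
        body = [v for pair in zip(xs, ys) for v in pair]
    elif fmt == 3:    # xxxxxxyyyyyy1: x followed directly by y
        body = xs + ys
    else:
        raise ValueError("fmt must be 1, 2, or 3")
    return body + [tail]
```

In format 2 the two symbols to be compared sit next to each other at every step, whereas formats 1 and 3 place them a full sequence length apart.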

Observations

  • The randomness of the data makes a significant difference. For a fair comparison, the data set must be fixed. ( See format1_leng20_err0_500_100/*.png )

  • Comparing the DNA sequence encoding formats, performance ranks format 2 > format 1 > format 3 ( See formatMOD_leng20_err0_500_100/err*png )

  • LSTM can classify the sequences even when the match is not 100% ( See formatMOD_leng20_err0_500_100/err1*png )

  • A 2-neuron LSTM can deliver quite good performance on format 2 data. However, an 8-neuron LSTM performs badly on format 1 data and format 3 data. ( See format1_leng20_err1_500_100__different_RNN/*.png )

  • Using the full-length data string in format 2, the 16-neuron LSTM and the 8-neuron LSTM give good results. ( See formatMOD_leng65_err1_500_100/*.png )
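
The first and third observations (a fixed data set, and matches that are not 100%) could be reproduced with a seeded generator along these lines. This is a sketch, not the notebooks' code: `mutate`, `make_dataset`, the `ACGT` alphabet, the alternating match/non-match scheme, and the reading of `err` as a number of substituted positions are all assumptions made for illustration.

```python
import random

BASES = "ACGT"  # assumed alphabet


def mutate(seq, n_err, rng):
    """Return a copy of seq with exactly n_err positions changed to a different base."""
    out = list(seq)
    for i in rng.sample(range(len(seq)), n_err):
        out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)


def make_dataset(n_pairs, length, n_err, seed):
    """Build (x, y, label) triples; the same seed always yields the same data."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    data = []
    for k in range(n_pairs):
        x = "".join(rng.choice(BASES) for _ in range(length))
        if k % 2 == 0:
            # "matching" pair: y is x with n_err substitutions
            y, label = mutate(x, n_err, rng), 1
        else:
            # "non-matching" pair: y is drawn independently of x
            # (a chance near-match is possible but unlikely at length 20)
            y, label = "".join(rng.choice(BASES) for _ in range(length)), 0
        data.append((x, y, label))
    return data
```

Calling `make_dataset(10, 20, 1, seed=0)` twice returns identical lists, so different networks or encoding formats can be compared on exactly the same sequences.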