
Bachelor thesis on improving interpretable machine learning (IML) methods for genomic data

Primary language: R

Progress update (as of 12.08.):

  1. Data synthesis algorithm successfully implemented

  2. rsd data generated and model trained with an accuracy of approx. 90%

  3. Integrated Gradients (IG) implemented on synthetic data; flaws detected

  4. Improved graphical display (information compression) proposed

  5. Graphics from Durrant & Bhatt paper reproduced on synthetic data

  6. Positional Codon Synonym Similarity score defined and implemented

  7. IG function from DeepG modified to accept ANY one-hot coded baseline as argument

  8. Modified IG tested on synthetic data and analyzed graphically

  9. Real 16S rRNA data used; flaws detected in the tutorial model

  10. 16S model retrained and tested with modified IG, followed by graphical analysis

  11. Sampling-based important feature selection implemented on locus level

  12. Input reconstruction based on iterative IG attempted; results not very helpful
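
Items 7 and 8 refer to Integrated Gradients with an arbitrary one-hot coded baseline. The following is a minimal base R sketch of that idea, not the DeepG code or the implementation in this repository. The toy logistic model, its random weights `W`, and the helper names `one_hot`, `integrated_gradients`, `model` and `model_grad` are all assumptions made up for illustration.

```r
# One-hot encode a nucleotide string into an L x 4 matrix (columns A, C, G, T).
# Assumes every character of `seq` is contained in `alphabet`.
one_hot <- function(seq, alphabet = c("A", "C", "G", "T")) {
  chars <- strsplit(seq, "")[[1]]
  m <- matrix(0, nrow = length(chars), ncol = length(alphabet),
              dimnames = list(NULL, alphabet))
  m[cbind(seq_along(chars), match(chars, alphabet))] <- 1
  m
}

# Integrated Gradients with a user-supplied baseline.
# grad_fn(z) must return the gradient of the model output w.r.t. the input z.
# The path integral is approximated with the midpoint rule over `steps` points.
integrated_gradients <- function(x, baseline, grad_fn, steps = 50) {
  stopifnot(all(dim(x) == dim(baseline)))
  alphas <- (seq_len(steps) - 0.5) / steps
  grad_sum <- matrix(0, nrow(x), ncol(x))
  for (a in alphas) {
    grad_sum <- grad_sum + grad_fn(baseline + a * (x - baseline))
  }
  (x - baseline) * (grad_sum / steps)
}

# Hypothetical toy model: logistic regression on the flattened one-hot input.
set.seed(1)
W <- matrix(rnorm(8 * 4), nrow = 8, ncol = 4)
model <- function(z) 1 / (1 + exp(-sum(W * z)))
model_grad <- function(z) {
  p <- model(z)
  p * (1 - p) * W  # analytic gradient of the sigmoid of a linear score
}

x <- one_hot("ACGTACGT")         # sequence to explain
baseline <- one_hot("AAAAAAAA")  # any one-hot coded sequence can serve as baseline
ig <- integrated_gradients(x, baseline, model_grad, steps = 200)

# Completeness check: the attributions should sum to model(x) - model(baseline),
# up to the numerical integration error.
completeness_gap <- abs(sum(ig) - (model(x) - model(baseline)))
```

Positions where the input and the baseline carry the same nucleotide receive an attribution of exactly zero, so the choice of baseline determines which positions can be highlighted at all.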

For details, see the reports/ folder. Reading order: RSDexport -> IGmodification -> realdata16s