CEGRcode/scriptmanager

Add FASTA Chr Name Converter


owlang commented

The sacCer3 reference genome uses Roman numerals for chromosome names in the official copy, while other resources use Arabic numerals (to match the system used for other model organisms). We want to support converting FASTA-formatted files between the two systems.

Options:

  • input files
  • checkbox to toggle "chrmt" vs "chrM"
  • checkbox to gzip output
  • toggle "arabic to roman" vs "roman to arabic"
  • output directory

Pseudocode:

  • Use the hashmap created for the BED/GFF Chr Name Converter.
  • On each header line, regex out the leading ">[A-Za-z0-9]+" and use the matched name as the key for the hash. This lets us rename the chr in any kind of sequence ID, whatever non-alphanumeric delimiter follows the name.
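
A minimal sketch of the header rewrite described above, in Python for brevity (the actual tool would presumably be written in Java like the rest of scriptmanager). The names `ROMAN_TO_ARABIC`, `ARABIC_TO_ROMAN`, `convert_header`, and `convert_fasta` are hypothetical, and the map is rebuilt here only for illustration; the real implementation would reuse the hashmap from the BED/GFF Chr Name Converter. How the "chrmt" vs "chrM" checkbox should interact with the direction toggle is also an assumption.

```python
import re

# Hypothetical stand-in for the hashmap from the BED/GFF Chr Name Converter:
# the 16 nuclear chromosomes of sacCer3, Roman <-> Arabic.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]
ROMAN_TO_ARABIC = {f"chr{r}": f"chr{i}" for i, r in enumerate(ROMAN, start=1)}
ARABIC_TO_ROMAN = {v: k for k, v in ROMAN_TO_ARABIC.items()}

# Leading run of alphanumerics right after ">" on a FASTA header line.
HEADER_KEY = re.compile(r"^>([A-Za-z0-9]+)")


def convert_header(line, mapping):
    """Rename the chromosome at the start of a header; leave the rest as-is."""
    match = HEADER_KEY.match(line)
    if match is None:
        return line
    key = match.group(1)
    # Unknown names (e.g. scaffolds) pass through unchanged.
    return ">" + mapping.get(key, key) + line[match.end():]


def convert_fasta(lines, to_roman=False, use_chrmt=False):
    """Yield FASTA lines with converted headers; sequence lines are untouched."""
    mapping = dict(ARABIC_TO_ROMAN if to_roman else ROMAN_TO_ARABIC)
    # Assumption: the mitochondrial name is controlled only by the checkbox.
    if use_chrmt:
        mapping["chrM"] = "chrmt"
    else:
        mapping["chrmt"] = "chrM"
    for line in lines:
        yield convert_header(line, mapping) if line.startswith(">") else line
```

Because the key stops at the first non-alphanumeric character, a header such as ">chrIV_100_200" or ">chrIV:100-200" is handled the same way: only the "chrIV" prefix is looked up and replaced.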

Model the code after File_Utilities --> Chr Name Converter.

Related issue for reference: #49

owlang commented

Move this to a yeast-only tab, together with the coordinate chr numeral converters under File Utilities.