smarco/WFA-paper

Generate datasets in fasta format

RagnarGrootKoerkamp opened this issue · 3 comments

While the current output format of generate_dataset is convenient, it's non-standard.
Optionally writing to a fasta file would make the output easier to reuse, since I could then use a library function instead of writing a custom parser.

Well, it was meant to be a simple tool to generate simple datasets.
As you know, you can always run the output through awk and convert it to whatever you want:

paste - - < test.seq | awk '{s1=substr($1,2); s2=substr($2,2); printf(">Seq\n%s\n>Seq\n%s\n",s1,s2)}'
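For anyone who would rather not shell out, here is a minimal Python sketch of the same conversion. It assumes, as the awk one-liner implies, that the file holds two lines per pair and that each line starts with a single marker character to strip. The function names `read_pairs`, `pairs_to_fasta`, and `convert` are made up for this example and are not part of the repository.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_pairs(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (first, second) sequence pairs from a two-lines-per-pair file.

    Assumption: every line carries a one-character marker prefix,
    which is dropped here just like substr($1,2,...) does in awk.
    """
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")
    for i in range(0, len(lines), 2):
        yield lines[i][1:], lines[i + 1][1:]


def pairs_to_fasta(pairs: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write each pair as two fasta records, both named 'Seq' as in the awk version."""
    for first, second in pairs:
        out.write(f">Seq\n{first}\n>Seq\n{second}\n")


def convert(text: str) -> str:
    """Convenience wrapper: convert an in-memory dataset string to fasta text."""
    out = io.StringIO()
    pairs_to_fasta(read_pairs(io.StringIO(text)), out)
    return out.getvalue()
```

With a real file this could be used as `pairs_to_fasta(read_pairs(open("test.seq")), sys.stdout)` after importing `sys`. Note that both records of a pair get the same name, so a downstream reader has to rely on record order to match them up.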

Nevertheless, if you still think that could be useful, I can implement it in the new version.
Let me know. Cheers,

Sorry for being slow. I made this issue in my first days of using this new format and found it annoying. Now that I'm more used to it, I actually think it does its job well, and should indeed be different from normal fasta files.
Fasta files don't typically have pairs of sequences in them and aren't as easy to parse as your format.

Good, thanks for the feedback.