Generate datasets in fasta format

Question

RagnarGrootKoerkamp opened this issue 3 years ago · 3 comments

RagnarGrootKoerkamp commented 3 years ago

While the current output format of generate_dataset is convenient, it's non-standard.
An option to write a fasta file instead would make the datasets easier to reuse, since I could then read them with an existing library function instead of writing a custom parser.

Answer 1 · 2022-03-15T15:34:57.000Z

Well, it was meant to be a simple tool to generate simple datasets.
As you surely know, you can always pipe the output through awk and convert it to whatever format you want:

cat test.seq | paste - - | awk '{s1=substr($1,2,length($1)-1); s2=substr($2,2,length($2)-1); printf(">Seq\n%s\n>Seq\n%s\n",s1,s2)}'
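For anyone who prefers a script over a one-liner, the same conversion could be sketched in Python roughly as follows. It assumes, as the awk command does, that every non-empty line carries a one-character marker before the sequence and that lines come in consecutive pairs. The function name `seq_to_fasta` and the `pairN_a` / `pairN_b` record names are made up for this example; the awk version names every record `Seq` instead.

```python
def seq_to_fasta(lines, out):
    """Write marker-prefixed sequence pairs as FASTA records.

    `lines` is any iterable of text lines, `out` any writable text stream.
    Returns the number of pairs written.
    """
    pair_index = 0
    pending = None  # first sequence of the current pair, if already seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        seq = line[1:]  # drop the one-character marker
        if pending is None:
            pending = seq
        else:
            pair_index += 1
            # Hypothetical naming scheme; pick whatever your FASTA reader expects.
            out.write(f">pair{pair_index}_a\n{pending}\n")
            out.write(f">pair{pair_index}_b\n{seq}\n")
            pending = None
    if pending is not None:
        raise ValueError("input ends in the middle of a pair")
    return pair_index
```

Typical use would be `seq_to_fasta(open("test.seq"), sys.stdout)` after `import sys`. Unique record names are used here because some FASTA readers index records by name.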

Nevertheless, if you still think that could be useful, I can implement it in the new version.
Let me know. Cheers,

Answer 2 · 2022-03-22T09:25:06.000Z

Sorry for the slow reply. I opened this issue during my first days of using this format, when I found it annoying. Now that I'm more used to it, I actually think it does its job well, and should indeed be different from normal fasta files.
Fasta files don't typically have pairs of sequences in them and aren't as easy to parse as your format.
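To illustrate how little code the pair format needs, here is a minimal reader sketch. It assumes each pair is written as one line starting with `>` (the first sequence) followed by one line starting with `<` (the second sequence). That marker convention is an assumption of this example, not something spelled out in this thread, and `read_pairs` is a hypothetical helper name.

```python
def read_pairs(lines):
    """Yield (first, second) sequence tuples from marker-prefixed lines.

    Assumed layout: '>' + first sequence, then '<' + second sequence.
    """
    first = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        marker, seq = line[0], line[1:]
        if marker == ">" and first is None:
            first = seq
        elif marker == "<" and first is not None:
            yield first, seq
            first = None
        else:
            raise ValueError(f"unexpected line {number}: {line!r}")
    if first is not None:
        raise ValueError("input ends in the middle of a pair")
```

Because the two sequences of a pair are adjacent and explicitly marked, the reader needs no record names and no handling of sequences wrapped over several lines, which a general fasta parser would have to support.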

Answer 3 · 2022-03-22T09:37:44.000Z

Good, thanks for the feedback.