refresh-bio/agc

genome name in fasta idline

lynnjo opened this issue · 2 comments

lynnjo commented

Hi all. Is there a way to have the genome name included in the ID line when AGC outputs a FASTA file?

For example: I make a query to get chr1 from different genomes. This query might look like:

agc getctg assemblies.agc chr1@LineA chr1@LineB chr1@LineC > fasta.out

AGC's output shows an ID line of ">chr1" for all three of these, which makes it difficult to tell which sequence belongs to which genome. We are hoping to use AGC for our research project, and we expect to run into this scenario frequently.

Any suggestions?

lh3 commented

AGC keeps FASTA comments. I would recommend encoding the sample/species information there so that you can identify the source later.
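As a rough illustration of that suggestion, here is a minimal Python sketch that appends a sample tag to every header line of a FASTA file before it is added to the archive. The function name `tag_fasta_headers` and the `sample=<name>` convention are assumptions made for this example, not something AGC defines; any comment format that your downstream code can parse would work just as well.

```python
def tag_fasta_headers(lines, sample):
    """Yield FASTA lines with `sample=<name>` appended to each header.

    The `sample=` key is a hypothetical convention chosen for this sketch;
    everything after the first whitespace in a header is treated as the comment.
    """
    tag = f"sample={sample}"
    for line in lines:
        line = line.rstrip("\n")
        # Only header lines are modified; skip headers that already carry the tag.
        if line.startswith(">") and tag not in line.split()[1:]:
            line = f"{line} {tag}"
        yield line


if __name__ == "__main__":
    demo = [">chr1", "ACGT", ">chr2 some description", "GGCC"]
    for out in tag_fasta_headers(demo, "LineA"):
        print(out)
```

Running this once per genome (with the matching sample name) before building the archive should leave each ID line carrying its source, for example `>chr1 sample=LineA`.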

lynnjo commented

Thank you - we'll try updating our FASTA files and the code that parses AGC output.
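For anyone landing here later, a minimal sketch of what the parsing side could look like in Python. It assumes the headers were written with a `sample=<name>` field in the comment, which is a convention made up for this example rather than anything AGC prescribes, and the function name `parse_fasta_with_sample` is likewise hypothetical.

```python
def parse_fasta_with_sample(lines, key="sample"):
    """Return a list of (contig, sample, sequence) tuples from FASTA lines.

    Assumes headers look like `>chr1 sample=LineA`; `sample` is None when
    no `key=value` field with the given key is present in the comment.
    """
    records = []
    contig = sample = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if contig is not None:
                records.append((contig, sample, "".join(chunks)))
            fields = line[1:].split()
            contig = fields[0] if fields else ""
            sample = None
            for field in fields[1:]:
                k, sep, v = field.partition("=")
                if sep and k == key:
                    sample = v
            chunks = []
        elif contig is not None:
            chunks.append(line)
    if contig is not None:
        records.append((contig, sample, "".join(chunks)))
    return records


if __name__ == "__main__":
    demo = [">chr1 sample=LineA", "AC", "GT", ">chr1 sample=LineB", "TT"]
    for contig, sample, seq in parse_fasta_with_sample(demo):
        print(contig, sample, len(seq))
```

With the tag in place, records from different genomes that share the contig name `chr1` can be told apart by the second element of each tuple.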