eead-csic-compbio/metagenome_Pfam_score

Easy way to add genes to the nitrogen pathway?


Hi,
I'm wondering if there is an easy way to modify the nitrogen pathway so that the amoA gene from the uncultured archaeon is replaced by, or supplemented with, the same gene from the type strain Nitrosopumilus maritimus, which would represent the archaeal pathway much better.
Thanks
Greg

Hi Greg,
Yes, thanks for asking that.
We are working on version 1.2, which will have a -custom option.
With this option, MEBS will download the Pfam database so that you can add any Pfams you want to the mapping file.

For example, if you want to analyze only AmoA from archaea, I recommend modifying the pfam2kegg.tab file in the custom directory as follows:

PFAM KO PATHWAY PATHWAY NAME
PF12942 1 Ammonia monoxygenase Archaea AmoABC
PF04744 1 Ammonia monoxygenase Archaea AmoABC
PF04896 1 Ammonia monoxygenase Archaea AmoABC

However, the nitrogen cycle already has the Archaea AmoABC as pathway 26: https://github.com/eead-csic-compbio/metagenome_Pfam_score/blob/master/cycles/nitrogen/pfam2kegg.tab

Be aware that the custom option is useful for computing the completeness of those pathways, but not the score; the score has to be computed using the advanced mode.

As soon as the -custom option is implemented I will let you know. Meanwhile, you can try to focus only on N pathway 26 and see if that works for you.
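
To check which Pfams belong to pathway 26 before running anything, the mapping file can be filtered with a few lines of Python. This is only a minimal sketch: it assumes the file is tab-separated with the header shown above (PFAM, KO, PATHWAY, PATHWAY NAME). The sample text, the "-" placeholder in the KO column, the PF00000 row and the function name `pfams_for_pathway` are illustrative assumptions, not part of MEBS.

```python
import csv
import io

# Hypothetical excerpt laid out like the header quoted in this thread.
# Tab separation, the "-" KO placeholder and the PF00000 row are
# assumptions made for illustration only.
SAMPLE = (
    "PFAM\tKO\tPATHWAY\tPATHWAY NAME\n"
    "PF00000\t-\t1\tHypothetical other pathway\n"
    "PF12942\t-\t26\tAmmonia monoxygenase Archaea AmoABC\n"
    "PF04744\t-\t26\tAmmonia monoxygenase Archaea AmoABC\n"
    "PF04896\t-\t26\tAmmonia monoxygenase Archaea AmoABC\n"
)


def pfams_for_pathway(handle, pathway):
    """Return the Pfam accessions mapped to the given pathway number."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["PFAM"] for row in reader if row["PATHWAY"] == str(pathway)]


# Select only the archaeal AmoABC families (pathway 26 in this thread).
archaeal_amo = pfams_for_pathway(io.StringIO(SAMPLE), 26)
print(archaeal_amo)
```

To run it on the real file, replace `io.StringIO(SAMPLE)` with an open handle to cycles/nitrogen/pfam2kegg.tab, after checking that its columns match the assumed layout.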
Thanks
Val

Hi Val,
Thank you for the answer; that will indeed be very useful.
My issue for now is that the amoA gene that you chose for archaea (I found only one) doesn't appear to be a BLAST match to one of the main taxonomic groups possessing this gene in the archaeal domain, namely Nitrosopumilus.
Best
Greg

Hi Val,
So it was probably a confusion on my part: the my_Pfam.nitrogen.hmm file contains the ones that I'm looking for. However, in nitrogen.fasta the amoA gene for archaea (tr|A0A023Q3R5|A0A023Q3R5_9ARCH Ammonia monooxygenase (Fragment) OS=uncultured archaeon GN=amoA PE=4 SV=1) doesn't BLAST to the main Nitrosopumilus that I'm looking for, hence my confusion.
Would it be possible to clarify the role of nitrogen.fasta in the analysis, if any? It's not so clear to me.
Thanks for your help
Best
Greg

Hi Greg,
Which protein family exactly are you looking for? The FASTA file of each cycle contains representative sequences, which are ultimately used to obtain the protein families (Pfams) and then to compute the relative entropy and the score. If you are not interested in the score, use the custom option with the protein family that you want to analyze. It doesn't matter if it is not in the FASTA file, because MEBS will search all the protein families in the Pfam database and display only those in your mapping file.
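
As a rough illustration of the idea, here is a minimal Python sketch of keeping only the families listed in a mapping file and reporting how complete each pathway is. It assumes completeness is simply the fraction of a pathway's Pfams that were detected, which may differ from what MEBS actually computes. The function name `pathway_completeness` and the list of detected hits are hypothetical.

```python
def pathway_completeness(mapping, detected):
    """Report, per pathway, which mapped Pfams were detected.

    mapping:  dict of pathway name -> set of Pfam accessions
    detected: iterable of Pfam accessions found in a genome or metagenome
    Returns a dict of pathway name -> (sorted detected Pfams, fraction found).
    """
    detected = set(detected)
    report = {}
    for pathway, pfams in mapping.items():
        if not pfams:
            continue  # skip pathways with no Pfams to avoid dividing by zero
        found = sorted(pfams & detected)
        report[pathway] = (found, len(found) / len(pfams))
    return report


# Mapping taken from the three archaeal AmoABC Pfams listed in this thread.
mapping = {"Archaea AmoABC": {"PF12942", "PF04744", "PF04896"}}

# Hypothetical hits: PF00005 is not in the mapping, so it is ignored.
detected = ["PF12942", "PF04896", "PF00005"]

report = pathway_completeness(mapping, detected)
for pathway, (found, fraction) in report.items():
    print(f"{pathway}: {len(found)} of {len(mapping[pathway])} Pfams ({fraction:.0%})")
```

Families that are detected but absent from the mapping never appear in the report, which mirrors the "only display those in your mapping file" behaviour described above.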
Let me know if that was helpful.
P.S. Stage 1 of the MEBS paper describes the annotation of the sulfur genes; the paper for the rest of the cycles is not ready yet. :S https://academic.oup.com/gigascience/article/6/11/gix096/4561660
I can give you more information if you need it.

Best
Val

Hi Val,

I was looking for the v1.2 version to install so that I could use the custom option but I've had no luck so far. Could you point me in the right direction?

Many thanks,

Vincent