Arkadiy-Garber/FeGenie

Normalized gene abundances in 'FeGenie-heatmap-data.csv'?

Closed this issue · 2 comments

Hi, I just tested out FeGenie and it seems to be a very convenient tool for hunting iron-metabolism genes; many thanks for making it available to us in the microbiology community! Could you shed more light on what the numbers in the output file FeGenie-heatmap-data.csv actually represent? I'm assuming they are the "normalized abundance of genes per functional category" in each of the genomes being tested?

Hi! Thanks for using FeGenie. Sorry for the delayed response.

With regard to the FeGenie-heatmap-data.csv output file: each value is the number of genes (from a given iron category) identified in a genome, divided by the number of predicted ORFs in that genome, then multiplied by the inflation factor, which is 1000 by default. You can change this by setting (for example) -inflation 100, which essentially turns the values into percentages. Does that make sense?

We chose 1000 as the default inflation factor because we found that, especially in large metagenome assemblies, the number of genes in each iron gene category divided by the total number of ORFs in the metagenome results in very small numbers. Multiplying by 1000 makes the values easier to read.

I just added an option to FeGenie that allows you to forgo normalization (-norm n) and create a FeGenie-heatmap-data.csv with the raw gene counts for each iron gene category.
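
As a rough illustration of the arithmetic described above, here is a minimal Python sketch. It is not FeGenie's actual code: the function name `heatmap_value`, its parameters, and the example numbers (12 genes, 4000 ORFs) are all made up for illustration, and the `normalize` argument only mimics what -norm n is described as doing.

```python
def heatmap_value(category_gene_count, total_orfs, inflation=1000, normalize=True):
    """Hypothetical helper mirroring the description in this thread.

    category_gene_count: genes found in one iron category for one genome
    total_orfs: number of predicted ORFs in that genome
    inflation: multiplier applied after normalization (1000 by default)
    normalize: if False, return the raw count (as described for -norm n)
    """
    if not normalize:
        return category_gene_count
    if total_orfs <= 0:
        raise ValueError("total_orfs must be positive")
    # Multiply before dividing; same result as (count / ORFs) * inflation
    return category_gene_count * inflation / total_orfs


# Made-up example: 12 genes in one category, 4000 predicted ORFs
print(heatmap_value(12, 4000))                   # default inflation of 1000
print(heatmap_value(12, 4000, inflation=100))    # value reads as a percentage
print(heatmap_value(12, 4000, normalize=False))  # raw gene count
```

With these made-up numbers, the default setting gives 3 genes per 1000 ORFs, an inflation factor of 100 gives 0.3 (that is, 0.3% of ORFs), and skipping normalization simply returns the count of 12.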

Let me know if any of this doesn't make sense or if you have any other questions or issues!
Arkadiy

Thank you so much for taking the time to give such a detailed explanation. It's all clear to me now!