gavinmdouglas/picrust2_manuscript

Further questions regarding the categorize_by_function.py for PICRUSt2

Good afternoon,
I am writing in relation to some further issues with the process described in the issue from zina-R (PiCrust2 categorize_by_function.py #1).
I've followed the steps involving the R function as indicated and I was successfully able to reproduce them. As a result, I got three separate datasets, each one belonging to a different KO level. However, I intend to use my dataset for further analysis via LEfSe for functional biomarker analysis and for this I would like to have the information of the 3 levels all in the same data table to later create a cladogram (with taxonomic data you can include different hierarchical taxonomical levels separated by | and this is what I am trying to do with the functional data).
Could anyone help me solve this? Does anyone know of a way to get this final output?
Thanks in advance, Mikel.

Hi @mikelgutmut,

Sorry, I don't have code to do that specifically, but all of the levels could be parsed in a similar way as in the R code referred to in that issue you linked.

Hi @mikelgutmut

Have you managed to do it at different levels?
Cheers

Hi @gavinmdouglas
I am trying to run categorize_by_function on PICRUSt2 output in R. Are there any other dependencies that need to be installed in R? I am getting the following error:
test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
Error in categorize_by_function_l3(test_ko, kegg_brite_map) :
could not find function "categorize_by_function_l3"

The script I provided is all that is needed - but you need to define the categorize_by_function_l3 function first (i.e., run that code first). Is it present in the file you're working with?

Hi @gavinmdouglas
There was an issue with "categorize_by_function_l3". I have now run the code successfully and created the table. Is there a way to create similar tables at level 1 and level 2?

It's been a while since I've looked at that script, but yes I believe you would just need to change the function to regroup to a different column (the 1st or 2nd) rather than the 3rd.

Cheers,

Gavin

Hi @gavinmdouglas
It would be great if you could highlight the positions in the script that need to be changed for regrouping.

Hi @mikelgutmut
I have successfully created the table at level 3 by running "categorize_by_function_l3". Could you please share the code for creating the tables at the other two levels, and explain how you merged the information from the three levels into a single dataset?

That's not a script I maintain as part of PICRUSt2, but I think you would need to change pathway <- strsplit(pathway, ";")[[1]][3] to be pathway <- strsplit(pathway, ";")[[1]][1] for instance to get the first level of KEGG BRITE. You could try that and see if the output made sense.
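
As a rough illustration of that change, the level could be made an argument instead of a hard-coded index. This is only a minimal sketch on toy data, not the script from the linked issue: the function name categorize_by_level, the map_col argument, and the "pathways" column name are all hypothetical, and the map is assumed to store each pathway as "L1;L2;L3" with multiple pathways separated by "|".

```r
# Toy stand-in for the KEGG BRITE map (hypothetical column name "pathways").
toy_map <- data.frame(
  pathways = c(
    "Metabolism;Carbohydrate Metabolism;Glycolysis",
    paste("Metabolism;Carbohydrate Metabolism;Glycolysis",
          "Metabolism;Energy Metabolism;Methane metabolism", sep = "|")
  ),
  row.names = c("K00001", "K00002"),
  stringsAsFactors = FALSE
)

# Toy KO abundance table: KOs as rows, samples as columns.
toy_ko <- data.frame(
  sampleA = c(10, 5),
  sampleB = c(0, 2),
  row.names = c("K00001", "K00002")
)

categorize_by_level <- function(in_ko, brite_map, level = 3, map_col = "pathways") {
  # Only KOs present in both tables can be regrouped.
  shared <- intersect(rownames(in_ko), rownames(brite_map))
  if (length(shared) == 0) {
    stop("No KO IDs are shared between the KO table and the mapping table")
  }
  ko_idx <- character(0)
  labels <- character(0)
  for (ko in shared) {
    paths <- strsplit(brite_map[ko, map_col], "|", fixed = TRUE)[[1]]
    # Take the requested element (1, 2 or 3) of each "L1;L2;L3" string.
    # Assumption: each KO is counted once per distinct category.
    ko_labels <- unique(vapply(strsplit(paths, ";", fixed = TRUE),
                               function(p) p[level], character(1)))
    ko_labels <- ko_labels[!is.na(ko_labels)]
    ko_idx <- c(ko_idx, rep(ko, length(ko_labels)))
    labels <- c(labels, ko_labels)
  }
  # Sum the abundances of all KOs assigned to the same category.
  as.data.frame(rowsum(as.matrix(in_ko[ko_idx, , drop = FALSE]), group = labels))
}

level1 <- categorize_by_level(toy_ko, toy_map, level = 1)
level3 <- categorize_by_level(toy_ko, toy_map, level = 3)
print(level1)
print(level3)
```

Note that counting each KO once per distinct category is a choice made for this sketch. It can give different totals at levels 1 and 2 than simply changing the index would if a KO maps to several pathways under the same higher-level category.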

Cheers,

Gavin

Thank you @gavinmdouglas
Changing pathway <- strsplit(pathway, ";")[[1]][3] to pathway <- strsplit(pathway, ";")[[1]][1] worked. I successfully created the tables for level 1 and level 2. Is there a way to put the information from the three levels into one table?

Great! I don't have a script handy to do that, so you would need to use custom code to do that.
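
For anyone looking for a starting point, such custom code might look roughly like the sketch below. It builds one table whose row names are "L1", "L1|L2" and "L1|L2|L3", mirroring the "|"-separated hierarchy described at the top of this thread. Everything here is an assumption for illustration: the function name hierarchical_table, the "pathways" column name, and the toy data are hypothetical, and LEfSe's exact input requirements (class row, allowed characters in feature names) are not checked.

```r
# Toy stand-in for the KEGG BRITE map (hypothetical column name "pathways"):
# each pathway is "L1;L2;L3", multiple pathways are separated by "|".
toy_map <- data.frame(
  pathways = c(
    "Metabolism;Carbohydrate Metabolism;Glycolysis",
    paste("Metabolism;Carbohydrate Metabolism;Glycolysis",
          "Metabolism;Energy Metabolism;Methane metabolism", sep = "|")
  ),
  row.names = c("K00001", "K00002"),
  stringsAsFactors = FALSE
)

# Toy KO abundance table: KOs as rows, samples as columns.
toy_ko <- data.frame(
  sampleA = c(10, 5),
  sampleB = c(0, 2),
  row.names = c("K00001", "K00002")
)

hierarchical_table <- function(in_ko, brite_map, map_col = "pathways", max_level = 3) {
  shared <- intersect(rownames(in_ko), rownames(brite_map))
  if (length(shared) == 0) {
    stop("No KO IDs are shared between the KO table and the mapping table")
  }
  ko_idx <- character(0)
  labels <- character(0)
  for (ko in shared) {
    paths <- strsplit(brite_map[ko, map_col], "|", fixed = TRUE)[[1]]
    ko_labels <- character(0)
    for (p in strsplit(paths, ";", fixed = TRUE)) {
      depth <- min(length(p), max_level)
      # Build "L1", "L1|L2", "L1|L2|L3" for this pathway.
      prefixes <- vapply(seq_len(depth),
                         function(i) paste(p[seq_len(i)], collapse = "|"),
                         character(1))
      ko_labels <- c(ko_labels, prefixes)
    }
    # Assumption: count each KO once per distinct hierarchical label.
    ko_labels <- unique(ko_labels)
    ko_idx <- c(ko_idx, rep(ko, length(ko_labels)))
    labels <- c(labels, ko_labels)
  }
  # Sum abundances per hierarchical label; rows come out sorted by label.
  as.data.frame(rowsum(as.matrix(in_ko[ko_idx, , drop = FALSE]), group = labels))
}

combined <- hierarchical_table(toy_ko, toy_map)
print(combined)
```

The resulting data frame could then be written out with write.table(combined, sep = "\t", quote = FALSE, col.names = NA) and adapted to whatever layout the downstream tool expects.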

Cheers,

Gavin

Thanks @gavinmdouglas. It would be great if you could help with that, as I do not have much bioinformatics experience.

For sure, if you have specific R code you need help with I would be happy to give feedback.

Cheers,

Gavin

Thanks Gavin.

Hi Gavin,
I am trying to run the R code for categorize_by_function. Below are the commands I ran and the error I am getting.
kegg_brite_map <- read.table("picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)

test_ko <- read.table("KO_out.tsv", header=TRUE, sep="\t", row.names=1)

categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping)

test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)

if(length(which(colnames(test_ko) == "KEGG_Pathways") > 0)) { test_ko <- test_ko[, -which(colnames(test_ko) == "KEGG_Pathways")]
}
test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]

Error: object 'test_ko_L3' not found

Thank you in advance

Hi @Nisa435,

I should preface that this is just example R code and isn't something I officially maintain.

However, test_ko_L3 is being assigned at this step: test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map), so what error is given when you run that code?

Thanks,

Gavin

Hi Gavin,
Yes, it is assigned. When I run test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ], this is the error I am getting.

Error: object 'test_ko_L3' not found

Hello,
There is no error apart from that. Or I might be missing something.

kegg_brite_map <- read.table("picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("KO_out.tsv", header=TRUE, sep="\t", row.names=1)
categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping)

+ test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)

if(length(which(colnames(test_ko) == "KEGG_Pathways") > 0)) { test_ko <- test_ko[, -which(colnames(test_ko) == "KEGG_Pathways")]

+ }

test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]
Error: object 'test_ko_L3' not found
orig_ko_L3 <- read.table("test_ko_L3.tsv", header=TRUE, sep="\t", row.names=1, skip=1, comment.char="", quote="")
orig_ko_L3 <- orig_ko_L3[, -which(colnames(orig_ko_L3) == "KEGG_Pathways")]
orig_ko_L3 <- orig_ko_L3[-which(rowSums(orig_ko_L3) == 0),]
identical(test_ko_L3_sorted, orig_ko_L3)
Error in identical(test_ko_L3_sorted, orig_ko_L3) :
object 'test_ko_L3_sorted' not found

The kegg_brite_map and test_ko tables are read in correctly. However, after running the test_ko_L3 step, I got the output below.

test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)

head(test_ko_L3,n=10)

1 function (test_ko, kegg_brite_mapping)
2 head(categorize_by_function_l3)

Hey again,

This is a very odd error - it seems like test_ko_L3 is being assigned the value of the function categorize_by_function_l3 itself rather than as the output of that tool. Can you confirm that the function categorize_by_function_l3 was already defined prior to running these commands?
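
One possible explanation, going only by the pasted commands: the line categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping) appears there with no braces or body after it. In R, a function header with nothing after it is incomplete, so the next expression entered is taken as the function body. The toy sketch below (the names broken_fn, fixed_fn and demo_result are hypothetical) shows that behaviour.

```r
# A function header with no body: R treats the NEXT expression as the body,
# so the assignment below is never run at the top level.
broken_fn <- function(x)
demo_result <- x + 1

# The "missing" assignment has become the function body.
print(body(broken_fn))
print(is.function(broken_fn))

# With the full definition in braces, the call behaves as expected.
fixed_fn <- function(x) {
  x + 1
}
fixed_value <- fixed_fn(1)
print(fixed_value)
```

If that is what happened here, pasting the complete function definition (header plus the whole body in braces) before calling it should resolve the error.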

Gavin

Hi Gavin,
Sure.

kegg_brite_map <- read.table("picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("KO_out.tsv", header=TRUE, sep="\t", row.names=1)
categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping)

+ test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)

if(length(which(colnames(test_ko) == "KEGG_Pathways") > 0)) { test_ko <- test_ko[, -which(colnames(test_ko) == "KEGG_Pathways")]

+ }

test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]
Error: object 'test_ko_L3' not found

Hi! I am now done with this command but am unable to continue forward:
pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_contrib.tsv.gz -o KEGG_pathways_out --no_regroup --map picrust2/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv

Running the test_ko_L3 step always results in this error:
Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) :
no rows to aggregate

Hoping to hear from you!

Hi @EJS01,

That error makes it sound like the input file is empty (or at least none of the KEGG ortholog IDs intersect between the table and the mapfile). I'm not sure what the issue is, but just so you know this is the GitHub repo for the manuscript code (i.e., code used for running statistical analyses in the paper, and not the codebase itself). You can find the actual codebase here: https://github.com/picrust/picrust2.
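
A quick way to check the overlap is sketched below on toy data. The object names and the "pathways" column are hypothetical stand-ins for the KO table and the mapping table read in earlier in this thread.

```r
# Toy KO table and map: only one of the two KO IDs is present in the map.
toy_ko <- data.frame(
  sampleA = c(10, 5),
  row.names = c("K00001", "K99999")
)
toy_map <- data.frame(
  pathways = "Metabolism;Carbohydrate Metabolism;Glycolysis",
  row.names = "K00001",
  stringsAsFactors = FALSE
)

# Confirm the table is not empty and see how the IDs are formatted.
print(dim(toy_ko))
print(head(rownames(toy_ko)))
print(head(rownames(toy_map)))

# IDs found in both tables; if this is empty there is nothing to aggregate.
shared <- intersect(rownames(toy_ko), rownames(toy_map))
cat(length(shared), "of", nrow(toy_ko), "KO IDs are present in the map\n")
```

If no IDs are shared, comparing the first few row names of each table usually shows whether the ID formats differ or whether a different kind of table was read in.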

Cheers,

Gavin

Hi Gavin,

The other commands are now working, and I can see the level 3 table. However, running this command again gives another error:

test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]
Error: object 'orig_ko_L3' not found

Likewise, is there code for determining the other levels (e.g., levels 1 and 2) as well?

Hi @EJS01,

Those are just rough example R commands, rather than actual commands that should be run. If you read in the orig_ko_L3 table first you wouldn't get that error. That R code is just an example of how to regroup a table in R, rather than an officially maintained function. However, you could test changing the pathway <- strsplit(pathway, ";")[[1]][3] line to regroup to different levels. It's been a long time since I've looked at those tables, but I believe taking the first or second element rather than the third would let you regroup to different KEGG levels.

Cheers,

Gavin