YuLab-SMU/MicrobiotaProcess

Summarizing data from mpse object without changing class

etorres475 opened this issue · 9 comments

Hi Shuangbin,

In my experimental design, I sequenced some samples and sub-samples. I would like to summarize and calculate the mean abundances for those samples and sub-samples so they represent one single sample in the posterior analysis with MicrobiotaProcess. When I try to do it using dpyr on the mpse object, it is converted into a tibble data (tbl_mpse object) and then into a tbl_df. I tried using as.mpse on the resulting file but it ended in NULL (please see the screenshots attached). Is it possible to do this in some other way?

Cheers!

Screenshot 2023-04-22 at 5 16 11 PM

Screenshot 2023-04-22 at 5 22 49 PM

Screenshot 2023-04-22 at 5 23 31 PM

I am getting another error when using mp_cal_rarecurve in:

mpse12.a <- mpse12 %>% mp_rrarefy() %>% mp_cal_alpha(.abundance = RareAbundance) %>% mp_cal_rarecurve(.abundance = RareAbundance)

Screenshot 2023-04-22 at 7 19 11 PM

Data can be found at: https://www.dropbox.com/s/eeblzk2uqggyv6o/mpse12.obj?dl=0

Sorry! Another question, I am generating very nice plots of taxa abundance per group, but I would like to know if the differences I see are statistically significant and extract the data. How can I do it?

Screenshot 2023-04-22 at 8 08 29 PM

you can refer to the related issue and answer

In my experimental design, I sequenced some samples and sub-samples. I would like to summarize and calculate the mean abundances for those samples and sub-samples so they represent one single sample in the posterior analysis with MicrobiotaProcess. When I try to do it using dpyr on the mpse object, it is converted into a tibble data (tbl_mpse object) and then into a tbl_df. I tried using as.mpse on the resulting file but it ended in NULL (please see the screenshots attached). Is it possible to do this in some other way?

I think you can use mp_aggregate to do it.

library(MicrobiotaProcess)
data(mouse.time.mpse)
mouse.time.mpse %>% mp_aggregate(.group=time)

you can refer to the related issue and answer

Thanks a lot for your help. I want to know which Phyla are differentially expressed, not the phyla to which the biomarkers belong, so I need to specify that I want to calculate the mp_diff_analysis in the Phyla.

I am trying the following:

b.ptt.16 <- b.mpse16 %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=Plants_grown_in_soil, p.adjust = "fdr",first.test.alpha = 0.05,filter.p = "pvalue", taxa.class = "Phylum") %>% mp_extract_taxatree()

But I get the following warning:

Screenshot 2023-04-23 at 1 51 00 PM

What am I doing wrong?

mouse.time.mpse %>% mp_aggregate(.group=time)

I got this:

Screenshot 2023-04-23 at 2 03 19 PM

  • I want to know which Phyla are differentially expressed, not the phyla to which the biomarkers belong, so I need to specify that I want to calculate the mp_diff_analysis in the Phyla.

First, you can extract the differential result (taxatree slot, a treedata object ).

b.mpse16 %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=Plants_grown_in_soil, p.adjust = "fdr",first.test.alpha = 0.05,filter.p = "pvalue")  %>% mp_extract_taxatree() -> daa.td 

The daa.td is a treedata object which can be used ggtree, treeio, tidytree and ggtreeExtra directly. You can find nodeClass contains different taxonomy.

> daa.td %>% dplyr::pull(nodeClass) %>% unique
[1] "OTU"     "Root"    "Kingdom" "Phylum"  "Class"   "Order"   "Family"
[8] "Genus"   "Speies"

Then, you can use filter of dplyr to choose the specified taxa.

> daa.td %>% dplyr::filter(nodeClass == 'Phylum' & !is.na(Sign_Plants_grown_in_soil), keep.td=F)
# A tibble: 1 × 12
   node label   isTip nodeClass nodeDepth RareAbundanceBySample LDAupper LDAmean
  <int> <chr>   <lgl> <chr>         <dbl> <list>                   <dbl>   <dbl>
1  2584 p__Aci… FALSE Phylum            2 <tibble [45 × 5]>         4.19    4.15
# ℹ 4 more variables: LDAlower <dbl>, Sign_Plants_grown_in_soil <chr>,
#   pvalue <dbl>, fdr <dbl>                                                     
  • mouse.time.mpse %>% mp_aggregate(.group=time)
    I got this:

Please re-install MicrobiotaProcess by remotes::install_github(YuLab-SMU/MicrobiotaProcess).

It worked perfectly! Thanks a lot! :)

The only remaining error I keep getting is this:

I am getting another error when using mp_cal_rarecurve in:

mpse12.a <- mpse12 %>% mp_rrarefy() %>% mp_cal_alpha(.abundance = RareAbundance) %>% mp_cal_rarecurve(.abundance = RareAbundance)

Screenshot 2023-04-22 at 7 19 11 PM

Data can be found at: https://www.dropbox.com/s/eeblzk2uqggyv6o/mpse12.obj?dl=0

I checked and apparently, something is going on after filtering the features with low abundance (last step in the following command):

otuqzafile_12 <- "./its_12/table_its_12_filtered.qza"
taxaqzafile_12 <- "./its_12/taxonomy-its-12.qza"
mapfile_12 <- "./its_12/sample-metadata_its_12.txt"
mpse12 <- mp_import_qiime2(otuqza=otuqzafile_12, taxaqza=taxaqzafile_12, mapfilename=mapfile_12)

taxa_12 <- taxonomy(mpse12)
taxa_12[apply(taxa_12, 2, function(i)grepl("un", i))] <- NA
taxa_12 <- apply(taxa_12, 2, function(i)gsub("*.
_", "", i))
taxonomy(mpse12) <- taxa_12
rownames(mpse12) <- paste0('ASV', seq_len(nrow(mpse12)))

mpse12 %>% mp_extract_feature(addtaxa=T) %>% dplyr::pull(Phylum) %>% unique()
mpse12 <- mpse12 %>% dplyr::filter(!Phylum %in% c("p__Fungi_phy_Incertae_sedis", "p__un_k__Fungi"))

mpse12 <- mpse12 %>% mp_filter_taxa(.abundance=Abundance, min.abun=0, min.prop=.1)

If I skip this last step I can calculate my rarefaction curve.
Screenshot 2023-04-24 at 11 03 44 AM

Thanks a lot!