updated mp_import_humann_regroup to keep the abundance of contributed taxa
xiangpin opened this issue · 0 comments
xiangpin commented
Commit 6ccd981 introduced the keep.contribute.abundance argument in mp_import_humann_regroup().
With the default, keep.contribute.abundance = FALSE, only the names of the contributing taxa are kept, without their per-sample abundance.
> library(MicrobiotaProcess)
MicrobiotaProcess v1.13.2.992 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues
If you use MicrobiotaProcess in published research, please cite the
paper:
Shuangbin Xu, Li Zhan, Wenli Tang, Qianwen Wang, Zehan Dai, Lang Zhou,
Tingze Feng, Meijun Chen, Tianzhi Wu, Erqiang Hu, Guangchuang Yu.
MicrobiotaProcess: A comprehensive R package for deep mining
microbiome. The Innovation. 2023, 4(2):100388. doi:
10.1016/j.xinn.2023.100388
Export the citation to BibTex by citation('MicrobiotaProcess')
This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))
Attaching package: ‘MicrobiotaProcess’
The following object is masked from ‘package:stats’:
filter
> mpse.ko1 <- mp_import_humann_regroup('./QJ.humann3_ko.tsv', './SRP190865_meta.csv')
> mpse.ko1
# A MPSE-tibble (MPSE object) abstraction: 498,387 × 6
# OTU=5359 | Samples=93 | Assays=Abundance | Taxonomy=NULL
OTU Sample Abundance geo_loc_name_country Group contribute.taxa
<chr> <chr> <dbl> <chr> <chr> <list>
1 K00001 SRR8849198 0 China PCOS <tibble [8 × 1]>
2 K00002 SRR8849198 0 China PCOS <tibble [3 × 1]>
3 K00003 SRR8849198 55.1 China PCOS <tibble [29 × 1]>
4 K00004 SRR8849198 0 China PCOS <tibble [3 × 1]>
5 K00005 SRR8849198 83.0 China PCOS <tibble [24 × 1]>
6 K00007 SRR8849198 0 China PCOS <tibble [1 × 1]>
7 K00008 SRR8849198 0 China PCOS <tibble [6 × 1]>
8 K00009 SRR8849198 39.4 China PCOS <tibble [23 × 1]>
9 K00010 SRR8849198 0 China PCOS <tibble [16 × 1]>
10 K00012 SRR8849198 1878. China PCOS <tibble [27 × 1]>
# ℹ 498,377 more rows
# ℹ Use `print(n = ...)` to see more rows
> mpse.ko1 %>% mp_extract_feature() %>% tidyr::unnest(contribute.taxa)
# A tibble: 85,919 × 2
OTU contribute.taxa
<chr> <chr>
1 K00001 s__Bifidobacterium_bifidum
2 K00001 s__Bifidobacterium_longum
3 K00001 s__Eggerthella_lenta
4 K00001 s__Enterobacter_cloacae_complex
5 K00001 s__Klebsiella_pneumoniae
6 K00001 s__Lactobacillus_gasseri
7 K00001 s__Lactobacillus_paragasseri
8 K00001 s__Megasphaera_elsdenii
9 K00002 s__Blautia_obeum
10 K00002 s__Blautia_producta
# ℹ 85,909 more rows
# ℹ Use `print(n = ...)` to see more rows
With keep.contribute.abundance = TRUE, the abundance of each contributing taxon in each sample is kept as well, and it can be extracted with mp_extract_feature().
> mpse.ko2 <- mp_import_humann_regroup('./QJ.humann3_ko.tsv', './SRP190865_meta.csv', keep.contribute.abundance = TRUE)
> mpse.ko2
# A MPSE-tibble (MPSE object) abstraction: 498,387 × 6
# OTU=5359 | Samples=93 | Assays=Abundance | Taxonomy=NULL
OTU Sample Abundance geo_loc_name_country Group contribute.taxa
<chr> <chr> <dbl> <chr> <chr> <list>
1 K00001 SRR8849198 0 China PCOS <tibble [8 × 94]>
2 K00002 SRR8849198 0 China PCOS <tibble [3 × 94]>
3 K00003 SRR8849198 55.1 China PCOS <tibble [29 × 94]>
4 K00004 SRR8849198 0 China PCOS <tibble [3 × 94]>
5 K00005 SRR8849198 83.0 China PCOS <tibble [24 × 94]>
6 K00007 SRR8849198 0 China PCOS <tibble [1 × 94]>
7 K00008 SRR8849198 0 China PCOS <tibble [6 × 94]>
8 K00009 SRR8849198 39.4 China PCOS <tibble [23 × 94]>
9 K00010 SRR8849198 0 China PCOS <tibble [16 × 94]>
10 K00012 SRR8849198 1878. China PCOS <tibble [27 × 94]>
# ℹ 498,377 more rows
# ℹ Use `print(n = ...)` to see more rows
> mpse.ko2 %>% mp_extract_feature()
# A tibble: 5,359 × 2
OTU contribute.taxa
<chr> <list>
1 K00001 <tibble [8 × 94]>
2 K00002 <tibble [3 × 94]>
3 K00003 <tibble [29 × 94]>
4 K00004 <tibble [3 × 94]>
5 K00005 <tibble [24 × 94]>
6 K00007 <tibble [1 × 94]>
7 K00008 <tibble [6 × 94]>
8 K00009 <tibble [23 × 94]>
9 K00010 <tibble [16 × 94]>
10 K00012 <tibble [27 × 94]>
# ℹ 5,349 more rows
# ℹ Use `print(n = ...)` to see more rows
> mpse.ko2 %>% mp_extract_feature() %>% tidyr::unnest(contribute.taxa)
# A tibble: 85,421 × 95
OTU contribute.taxa SRR8849198 SRR8849199 SRR8849200 SRR8849201 SRR8849202
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 K00001 s__Bifidobacte… 0 0 0 3.77 0
2 K00001 s__Bifidobacte… 0 0 0 0 0
3 K00001 s__Eggerthella… 0 0 0 0 0
4 K00001 s__Enterobacte… 0 0 0 0 0
5 K00001 s__Klebsiella_… 0 0 0 0 0
6 K00001 s__Lactobacill… 0 0 0 0 0
7 K00001 s__Lactobacill… 0 0 0 0 0
8 K00001 s__Megasphaera… 0 0 0 66.0 0
9 K00002 s__Blautia_obe… 0 0 0 0 0
10 K00002 s__Blautia_pro… 0 0 0 0 0
# ℹ 85,411 more rows
# ℹ 88 more variables: SRR8849203 <dbl>, SRR8849204 <dbl>, SRR8849205 <dbl>,
# SRR8849206 <dbl>, SRR8849207 <dbl>, SRR8849208 <dbl>, SRR8849209 <dbl>,
# SRR8849210 <dbl>, SRR8849211 <dbl>, SRR8849212 <dbl>, SRR8849213 <dbl>,
# SRR8849214 <dbl>, SRR8849215 <dbl>, SRR8849216 <dbl>, SRR8849217 <dbl>,
# SRR8849218 <dbl>, SRR8849219 <dbl>, SRR8849220 <dbl>, SRR8849221 <dbl>,
# SRR8849222 <dbl>, SRR8849223 <dbl>, SRR8849224 <dbl>, SRR8849225 <dbl>, …
# ℹ Use `print(n = ...)` to see more rows
The gene abundance of specified taxa can then be extracted quickly and converted to an MPSE object. For example, the following code extracts the gene abundance contributed by Bifidobacterium species, recalculates the total abundance of each gene by summing over those contributing taxa, and generates a new MPSE object.
> mpse.ko2 %>%
+   mp_extract_feature() %>%
+   tidyr::unnest(contribute.taxa) %>%
+   dplyr::filter(grepl('s__Bifidobact', contribute.taxa)) %>%
+   dplyr::select(-contribute.taxa) %>%
+   dplyr::group_by(OTU) %>%
+   dplyr::summarize(dplyr::across(dplyr::everything(), sum)) %>%
+   tibble::column_to_rownames(var='OTU') %>%
+   MPSE() %>%
+   dplyr::left_join(mpse.ko2 %>% mp_extract_sample())
# A MPSE-tibble (MPSE object) abstraction: 82,398 × 5
# OTU=886 | Samples=93 | Assays=Abundance | Taxonomy=NULL
OTU Sample Abundance geo_loc_name_country Group
<chr> <chr> <dbl> <chr> <chr>
1 K00001 SRR8849198 0 China PCOS
2 K00012 SRR8849198 8.03 China PCOS
3 K00013 SRR8849198 47.4 China PCOS
4 K00016 SRR8849198 51.8 China PCOS
5 K00031 SRR8849198 0 China PCOS
6 K00052 SRR8849198 40.5 China PCOS
7 K00053 SRR8849198 146. China PCOS
8 K00057 SRR8849198 0 China PCOS
9 K00058 SRR8849198 27.3 China PCOS
10 K00059 SRR8849198 5.59 China PCOS
# ℹ 82,388 more rows
# ℹ Use `print(n = ...)` to see more rows
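
The reshaping steps in the pipeline above can be tried without the HUMAnN files. The following is only a minimal sketch on a hypothetical toy feature table: the object name feature, the sample names S1 and S2, and all abundance values are made up for illustration, and the nested layout is assumed to mirror what mp_extract_feature() printed above. It stops before MPSE() and left_join(), which need MicrobiotaProcess and the sample metadata.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data (assumption): one row per gene (OTU), with a nested
# tibble holding the contributing taxa and their abundance in each sample.
feature <- tibble(
  OTU = c("K00001", "K00002"),
  contribute.taxa = list(
    tibble(
      contribute.taxa = c("s__Bifidobacterium_bifidum",
                          "s__Bifidobacterium_longum",
                          "s__Eggerthella_lenta"),
      S1 = c(1, 2, 10),
      S2 = c(0, 3, 5)
    ),
    tibble(
      contribute.taxa = c("s__Blautia_obeum",
                          "s__Bifidobacterium_longum"),
      S1 = c(4, 0.5),
      S2 = c(6, 1.5)
    )
  )
)

bifido <- feature %>%
  # One row per gene x contributing taxon, one column per sample
  unnest(contribute.taxa) %>%
  # Keep only the rows contributed by Bifidobacterium species
  filter(grepl("s__Bifidobact", contribute.taxa)) %>%
  # Drop the taxon label so only OTU and the sample columns remain
  select(-contribute.taxa) %>%
  # Sum the remaining taxa per gene, sample by sample
  group_by(OTU) %>%
  summarize(across(everything(), sum)) %>%
  # Genes become row names, giving a gene x sample abundance table
  column_to_rownames(var = "OTU")

bifido
# K00001: S1 = 3,   S2 = 3
# K00002: S1 = 0.5, S2 = 1.5
```

The result is a plain gene by sample table, which is the shape handed to MPSE() in the pipeline above. Genes with no Bifidobacterium contributor drop out at the filter step, which is consistent with the smaller OTU count in the printed result (886 instead of 5359).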