jbisanz/qiime2R

Relative abundance (%) of common OTUs

Closed this issue · 3 comments

Hi @jbisanz,
I am working on a 16S rRNA project. I have analyzed my data using qiime2 and made most of my figures in R, thanks to you and other experts in this field.
Now I am trying to make a "Dot plot showing the relative abundance (in %) of common bacterial OTUs" in R, as shown in the figure below. On the X-axis I want the sampling days in ascending order, and on the Y-axis I want the top 20 most abundant taxa.
[Figure: example dot plot showing the relative abundance (in %) of common bacterial OTUs]

I have an OTU table with OTU ID as rows and sample-id as columns.
table.txt

My metadata file is
metadata_bact_R.txt

The problem with many solutions online is that they do not show the actual format of the input file used for the analysis in R.
That makes it difficult to understand what the input file contains.

It would be very kind of you if you could write an R script or at least suggest a solution for this issue.
Thank you,
Govind

Hi Govind,

It should be relatively straightforward. Assuming you want to plot the median abundance across your samples within some grouping variable such as time, you could do something like the code below (note that I did not actually run this code).
The first block grabs the top 20 taxa across all samples; the second plots the median abundance of those taxa at each time point.

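library(tidyverse)
library(qiime2R)

# Overall median abundance per taxon, then keep the 20 highest
top20 <- table %>%
  make_percent() %>%
  as.data.frame() %>%
  rownames_to_column("Taxon") %>%
  pivot_longer(!Taxon, names_to="SampleID", values_to="Abundance") %>%
  group_by(Taxon) %>%  # without this, summarize() returns a single overall median
  summarize(median=median(Abundance)) %>%
  top_n(20, median)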

# Median abundance per taxon at each time point, restricted to the top 20 taxa
table %>%
  make_percent() %>%
  as.data.frame() %>%
  rownames_to_column("Taxon") %>%
  pivot_longer(!Taxon, names_to="SampleID", values_to="Abundance") %>%
  left_join(metadata, by="SampleID") %>%  # assumes metadata has SampleID and Time columns
  group_by(Taxon, Time) %>%
  summarize(median=median(Abundance)) %>%
  filter(Taxon %in% top20$Taxon) %>%
  ggplot(aes(x=Time, y=Taxon, size=median)) +
  geom_point()
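
To make the expected input shapes concrete, here is a minimal sketch on toy data. The values, the sample names S1 to S4, and the Time column are hypothetical and are not taken from the attached files. The toy table is already expressed in percent, which is what make_percent() is assumed to return, so that step is left out and only the reshaping and summarizing steps are shown.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical feature table: taxa as row names, one column per sample.
# matrix() fills column by column, so S1 is (10, 90, 0), S2 is (0, 50, 50), ...
pct <- matrix(c(10, 90,  0,
                 0, 50, 50,
                30, 70,  0,
                20, 60, 20),
              nrow = 3,
              dimnames = list(c("OTU_A", "OTU_B", "OTU_C"),
                              c("S1", "S2", "S3", "S4")))

# Hypothetical metadata: one row per sample, with an ID column that
# matches the column names of the feature table.
metadata <- data.frame(SampleID = c("S1", "S2", "S3", "S4"),
                       Time     = c(1, 1, 7, 7))

# Long format: one row per taxon/sample pair, with metadata attached.
long <- pct %>%
  as.data.frame() %>%
  rownames_to_column("Taxon") %>%
  pivot_longer(!Taxon, names_to = "SampleID", values_to = "Abundance") %>%
  left_join(metadata, by = "SampleID")

# Top 2 taxa by overall median (the thread uses 20 on the real data).
top2 <- long %>%
  group_by(Taxon) %>%
  summarize(median = median(Abundance)) %>%
  slice_max(median, n = 2)

# Median per taxon and time point, restricted to the selected taxa.
by_time <- long %>%
  filter(Taxon %in% top2$Taxon) %>%
  group_by(Taxon, Time) %>%
  summarize(median = median(Abundance), .groups = "drop")
```

The by_time data frame has one row per taxon and time point, which is the shape that ggplot(aes(x=Time, y=Taxon, size=median)) + geom_point() expects.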

Hi @BisanzLab,
I tried the code that you suggested, but it seems like I am missing some R packages needed to run it.

First I imported the table.txt file into R using the following script.
OTU <- read.table("table.txt", header= TRUE, sep = "\t")
When I pipe it to make_percent(), which I assume converts the values to relative abundance (%), I get the following error message: "Error in FUN(left, right) : non-numeric argument to binary operator".
I believe this happens because the first column (#OTU ID), which holds the taxa names, contains characters rather than numeric values. Is there a way to make this script work?
Can you please check if the script works on the file (table.txt) attached above?
Thank you,
Govind
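
One possible way around this error, sketched here under assumptions rather than tested on the attached table.txt, is to read the first column in as row names so that only numeric columns remain. The inline text below is a hypothetical stand-in for the file, and the column-wise percentage is a base R stand-in for what make_percent() is assumed to do.

```r
# Hypothetical stand-in for table.txt: tab-separated, header starts with "#OTU ID".
txt <- "#OTU ID\tS1\tS2\tS3
OTU_A\t10\t0\t30
OTU_B\t90\t50\t70
OTU_C\t0\t50\t0"

# comment.char = "" stops read.table() from discarding the header line,
# which begins with "#". row.names = 1 moves the taxon names out of the
# data and into the row names. check.names = FALSE keeps sample IDs as-is.
# If the real file has an extra leading comment line, skip = 1 may be needed.
OTU <- read.table(text = txt, header = TRUE, sep = "\t",
                  comment.char = "", row.names = 1, check.names = FALSE)

# Every remaining column is numeric, so arithmetic on the table works.
all_numeric <- all(sapply(OTU, is.numeric))

# Scale each sample (column) so that it sums to 100.
OTU_pct <- sweep(as.matrix(OTU), 2, colSums(OTU), "/") * 100
```

For the real file, text = txt would be replaced by the file path, for example read.table("table.txt", ...) with the same remaining arguments.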

Thank you.
It worked.