"Please provide the reference for the variable" error when running Maaslin2
polinanvkv opened this issue · 1 comment
polinanvkv commented
Hello!
I am trying to run Maaslin2 with the following code:
input_data = read.table(file = "4Masslin2_input.data_kos.taxonomy.archaea.mt.2group.tsv",
                        header = TRUE, sep = "\t")
rownames(input_data) <- input_data$Geneid_ord
input_data$Geneid_ord = NULL
metadata = read.table(file = "4Masslin2_metadata_kos.taxonomy.archaea.mt.2group.tsv",
                      header = TRUE, sep = "\t")
rownames(metadata) <- metadata$Geneid_ord
metadata$Geneid_ord = NULL
# Create the 'Ctrl' column
metadata$Ctrl <- ifelse(metadata$Diagnosis == "Ctrl", "Yes", "No")
# Create the 'PD' column
metadata$PD <- ifelse(metadata$Diagnosis == "PD", "Yes", "No")
# Create the 'iRBD' column
metadata$iRBD <- ifelse(metadata$Diagnosis == "iRBD", "Yes", "No")
reference <- unique(metadata$S)
reference <- c("Methanobrevibacter_A smithii","Methanobrevibacter_A smithii_A","Methanosphaera stadtmanae","Methanomethylophilus alvus","DTU008 sp001421185","Methanomassiliicoccus luminyensis","MX-02 sp006954405","Coprobacillus cateniformis","Methanobrevibacter_C arboriphilus_A","Methanosphaera cuniculi")
Maaslin2(input_data = input_data,
         input_metadata = metadata,
         fixed_effects = c("Ctrl", "PD", "iRBD", "S"),
         reference = reference,
         min_prevalence = 0,
         output = "test",
         transform = "LOG",
         plot_heatmap = TRUE,
         plot_scatter = TRUE,
         heatmap_first_n = 50,
         max_significance = 1)
Examples of my metadata and input data are below:
metadata:
         Diagnosis D       P                 C               O                       F                       G
K00053_1 Ctrl      Archaea Methanobacteriota Methanobacteria Methanobacteriales      Methanobacteriaceae     Methanobrevibacter_A
K00053_2 Ctrl      Archaea Methanobacteriota Methanobacteria Methanobacteriales      Methanobacteriaceae     Methanobrevibacter_A
K00053_3 Ctrl      Archaea Methanobacteriota Methanobacteria Methanobacteriales      Methanobacteriaceae     Methanosphaera
K00053_4 Ctrl      Archaea Thermoplasmatota  Thermoplasmata  Methanomassiliicoccales Methanomethylophilaceae Methanomethylophilus
K00053_5 PD        Archaea Methanobacteriota Methanobacteria Methanobacteriales      Methanobacteriaceae     Methanobrevibacter_A
K00053_6 PD        Archaea Methanobacteriota Methanobacteria Methanobacteriales      Methanobacteriaceae     Methanobrevibacter_A
         S                              Ctrl PD  iRBD
K00053_1 Methanobrevibacter_A smithii   Yes  No  No
K00053_2 Methanobrevibacter_A smithii_A Yes  No  No
K00053_3 Methanosphaera stadtmanae      Yes  No  No
K00053_4 Methanomethylophilus alvus     Yes  No  No
K00053_5 Methanobrevibacter_A smithii   No   Yes No
K00053_6 Methanobrevibacter_A smithii_A No   Yes No
input_data:
         tpm
K00053_1 166.502489
K00053_2 188.409788
K00053_3 69.970092
K00053_4 2.219452
K00053_5 642.522944
K00053_6 136.308126
As a result I receive an error:
2023-05-11 17:25:04 INFO::Writing function arguments to log file
2023-05-11 17:25:04 INFO::Verifying options selected are valid
2023-05-11 17:25:04 INFO::Determining format of input files
2023-05-11 17:25:04 INFO::Input format is data samples as rows and metadata samples as rows
2023-05-11 17:25:04 INFO::Formula for fixed effects: expr ~ Ctrl + PD + iRBD + S
Error in Maaslin2(input_data = input_data, input_metadata = metadata, :
Please provide the reference for the variable 'S' which includes more than 2 levels: Methanobrevibacter_A smithii, Methanobrevibacter_A smithii_A, Methanosphaera stadtmanae, Methanomethylophilus alvus, Methanomassiliicoccus_A intestinalis, UBA71 sp905187815, DTU008 sp001421185, Methanomassiliicoccus luminyensis, MX-02 sp006954405, Coprobacillus cateniformis, Methanobrevibacter_C arboriphilus_A, Methanosphaera cuniculi, Methanobrevibacter ruminantium_A.
Could you please suggest the likely source of this error and how to fix it?
github-actions commented
Thank you for creating this issue.
We currently field issues through our bioBakery Discourse Support Forum.
If you post the issue on Discourse, we would be happy to work with you to get it resolved.
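A possible explanation, based on a reading of the Maaslin2 documentation rather than on anything confirmed in this thread: the `reference` argument is expected to be a string of "variable,reference" pairs, separated by semicolons when several variables are involved, not a vector of level names. Since the vector passed in the post never names the variable S, Maaslin2 appears to treat S as having no reference and stops with the error shown. The sketch below rebuilds the S column from the six example rows and constructs such a string in base R. Choosing the most frequent level as the baseline is only an assumption for illustration, and the Maaslin2 call is left as a comment because it needs the full data.

```r
# Toy metadata: the S column copied from the six example rows in the post.
metadata <- data.frame(
  S = c("Methanobrevibacter_A smithii", "Methanobrevibacter_A smithii_A",
        "Methanosphaera stadtmanae", "Methanomethylophilus alvus",
        "Methanobrevibacter_A smithii", "Methanobrevibacter_A smithii_A"),
  row.names = paste0("K00053_", 1:6),
  stringsAsFactors = FALSE
)

# Assumption: use the most frequent level of S as the baseline.
# Any single level that actually occurs in metadata$S could be chosen instead.
s_counts <- table(metadata$S)
s_baseline <- names(s_counts)[which.max(s_counts)]
stopifnot(s_baseline %in% metadata$S)

# One "variable,level" pair per multi-level variable; further pairs would be
# appended to the same string with ";" as the separator.
reference <- paste("S", s_baseline, sep = ",")
print(reference)

# The call from the post would then become (not run here):
# Maaslin2(input_data = input_data,
#          input_metadata = metadata,
#          fixed_effects = c("Ctrl", "PD", "iRBD", "S"),
#          reference = reference,
#          output = "test")
```

With the six example rows this yields the string "S,Methanobrevibacter_A smithii". On the full dataset the chosen baseline must be one of the 13 levels listed in the error message.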