qza_to_phyloseq may ignore the first row of sample metadata if it is formatted according to QIIME 2's documentation

Question

Closed this issue 5 years ago · 3 comments

The documentation says the metadata should be "a qiime2-compliant TSV metadata file". However, when I import my compliant metadata file, the resulting sam_data uses the third row (which holds my first sample) as the header row, so that sample's values become the column names. This seems to happen because, per the QIIME 2 documentation, the identifier column's name may start with "#" (e.g. "#SampleID"), and the second row, which declares each column's type (numeric or categorical), also starts with "#". Both lines presumably get skipped as comments.

Removing the second row and renaming the identifier column so it doesn't start with "#" gives the correct column names. Still, there should be some kind of warning about this behavior, especially because a "#"-prefixed identifier column name is permitted.

ETA: Leaving the "#"-prefixed second row in place works too, since only the identifier column needs renaming, and that row isn't useful data in the resulting sam_data data.frame anyway. I only deleted it while experimenting with getting my data to import correctly.
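The failure mode described in this question can be reproduced with a small Python sketch. It assumes, without checking qiime2R's source, that the importer effectively drops every line starting with "#" before choosing a header (similar in effect to R's read.table() with its default comment.char = "#"). The naive_read function and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical QIIME 2-style metadata: a "#"-prefixed ID header plus a
# "#q2:types" row, both of which QIIME 2's documentation allows.
METADATA = (
    "#SampleID\tbody-site\tph\n"
    "#q2:types\tcategorical\tnumeric\n"
    "S1\tgut\t6.5\n"
    "S2\ttongue\t7.1\n"
)


def naive_read(text):
    """Drop every line whose first cell starts with '#', then use the first remaining row as the header."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t")
            if r and not r[0].startswith("#")]
    return rows[0], rows[1:]


header, data = naive_read(METADATA)
print(header)  # ['S1', 'gut', '6.5']  <- first sample promoted to column names
print(data)    # [['S2', 'tongue', '7.1']]
```

Because the real header line and the types line both start with "#", the first sample row is the first line that survives the filter, which matches the symptom reported above.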

Answer 1 · 2019-03-26T21:48:04.000Z

Thanks for identifying this issue. I apparently didn't vet this new functionality adequately; it should be able to handle any QIIME 2-compliant metadata file. I'll try to fix this in the near future. In the meantime, you can always import the objects separately and build the phyloseq object yourself.

Answer 2 · 2020-02-21T21:44:51.000Z

I forgot to update this: I added a read_q2metadata() function for reading these tables!

Answer 3 · 2020-07-20T14:55:26.000Z

The read_q2metadata() function does not work for metadata tables that lack the second (types) row; it fails with the error "Metadata does not define types (ie second line does not start with #q2:types)".

Traditional QIIME-format metadata tables required the sample ID column to be named "#SampleID". Is there a way to make qza_to_phyloseq() read row names when the first column's name begins with "#"?
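For comparison, here is a minimal Python sketch of a reader that accepts both layouts discussed in this thread: a "#"-prefixed ID header such as "#SampleID", with or without a "#q2:types" row. It is not qiime2R's read_q2metadata() (an R function); read_q2_metadata, is_id_header, and the sample data are hypothetical, and the set of accepted ID header names is an assumption based on QIIME 2's metadata documentation that should be checked against the current docs.

```python
import csv
import io

# Assumed list of valid ID column headers, based on QIIME 2's metadata docs.
# These are matched case-insensitively:
CASE_INSENSITIVE_IDS = {"id", "sampleid", "sample id", "sample-id",
                        "featureid", "feature id", "feature-id"}
# Legacy names (kept for backwards compatibility) are matched exactly:
LEGACY_IDS = {"#SampleID", "#Sample ID", "#OTUID", "#OTU ID", "sample_name"}


def is_id_header(cell):
    """Return True if the cell looks like a recognized ID column header."""
    return cell.lower() in CASE_INSENSITIVE_IDS or cell in LEGACY_IDS


def read_q2_metadata(text):
    """Return (columns, types, rows); types is None when no #q2:types row exists."""
    columns, types, rows = None, None, []
    for cells in csv.reader(io.StringIO(text), delimiter="\t"):
        cells = [c.strip() for c in cells]
        if not any(cells):
            continue  # skip empty rows
        first = cells[0]
        if columns is None:
            if is_id_header(first):
                columns = cells  # header row, even if it starts with "#"
            elif first.startswith("#"):
                continue  # comment line before the header
            else:
                raise ValueError(f"unrecognized ID column header: {first!r}")
            continue
        if first == "#q2:types" and types is None and not rows:
            types = cells[1:]  # optional types row, accepted before any data rows
            continue
        if first.startswith("#"):
            continue  # any other comment line
        rows.append(cells)
    if columns is None:
        raise ValueError("no header row found")
    return columns, types, rows


WITH_TYPES = "#SampleID\tbody-site\tph\n#q2:types\tcategorical\tnumeric\nS1\tgut\t6.5\n"
WITHOUT_TYPES = "sample-id\tbody-site\nS1\tgut\n"
print(read_q2_metadata(WITH_TYPES))
# (['#SampleID', 'body-site', 'ph'], ['categorical', 'numeric'], [['S1', 'gut', '6.5']])
print(read_q2_metadata(WITHOUT_TYPES))
# (['sample-id', 'body-site'], None, [['S1', 'gut']])
```

The design choice is to identify the header by matching its first cell against known ID names rather than by whether the line starts with "#". The types row then becomes optional instead of required.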