Metagenome Assembly and SqueezeMeta Configuration

Question

Hi everyone,

I am using SqueezeMeta to analyze my metagenomes and utilizing MEGAHIT as the assembler on my local computer, as we don't have resources for a cluster. I have 6 metagenomes, but I can't process them in co-assembly mode due to limited RAM (32GB). However, I can process up to 4 samples at a time. I’ve opted to perform the assembly in Galaxy in either co-assembly or sequential mode.

My question is about how to provide this co-assembly in the SqueezeMeta command:

1- If I do the sequential assembly and have a single .fasta file for the R1 and R2 pair of one sample, should I provide the path to this single file for both R1 and R2 of that sample in SqueezeMeta?

2- If I use the co-assembly mode and get a single .fasta file containing all 6 samples, how should I provide this file in SqueezeMeta?

Thank you very much for your help, and I look forward to your response.

Answer 1 · 2024-07-10T07:43:20.000Z

Hello

Copying the section of the manual regarding the samples file:

The samples file specifies the samples, the names of their corresponding raw read files and the sequencing pair represented in those files, separated by tabulators.
It has the format: <Sample> <filename> <pair1|pair2>
An example could be
Sample1 readfileA_1.fastq pair1
Sample1 readfileA_2.fastq pair2
Sample1 readfileB_1.fastq pair1 noassembly
Sample1 readfileB_2.fastq pair2 noassembly
Sample2 readfileC_1.fastq.gz pair1
Sample2 readfileC_2.fastq.gz pair2
Sample3 readfileD_1.fastq pair1 nobinning,extassembly=/path/to/extassembly
Sample3 readfileD_2.fastq pair2 nobinning,extassembly=/path/to/extassembly
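
A common stumbling block with this file is using spaces instead of tabs. As a rough sketch (not part of SqueezeMeta; the function name and the checks are my own assumptions, based only on the format and fourth-column values shown in the example above), a small Python checker could look like this:

```python
# Hypothetical helper, not part of SqueezeMeta: checks one line of a samples file.
VALID_PAIRS = {"pair1", "pair2"}
VALID_FLAGS = {"noassembly", "nobinning"}


def parse_samples_line(line):
    """Split one tab-separated line into (sample, filename, pair, options)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (3, 4):
        raise ValueError(
            f"expected 3 or 4 tab-separated columns, got {len(fields)}"
        )
    sample, filename, pair = fields[:3]
    if pair not in VALID_PAIRS:
        raise ValueError(f"third column must be pair1 or pair2, got {pair!r}")

    options = {}
    if len(fields) == 4:
        # The fourth column holds comma-separated values.
        for item in fields[3].split(","):
            key, sep, value = item.partition("=")
            if key == "extassembly" and value:
                options[key] = value
            elif key in VALID_FLAGS and not sep:
                options[key] = True
            else:
                raise ValueError(f"unrecognised option {item!r}")
    return sample, filename, pair, options


if __name__ == "__main__":
    example = (
        "Sample3\treadfileD_1.fastq\tpair1\t"
        "nobinning,extassembly=/path/to/extassembly"
    )
    print(parse_samples_line(example))
```

A line typed with spaces instead of tabs is rejected with a `ValueError`, which is usually the first thing worth ruling out when the samples file is not read as expected.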

The first column indicates the sample ID (this will be the project name in sequential mode), the second contains the file names of the sequences, and the third specifies the pair number of the reads. A fourth optional column can take the following comma-separated values:
"noassembly" indicates that these samples must not be assembled with the rest (but will be mapped against the assembly to get abundances). This is useful, for example, for RNAseq reads, which can hamper the assembly but which we still want mapped to obtain transcript abundances for the genes in the assembly.
"nobinning" can be included to exclude those samples from binning.
"extassembly" indicates the external assembly for each of the samples, if one is to be used. This allows specifying a different external assembly for each sample when running in sequential mode. For coassembly, merged or seqmerge modes, use the SqueezeMeta -extassembly option instead (see Section 5).

Then, for your option 1 (sequential), add the corresponding external assembly in the samples file using the extassembly flag.
For your option 2 (coassembly), just run SqueezeMeta as usual, but add -extassembly "/path/to/extassembly" to the command line.
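
To make the two options concrete, here is a minimal Python sketch that only builds the inputs and does not run anything. All file names, paths and the project name are hypothetical. The `-extassembly` option comes from the manual text above; the other flags (`-m`, `-p`, `-s`, `-f`) reflect my understanding of the usual SqueezeMeta.pl invocation and should be checked against the manual for your installed version. Note that in both cases the raw R1/R2 read files still go in the second column of the samples file; the assembled .fasta is passed separately.

```python
# Sketch only: builds samples-file lines and a command list, nothing is executed.

def sequential_samples_lines(samples):
    """Option 1: one external assembly per sample, set in the fourth column.

    `samples` maps a sample ID to (R1 file, R2 file, assembly FASTA path).
    """
    lines = []
    for sample, (r1, r2, assembly) in samples.items():
        option = f"extassembly={assembly}"
        # Columns must be separated by tabs, not spaces.
        lines.append("\t".join([sample, r1, "pair1", option]))
        lines.append("\t".join([sample, r2, "pair2", option]))
    return lines


def coassembly_command(project, samples_file, reads_dir, assembly):
    """Option 2: a single external co-assembly, passed on the command line."""
    return [
        "SqueezeMeta.pl",
        "-m", "coassembly",        # assumed mode flag
        "-p", project,             # assumed project-name flag
        "-s", samples_file,        # assumed samples-file flag
        "-f", reads_dir,           # assumed raw-reads directory flag
        "-extassembly", assembly,  # option named in the manual excerpt
    ]


if __name__ == "__main__":
    # Hypothetical file names and paths, for illustration only.
    samples = {
        "Sample1": ("S1_R1.fastq.gz", "S1_R2.fastq.gz",
                    "/data/assemblies/Sample1.fasta"),
        "Sample2": ("S2_R1.fastq.gz", "S2_R2.fastq.gz",
                    "/data/assemblies/Sample2.fasta"),
    }
    print("\n".join(sequential_samples_lines(samples)))
    print(" ".join(coassembly_command(
        "myproject", "samples.txt", "/data/reads",
        "/data/assemblies/coassembly.fasta")))
```

For option 2 the samples file itself needs no fourth column for the assembly: it lists the six samples' read files as usual, and the reads are mapped against the external co-assembly.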

Best,
J

Answer 2 · 2024-07-11T18:51:42.000Z

Hi, many thanks. The project is currently running with the extassembly provided.