raw-lab/mercat2

Can't perform the analysis

Luponsky opened this issue · 7 comments

Hello, your tool seems fantastic, but I can't use it properly. 
I have some metagenomic read samples (forward and reverse) that I have merged into a single fasta file per sample.
I wanted to calculate alpha diversity with MerCat2 to compare the samples, using a k-mer length of 31.
But it seems that my workstation can't handle the computational load.
For some samples it works, but when it starts on a 2 GB sample, it complains that a "worker died" and the analysis fails.
Do you have any recommendations? 

Thank you for your kind words.
How did you merge your fastq data?
If you can give us a small test file, we can run it to double-check.
Could you also provide the error message?

Thank you for using MerCat2!

Hello, thanks for the quick answer!
I concatenated the R1 and R2 with cat for each sample:
`cat sampleA_R1.fa sampleA_R2.fa > sampleA.fa`
Do you have other suggestions for running MerCat2 with paired-end metagenomic fasta files?

Okay, I'm sending one sample to Jose Luis Figueroa.
I have to run the command again; I will post the error here ASAP.
Many thanks,
L

Are these reads or contigs? It would be best to have the fastq (fasta + qual) so we can do some trimming and QC prior to running.
You can interleave them R1/R2. To avoid double counting, it would likely be better to merge the reads that overlap and then work with those. Do you know if your inserts would have overlaps? If they don't have overlaps, I would interleave them R1 -- R2, R1 -- R2. We have a script to do this; we can add it to the libs for you. A rough sketch of both approaches is below. The new MerCat2 provides a PCA for you at the end.
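In the meantime, standalone tools can handle both steps. This is only a sketch with placeholder file names, not the script from our libs; FLASH and seqtk are third-party tools, not part of MerCat2:

```bash
# If the pairs overlap: merge each R1/R2 pair into one longer read with FLASH.
# FLASH needs FASTQ input; merged reads land in merged/sampleA.extendedFrags.fastq.
flash sampleA_R1.fastq sampleA_R2.fastq -o sampleA -d merged/

# If the pairs do not overlap: interleave them R1 -- R2, R1 -- R2 with seqtk.
# seqtk mergepe accepts FASTA or FASTQ and writes the interleaved records to stdout.
seqtk mergepe sampleA_R1.fastq sampleA_R2.fastq > sampleA_interleaved.fastq
```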

Do you have the fastqs?
It would be good to see the error message.

You're welcome. We will get this figured out for you ASAP.

Thank you for using MerCat2.

Also, are these 16S or shotgun metagenomics?

Thank you for using MerCat2.

Hello,
I have already denoised the reads; they are from a shotgun metagenomic run on an Illumina HiSeq.

This is the error:

```
mercat2.py -i mercat/BACT_ALL_READS.fasta -k 31 -n 20 -c 10
2022-10-21 17:58:36,412 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Loading files
Checking for large files
Processing Nucleotides
Running mercat using 20 cores on BACT_ALL_READS_nucleotide
2022-10-21 18:22:01,003 WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: 6ba565a3639f3f1ec9f86f3e4dd5686c441fcaf001000000 Worker ID: 7b47f3e4ee2dab22bd49f565e37b4fbcde3a7e1a5bda5158a6a6f449 Node ID: 4846aea0de2bafed6c5b9ea1587e7c0df3f4368dc53221151c1b7ce0 Worker IP address: 130.251.115.16 Worker port: 33979 Worker PID: 3527230 Worker exit type: SYSTEM_ERROR Worker exit detail: The leased worker has unrecoverable failure. Worker is requested to be destroyed when it is returned.
2022-10-21 18:22:10,433 WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: 48d268648cab99c7c729eb9f5f0c7794e7cc605801000000 Worker ID: 6f41b2a2970df257e2e09d1535b493f4c0c0ec7edf362ee46be69968 Node ID: 4846aea0de2bafed6c5b9ea1587e7c0df3f4368dc53221151c1b7ce0 Worker IP address: 130.251.115.16 Worker port: 44371 Worker PID: 3527235 Worker exit type: SYSTEM_ERROR Worker exit detail: The leased worker has unrecoverable failure. Worker is requested to be destroyed when it is returned.
Traceback (most recent call last):
  File "/home/userbio/miniconda3/envs/mercat2/bin/mercat2.py", line 324, in <module>
    mercat_main()
  File "/home/userbio/miniconda3/envs/mercat2/bin/mercat2.py", line 241, in mercat_main
    for k,v in ray.get(ready[0]).items():
  File "/home/userbio/miniconda3/envs/mercat2/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/userbio/miniconda3/envs/mercat2/lib/python3.10/site-packages/ray/_private/worker.py", line 2275, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayOutOfMemoryError): ray::countKmers() (pid=3527231, ip=130.251.115.16)
ray._private.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node workstation-biologia is used (60.48 / 62.78 GB). The top 10 memory consumers are:

PID      MEM      COMMAND
3527228  3.39GiB  ray::countKmers()
3527241  3.23GiB  ray::countKmers()
3527235  3.18GiB  ray::countKmers()
3527240  3.17GiB  ray::countKmers()
3527233  3.13GiB  ray::countKmers()
3527234  3.12GiB  ray::countKmers()
3527225  3.06GiB  ray::countKmers()
3527239  3.05GiB  ray::countKmers()
3527220  3.05GiB  ray::countKmers()
3527218  3.04GiB  ray::countKmers()

In addition, up to 0.31 GiB of shared memory is currently being used by the Ray object store.

--- Tip: Use the ray memory command to list active objects in the cluster.
--- To disable OOM exceptions, set RAY_DISABLE_MEMORY_MONITOR=1.
```
I think it is a memory error; in this case, do you know if there is a way to run it more lightly?
Yes, the R1 and R2 should overlap.

The PCA sounds perfect.

Thanks,
L

Thank you for using MerCat2.
We were able to reproduce your error on a laptop and can confirm that it is indeed a memory issue. We have a new version coming out that we will push to Anaconda this week; it has several improvements, including reduced memory requirements.

Due to the parallel nature of MerCat2, its memory use grows with the number of CPUs used. Lowering this value with the '-n' flag will reduce the memory required while running, although it naturally increases the running time. Reducing the chunk size and the k-mer size also usually reduces the amount of memory required; however, this will affect the accuracy of the results. See the example below.
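For example, your original command could be rerun with fewer workers; the exact core count that fits in your 64 GB is data-dependent, and 4 here is just an illustration:

```bash
# Same command as before, but with -n lowered from 20 to 4 workers
# to cap peak memory (at the cost of a longer run).
mercat2.py -i mercat/BACT_ALL_READS.fasta -k 31 -n 4 -c 10
```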

We completed your analysis for you. Can you send us your email?