PoonLab/MiCall-Lite

There are two problems when running MiCall

Closed this issue · 18 comments

Hello, I have two questions:

1. After adding the file ErrorMetricsOut.bin, the run fails with an error. The message is shown in this screenshot:
[image: ERROR]

2. Some .fastq.gz files run without problems, but others fail during the remap step with the following message:
[image: REMAP]

I hope you can help. Thank you very much!

Hi, thanks for trying out MiCall-Lite and reporting this issue!
Can you please let me know what version of Ubuntu you're running, and version of Python?
Are you able to send me the files so that I can try to reproduce the problem?

Ok, the first error seems to be a bug in censor_fastq.py where 'N' should be prefixed with a b (i.e., b'N', a bytes literal).
The second error is more unusual. Can I ask what you're sequencing?

Hi,
Ubuntu version: Ubuntu 16.04 LTS
Python version: Python 3.5.2
The file is too large, and the upload to GitHub failed.

What we are sequencing:
Drug resistance mutations during antiretroviral therapy (ART) discontinuation in AIDS patients starting ART at an early stage of the National Free ART Program.

@Anastasiamiaomiao - sorry for the misunderstanding. I wasn't suggesting that you upload FASTQ files to GitHub, as that would certainly exceed upload limits. I would have proposed giving you temporary access to one of my servers for an sftp transfer, or I can show you how to extract a short portion of a FASTQ file to send via e-mail.

OK
Please tell me how to extract a short portion of a FASTQ file.
Thanks.

@Anastasiamiaomiao, can you please run the following commands:

# stream the first 4000 lines of uncompressed FASTQ to new files
gunzip -c /home/zoe/dlty/10_S10_L001_R1_001.fastq.gz | head -n4000 > test1.fastq
gunzip -c /home/zoe/dlty/10_S10_L001_R2_001.fastq.gz | head -n4000 > test2.fastq

and then e-mail me (apoon42@uwo.ca) so I can send you a temporary username and password to my server? Then we can arrange a secure file transfer. I will also need the ErrorMetricsOut.bin file.
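Where gunzip and head are not available, roughly the same extraction can be done in Python. This is only a sketch and not part of MiCall-Lite; the function name head_fastq and its parameters are made up for illustration, and it assumes each FASTQ record occupies exactly four lines.

```python
import gzip
from itertools import islice


def head_fastq(src_path, dest_path, n_reads=1000):
    """Copy the first n_reads records of a gzipped FASTQ to a plain-text file.

    Hypothetical helper: assumes four lines per record, so 1000 reads
    corresponds to the 4000 lines taken by `head -n4000`.
    Returns the number of lines written.
    """
    n_lines = 0
    with gzip.open(src_path, "rt") as src, open(dest_path, "w") as dest:
        # islice stops early, so the whole file is never decompressed
        for line in islice(src, 4 * n_reads):
            dest.write(line)
            n_lines += 1
    return n_lines
```

For example, head_fastq("/home/zoe/dlty/10_S10_L001_R1_001.fastq.gz", "test1.fastq") should produce the same file as the first command above.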

I haven't been able to reproduce the problem with the shortened files.

After updating, I ran it again and a different error occurred.

micall -i /home/zoe/dlty/ErrorMetricsOut.bin /home/zoe/dlty/10_S10_L001_R1_001.fastq.gz /home/zoe/dlty/10_S10_L001_R2_001.fastq.gz

MiCall-Lite running sample 10_S10_L001_R1_001...
Censoring bad tile-cycle combos in FASTQ
Traceback (most recent call last):
  File "/usr/local/bin/micall", line 156, in <module>
    run_sample(args)
  File "/usr/local/bin/micall", line 112, in run_sample
    args = censor_fastqs(args, prefix)
  File "/usr/local/bin/micall", line 95, in censor_fastqs
    use_gzip=not args.unzipped)
  File "/usr/local/lib/python3.5/dist-packages/micall/core/censor_fastq.py", line 88, in censor
    dest.write('#' * bad_count)
  File "/usr/lib/python3.5/gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
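
This kind of TypeError arises in Python 3 whenever a str is written to a gzip stream opened in binary mode. The snippet below is a standalone illustration of that behaviour, not the actual patch to censor_fastq.py; the in-memory buffer and the variable names other than bad_count are assumptions made for the example.

```python
import gzip
import io

buffer = io.BytesIO()  # stands in for the censored FASTQ output file
bad_count = 5

with gzip.GzipFile(fileobj=buffer, mode="wb") as dest:
    try:
        dest.write("#" * bad_count)   # str: rejected in binary mode
    except TypeError as err:
        error_message = str(err)
    dest.write(b"#" * bad_count)      # bytes literal: accepted

# Only the bytes write reached the stream
restored = gzip.decompress(buffer.getvalue())
```

Prefixing the literal with b, as with b'N' mentioned earlier, is one way to fix it; opening the file in text mode with gzip.open(path, "wt") would be another.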

Thanks for securely transferring the data. I've been able to reproduce both errors and am debugging now.

Ok, I've patched the master branch and made a successful run:

art@orolo:~/git/micall/issue25$ micall -i ErrorMetricsOut.bin 10_S10_L001_R1_001.fastq.gz 10_S10_L001_R2_001.fastq.gz 
MiCall-Lite running sample 10_S10_L001_R1_001...
  Censoring bad tile-cycle combos in FASTQ
  Preliminary map
  Iterative remap
  Generating alignment file
  Generating count files

Based on your interop file, this run has some fairly high error rates, so you should be cautious about how you interpret these data. In the following plot, colour indicates the % error rate in the phiX174 control (plot generated using Python and R scripts published here):
[image: interop]

Due to these high error rates, analyzing the FASTQ files triggered a subroutine in MiCall-Lite that had never been properly tested. I found the bug that caused a TypeError exception to be thrown.

Here is a summary of the nuc.csv output file:
[image: coverage]

Can you please pull the latest version from the master branch and try running it on your data?

Thank you, I have been able to run it successfully.
However, I still have two questions:

1. How do I draw a coverage plot with an R script based on the nuc.csv file?

2. The command python3 batch-micall.py /home/zoe/dlty/ does not take the ErrorMetricsOut.bin file into account. Where in the command should it be added?

Actually, the batch-micall.py script should accept the same arguments as micall, for example:

art@orolo:/data/miseq/integrase$ python3 ~/git/micall/bin/batch-micall.py . -i ~/git/micall/issue25/ErrorMetricsOut.bin
Detected 523 gzipped FASTQ files at .
./DR-88-13-2_S180_L001_R1_001.fastq.gz
MiCall-Lite running sample DR-88-13-2_S180_L001_R1_001...
  Censoring bad tile-cycle combos in FASTQ

While working on the micall script, I realized that you should modify the --readlen setting, because you're using a reagent kit with slightly longer reads. MiCall defaults to 251 nt.
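
One way to check the read length before choosing a --readlen value is to scan the start of a FASTQ file. The sketch below is not part of MiCall-Lite; max_read_length is a hypothetical helper, and it assumes four-line FASTQ records and that the first few thousand reads are representative of the run.

```python
import gzip
from itertools import islice


def max_read_length(fastq_gz_path, n_reads=10000):
    """Return the longest sequence among the first n_reads records.

    Hypothetical helper: assumes four lines per FASTQ record,
    with the sequence on the second line of each record.
    """
    longest = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for i, line in enumerate(islice(handle, 4 * n_reads)):
            if i % 4 == 1:  # sequence line
                longest = max(longest, len(line.rstrip("\r\n")))
    return longest
```

If the value reported is above 251, the default is probably too short for the run.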

I've revised the master branch: micall now takes a --batch (or -b) argument for batch processing all FASTQ files in a user-specified directory.
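
To make the idea of batch processing concrete, here is a rough sketch of how the paired FASTQ files in a directory could be matched up. It does not show how micall --batch is actually implemented; find_read_pairs is a hypothetical function, and it assumes the _R1_001.fastq.gz / _R2_001.fastq.gz naming seen in the file names in this thread.

```python
from pathlib import Path

# Assumed naming convention, taken from the file names in this thread
R1_SUFFIX = "_R1_001.fastq.gz"
R2_SUFFIX = "_R2_001.fastq.gz"


def find_read_pairs(directory):
    """Return sorted (R1, R2) path pairs; R1 files without a mate are skipped."""
    pairs = []
    for r1 in sorted(Path(directory).glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if r2.exists():
            pairs.append((r1, r2))
    return pairs
```

Each pair could then be passed to a per-sample run, in the same way the two FASTQ paths are given to micall on the command line.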

I've added a coverage.R script at micall/utils/:

Elzar:micall artpoon$ micall/utils/coverage.R examples/Example_S1_L001_R1_001.amino.csv output

This generates a PDF file for every seed and region combination in the CSV output. In the case of this example file, there are three PDFs (for the regions INT, PR and RT, respectively):
[image: output HIV1B-pol-seed INT]

This script automatically detects whether you are dealing with nucleotide or amino acid CSV outputs (*.nuc.csv or *.amino.csv, respectively).
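
For anyone who wants the coverage numbers outside of R, the per-position depth can also be tallied in Python. This sketch is not a port of coverage.R. The column names it uses (seed, region, refseq.nuc.pos, and the count columns A, C, G, T) are assumptions about the nuc.csv layout and should be checked against the header of your own file.

```python
import csv


def coverage_by_position(handle, pos_column="refseq.nuc.pos",
                         count_columns=("A", "C", "G", "T")):
    """Sum the count columns for each row, grouped by (seed, region).

    All column names are assumptions about the nuc.csv layout.
    Rows with an empty position field are skipped.
    Returns {(seed, region): [(position, depth), ...]}.
    """
    coverage = {}
    for row in csv.DictReader(handle):
        if not row[pos_column]:
            continue
        depth = sum(int(row[column]) for column in count_columns)
        key = (row["seed"], row["region"])
        coverage.setdefault(key, []).append((int(row[pos_column]), depth))
    return coverage
```

Each list of (position, depth) pairs corresponds to one seed and region combination, mirroring the one-PDF-per-combination output described above.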