There are two problems when running MiCall

Question

Closed this issue 5 years ago · 18 comments

Hello, I have two questions:

1. After adding the file ErrorMetricsOut.bin, the run fails with an error. The message is shown in the figure:

2. Some .fastq.gz files run without any problem, but others fail during the remap step. The messages are as follows:

I hope you can help. Thank you very much!

Answer 1 · 2019-09-05T01:53:10.000Z

Hi, thanks for trying out MiCall-Lite and reporting this issue!
Can you please let me know what version of Ubuntu you're running, and version of Python?
Are you able to send me the files so that I can try to reproduce the problem?

Answer 2 · 2019-09-05T02:36:03.000Z

Ok, the first error seems to be a bug in censor_fastq.py, where the string literal 'N' should be prefixed with a b (i.e., b'N') so that it is a bytes object.
The second error is more unusual. Can I ask what you're sequencing?

Answer 3 · 2019-09-05T02:52:15.000Z

Hi,
Ubuntu version: Ubuntu 16.04 LTS
Python version: Python 3.5.2
The files are too large, and the upload to GitHub did not succeed.

Answer 4 · 2019-09-05T03:01:25.000Z

What we are currently sequencing:
Drug resistance mutations during antiretroviral therapy (ART) discontinuation in AIDS patients who started ART at an early stage of the National Free ART Program.

Answer 5 · 2019-09-05T03:03:08.000Z

@Anastasiamiaomiao - sorry for the misunderstanding. I wasn't suggesting that you upload FASTQ files to GitHub, as that would certainly exceed upload limits. I would have proposed giving you temporary access to one of my servers for an sftp transfer, or I can show you how to extract a short portion of a FASTQ file to send via e-mail.

Answer 6 · 2019-09-05T03:10:24.000Z

OK
Please tell me how to extract a short portion of a FASTQ file.
Thanks.

Answer 7 · 2019-09-05T03:12:01.000Z

@Anastasiamiaomiao, can you please run the following commands:

# stream the first 4000 lines of uncompressed FASTQ to new files
gunzip -c /home/zoe/dlty/10_S10_L001_R1_001.fastq.gz | head -n4000 > test1.fastq
gunzip -c /home/zoe/dlty/10_S10_L001_R2_001.fastq.gz | head -n4000 > test2.fastq

and then e-mail me (apoon42@uwo.ca) so I can send you a temporary username and password to my server? Then we can arrange a secure file transfer. I will also need the ErrorMetricsOut.bin file.
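If gunzip and head are not available, the same extraction can be sketched in Python. This is only an illustration of the commands above, not part of MiCall-Lite; the function name head_fastq and its parameters are hypothetical. It keeps the first 4000 lines, i.e. 1000 reads, since each FASTQ record spans four lines.

```python
import gzip
from itertools import islice


def head_fastq(src_path, dest_path, n_lines=4000):
    """Copy the first n_lines lines of a gzipped FASTQ to an uncompressed file.

    Hypothetical helper, equivalent in spirit to:
        gunzip -c src_path | head -n4000 > dest_path
    Returns the number of lines actually written.
    """
    if n_lines % 4 != 0:
        # One FASTQ record is exactly 4 lines; avoid cutting a record in half.
        raise ValueError("n_lines should be a multiple of 4")
    written = 0
    with gzip.open(src_path, "rt") as src, open(dest_path, "w") as dest:
        # islice stops early, so the rest of the file is never decompressed
        for line in islice(src, n_lines):
            dest.write(line)
            written += 1
    return written
```

For example, head_fastq("/home/zoe/dlty/10_S10_L001_R1_001.fastq.gz", "test1.fastq") would produce the same test1.fastq as the first command above.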

Answer 8 · 2019-09-05T03:45:12.000Z

I haven't been able to reproduce the problem with the shortened files.

Answer 9 · 2019-09-05T05:35:51.000Z

After updating, I ran it again and a different error occurred.

micall -i /home/zoe/dlty/ErrorMetricsOut.bin /home/zoe/dlty/10_S10_L001_R1_001.fastq.gz /home/zoe/dlty/10_S10_L001_R2_001.fastq.gz

MiCall-Lite running sample 10_S10_L001_R1_001...
Censoring bad tile-cycle combos in FASTQ
Traceback (most recent call last):
  File "/usr/local/bin/micall", line 156, in <module>
    run_sample(args)
  File "/usr/local/bin/micall", line 112, in run_sample
    args = censor_fastqs(args, prefix)
  File "/usr/local/bin/micall", line 95, in censor_fastqs
    use_gzip=not args.unzipped)
  File "/usr/local/lib/python3.5/dist-packages/micall/core/censor_fastq.py", line 88, in censor
    dest.write('#' * bad_count)
  File "/usr/lib/python3.5/gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
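For readers hitting the same message: the traceback shows a str being written to a gzip stream opened in binary mode, which Python 3 rejects. The snippet below is a minimal, self-contained reproduction of that failure mode using only the standard library. It is a sketch, not the actual patch applied to censor_fastq.py, and the in-memory buffer stands in for the real output file.

```python
import gzip
import io

# In-memory stand-in for the censored FASTQ output file (assumption for the demo).
buffer = io.BytesIO()
bad_count = 3

with gzip.GzipFile(fileobj=buffer, mode="wb") as dest:
    try:
        # A str is not accepted by a binary-mode gzip stream in Python 3.
        dest.write('#' * bad_count)
    except TypeError as err:
        error_message = str(err)
    # A bytes literal (note the b prefix) is accepted.
    dest.write(b'#' * bad_count)

# Only the bytes write reached the stream.
restored = gzip.decompress(buffer.getvalue())
print(error_message)
print(restored)
```

An alternative that avoids bytes literals altogether is to open the file in text mode with gzip.open(path, "wt"); which of the two suits censor_fastq.py depends on how the rest of that module reads and writes its data.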

Answer 10 · 2019-09-05T14:56:59.000Z

Thanks for securely transferring the data. I've been able to reproduce both errors and am debugging now.

Answer 11 · 2019-09-05T16:29:20.000Z

Ok, I've patched the master branch and made a successful run:

art@orolo:~/git/micall/issue25$ micall -i ErrorMetricsOut.bin 10_S10_L001_R1_001.fastq.gz 10_S10_L001_R2_001.fastq.gz 
MiCall-Lite running sample 10_S10_L001_R1_001...
  Censoring bad tile-cycle combos in FASTQ
  Preliminary map
  Iterative remap
  Generating alignment file
  Generating count files

Based on your interop file, this run has some fairly high error rates (colour indicates % error rate in phiX174 control; plot generated using Python and R scripts published here):

so you should be cautious about how you interpret these data. Due to these high error rates, analyzing the FASTQ files triggered a subroutine in MiCall-Lite that had never been properly tested. I found the bug that caused a TypeError exception to be thrown.

Here is a summary of the nuc.csv output file:

Can you please pull the latest version from the master branch and try running it on your data?

Answer 12 · 2019-09-06T05:15:55.000Z

Thank you, I have been able to run successfully.
However, I still have a question to ask.

How can I draw a coverage plot from the nuc.csv file with an R script?

Answer 13 · 2019-09-06T07:38:42.000Z

2. The command python3 batch-micall.py /home/zoe/dlty/ does not take the ErrorMetricsOut.bin file into account. Where should the file be added?

Answer 14 · 2019-09-06T16:29:07.000Z

2. The command python3 batch-micall.py /home/zoe/dlty/ does not take the ErrorMetricsOut.bin file into account. Where should the file be added?

Actually, the batch-micall.py script should accept the same arguments as micall, i.e.,

art@orolo:/data/miseq/integrase$ python3 ~/git/micall/bin/batch-micall.py . -i ~/git/micall/issue25/ErrorMetricsOut.bin
Detected 523 gzipped FASTQ files at .
./DR-88-13-2_S180_L001_R1_001.fastq.gz
MiCall-Lite running sample DR-88-13-2_S180_L001_R1_001...
  Censoring bad tile-cycle combos in FASTQ

Answer 15 · 2019-09-06T17:15:28.000Z

While working on the micall script, I realized that you should modify the --readlen setting, because you're using a reagent kit with slightly longer reads. MiCall defaults to 251 nt.
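One way to check what read lengths a FASTQ file actually contains is to scan the first few records. The sketch below does this with the standard library only; max_read_length is a hypothetical helper, not part of MiCall-Lite, and how its result should map onto the --readlen value is an assumption that is best checked against the micall help text.

```python
import gzip
from itertools import islice


def max_read_length(fastq_gz_path, n_reads=1000):
    """Return the longest read among the first n_reads records.

    Hypothetical helper for inspecting a gzipped FASTQ file.
    """
    longest = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        # The sequence is the 2nd line of each 4-line FASTQ record,
        # i.e. line indices 1, 5, 9, ...
        for seq in islice(handle, 1, 4 * n_reads, 4):
            longest = max(longest, len(seq.rstrip("\r\n")))
    return longest
```

Calling max_read_length("/home/zoe/dlty/10_S10_L001_R1_001.fastq.gz") reports the longest read seen in the first 1000 records of that file.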

Answer 16 · 2019-09-06T20:13:54.000Z

I've revised the master branch: micall now takes a --batch (or -b) argument for batch processing all FASTQ files in a user-specified directory.

Answer 17 · 2019-09-08T19:37:01.000Z

Added coverage.R script at micall/utils/:

Elzar:micall artpoon$ micall/utils/coverage.R examples/Example_S1_L001_R1_001.amino.csv output

This generates a PDF file for every seed and region combination in the CSV output. In the case of this example file, there are three PDFs (for the regions INT, PR and RT, respectively):

This script automatically detects whether you are dealing with nucleotide or amino acid CSV outputs (*.nuc.csv or *.amino.csv, respectively).
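For anyone who wants the coverage numbers outside of R, the idea behind a coverage plot can be sketched in Python: sum the per-base counts at each reference position, grouped by seed and region. This is not a port of coverage.R. The column names used as defaults below (seed, region, refseq.nuc.pos, A, C, G, T) are assumptions about the nuc.csv layout and should be checked against the header of your own file.

```python
import csv
from collections import defaultdict


def coverage_by_region(csv_path,
                       pos_column="refseq.nuc.pos",       # assumed column name
                       group_columns=("seed", "region"),   # assumed column names
                       count_columns=("A", "C", "G", "T")):  # assumed column names
    """Return {(seed, region): {position: total count}} from a counts CSV."""
    coverage = defaultdict(lambda: defaultdict(int))
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if not row[pos_column]:
                continue  # skip rows without a reference coordinate
            key = tuple(row[col] for col in group_columns)
            depth = sum(int(row[col]) for col in count_columns)
            coverage[key][int(row[pos_column])] += depth
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(positions) for key, positions in coverage.items()}
```

The returned dictionary can be passed to any plotting library, with positions on the x-axis and total counts on the y-axis, one panel per seed and region combination.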

Answer 18 · 2019-09-09T02:19:15.000Z

OK, thank you very much!