novoalab/nanoRMS

getting error at "RNA modification stoichiometry estimation using Tombo resquiggling"

septav opened this issue · 4 comments

Hi:)

I would like to try the nanoRMS "RNA modification stoichiometry estimation using Tombo resquiggling" workflow on my data, but I am new to this and have run into some issues. I would be thankful for your help.

  1. I started by downloading the test data:
    Is this exactly the command I should use?
    (cd per_read && wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read/guppy3.0.3.hac -q --show-progress -r -c -nc -np -nH --cut-dirs=6 --reject="index.html*")
    Here is what I get when I run it from the nanoRMS folder:

Screen Shot 2021-01-20 at 6 39 10 PM

  2. When that did not work, I used only this part of the command: wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read/guppy3.0.3.hac -q and I did not get any error. But I could not find any fast5 to use as a test in the guppy3.0.3.hac file. Is this because I did not transfer the data properly?
  3. Since I could not find any .fast5 to test, I used one of my own .fast5 files and my own reference. Here is the command I used:
    per_read/get_features.py --rna -f GRCh38.p10.genome.fa -t 6 -i FAL60812_pass_b3a7a5d6_9.fast5
    But I keep getting this error:

Screen Shot 2021-01-20 at 6 45 50 PM

How can I solve this problem?

Thank you so much in advance:)

Hi @septav, first of all thanks for the interest in our work.

  1. This is a problem with your wget/system environment (try searching for the error message). You can try the same command without --show-progress:
cd per_read && wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read/guppy3.0.3.hac -q -r -c -nc -np -nH --cut-dirs=6 --reject="index.html*"
  2. Yes, the files didn't download properly; see 1. If the download doesn't work, you can alternatively download the files to your laptop/workstation and transfer them to the cluster.
  3. Does your Fast5 file contain basecall information? Please send me the output of
h5ls -r FAL60812_pass_b3a7a5d6_9.fast5 | head
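
On the download question: a quick way to confirm that the recursive download actually fetched Fast5 files is to count them under the target directory. Below is a minimal Python sketch, assuming the files should end up under a directory named guppy3.0.3.hac. The helper name count_fast5 is hypothetical and not part of nanoRMS.

```python
from pathlib import Path


def count_fast5(root):
    """Return the number of .fast5 files found anywhere under root."""
    root = Path(root)
    if not root.is_dir():
        # Assumption: a plain file at this path likely means wget saved a
        # single page (no -r) instead of the directory tree.
        return 0
    return sum(1 for p in root.rglob("*.fast5") if p.is_file())


if __name__ == "__main__":
    # "guppy3.0.3.hac" is the directory name used in this thread; adjust as needed.
    n = count_fast5("guppy3.0.3.hac")
    print(f"{n} Fast5 file(s) found")
```

If this reports 0, the download most likely needs to be repeated with the recursive options, or the files copied over from another machine.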

Thank you so much for your response @lpryszcz!

  • Thanks for your suggestion. I transferred the files to my computer and then moved them to the cluster. It looks like I was able to generate a "predictions_ncRNA_WT30C_WT45C.tsv.gz.bed.tsv.gz" file that is similar to the provided output.

  • Here is the output of h5ls -r FAL60812_pass_b3a7a5d6_9.fast5 | head command:

Screen Shot 2021-01-25 at 5 02 44 PM

Now I am trying to do the above steps with my own data. So here are my new questions:

  1. In the second step, it is mentioned that I need to provide candidate positions, but I am not sure how and where to enter the position I am interested in (for example, if I want to check chrX:1234567, where should I use this information?).
  2. In the # prepare BED step, a .tsv.gz file is an input. How can I make such a file for my own data?
  3. In the next step, WT30C and WT45C are used as control and sample. Is WT30C the control and WT45C the sample here?

Again, I really appreciate your time and help.

Best,
Sepideh

  1. This is described in the README.md: you need to create a BED file containing the positions you want to test.
  2. This file was generated in the earlier steps. Again, see README.md.
  3. This depends on the experiment. In this case WT30C is the control (since we want to identify modified positions in 45C). If you have a KO or KD, that will be your control.
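
For a hand-picked position such as chrX:1234567, the general BED convention is a 0-based start and an exclusive end, so a single 1-based position p becomes the interval [p-1, p). The sketch below only illustrates that conversion. It assumes a six-column BED line with a strand column, and the function name position_to_bed is hypothetical. Which columns nanoRMS actually expects is defined in the README, so check it before using a file written this way.

```python
def position_to_bed(chrom, pos, strand="+", name="."):
    """Convert a 1-based position into a single-base BED6 line (0-based, half-open)."""
    if pos < 1:
        raise ValueError("pos must be a 1-based coordinate >= 1")
    start = pos - 1  # BED start is 0-based
    end = pos        # BED end is exclusive
    # Score column is set to "0" as a placeholder (assumption).
    return "\t".join([chrom, str(start), str(end), name, "0", strand])


if __name__ == "__main__":
    # Example position taken from the question above.
    print(position_to_bed("chrX", 1234567))
```

One such line per candidate position, written to a plain text file, gives a BED file in the generic sense.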

Hi @septav, I have now added the following edits to the README to clarify where the BED file comes from:
"Note, you'll need to provide candidate positions that are likely modified. Those were identified earlier -- please see above section 1.2. Predict RNA modifications. so here we'll just generate BED file from existing candidate file."

So the list of candidate sites that you need is generated in the previous step. Please let us know if this solves your issues, thanks!