Oshlack/necklace

de novo assembly fasta, reads_R2 error

hello,

I want to run necklace using de novo assembly FASTA files. I am running necklace with the -p option and have created a configuration file with the path to the de novo assembly files, but I repeatedly get this error:
"A variable referred to in your script on line 27, 'reads_R2' was not defined."

How do I circumvent this?

Hi,
You need to specify the short-read paths, reads_R1 and reads_R2, in the config file so that Necklace can perform genome-guided assembly and count reads mapping to genes (even if you've already done the de novo assembly). If your data is single-end you can just set:
reads_R2=""
If you are doing this and still get an error please send me your config file and the error output.

Cheers,
Nadia.

thanks for your timely response.

Is it necessary to have the reads? Could I run necklace with "dummy" fastq's? I am having trouble finding the reads as the de novo assemblies were done several years ago...

You could try dummy fastqs although I suspect completely empty files might cause a few errors in Necklace, so best to provide some real or simulated reads from the same organism (but doesn't need to be many samples or high coverage).

I got a hold of the fastq files, and now I am getting a different error referring to my config file:

expecting EOF, found ','

That refers to the comma in my config file, which is there because I am including more than one de novo assembly file.
My config looks like this:

// de_novo_assembly_file="/sf6/xxx.BlastRef.fa,/sf6/xxx.BlastRef.fa,/sf6/ ....etc"

Hi,

Yes, you can only pass one de novo assembly file to necklace, but a simple way around this is to join all the assemblies into one file, e.g. with "cat /sf6/*.BlastRef.fa > all_assemblies.fasta".
This should work provided that none of the individual assemblies have the same contig ID as each other.
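A minimal shell sketch of that join plus a check for clashing contig IDs. It assumes the contig ID is the first whitespace-delimited token of each FASTA header line; `combine_assemblies` is a hypothetical helper name made up for this example and is not part of Necklace.

```shell
# Join several assemblies into one FASTA file, then print any contig IDs
# that occur more than once. 'combine_assemblies' is a hypothetical helper.
combine_assemblies() {
    out="$1"; shift
    cat "$@" > "$out"
    # FASTA headers start with '>'; take the first token as the contig ID.
    grep '^>' "$out" | awk '{print $1}' | sort | uniq -d
}

# Example call (prints nothing if every contig ID is unique):
# combine_assemblies all_assemblies.fasta /sf6/*.BlastRef.fa
```

If any IDs are printed, the clashing contigs would need renaming (for example with a per-assembly prefix) before running Necklace.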

Cheers,
Nadia.

hello-

I joined the assemblies and tried again. I am now getting errors like:
A variable referred to in your script on line 3, 'sf6' was not defined. or
A variable referred to in your script on line 3, 'all_assemblies' was not defined.

I am not sure why it is not recognizing the path or the name of the assembly file for what it is?

Thanks for your continued help

Hi, did you put quotes around the file name? Can you send the parameter you set?

I did put quotes..
here is the command I run:
/users/sf6/data/necklace-1.11/tools/bin/bpipe run -p /sf6/data/necklace-1.11/necklace.groovy necklace.txt

and my config file:
// sequencing data
reads_R1="SRR_1.fastq"
reads_R2="SRR_2.fastq”// de_novo_assembly_file="all_assemblies.fasta” //The genome and annotation genome=“GCA_000genomic.fa” //The genome and annotation of a related species genome=“GCF_000_genomic.fa” //

The file has no line breaks because I also got error messages about new lines before.

Hi,
Can you try uncommenting the de_novo_assembly_file part in your config file:
// sequencing data
reads_R1="SRR_1.fastq"
reads_R2="SRR_2.fastq”
de_novo_assembly_file="all_assemblies.fasta” //The genome and annotation genome=“GCA_000genomic.fa” //The genome and annotation of a related species genome=“GCF_000_genomic.fa” //

Then run the necklace command like this:
/users/sf6/data/necklace-1.11/tools/bin/bpipe run /sf6/data/necklace-1.11/necklace.groovy necklace.txt
The -p is only needed if you specify an argument after it. e.g.
-p de_novo_assembly_file="all_assemblies.fasta”. I suspect this is what's causing you errors.

If you still get an error after trying the suggestions above, try adding the full path of the de novo assembly file.

Let me know if this helps.

Cheers,
Nadia.

I've incorporated your suggestions. The error I receive now is:
Could not understand command run /users/sf6/data/necklace-1.11/necklace.groovy or find it as a file

I've tried reinstalling necklace and this time within the sf6 directory and with the same command as before as well as shortening the path, I get the same error. Are you familiar with this?

thanks again for your help!!

No, I've never seen an error like this before. Just to confirm, did you run the full command (including the bpipe at the start)?

/users/sf6/data/necklace-1.11/tools/bin/bpipe run /sf6/data/necklace-1.11/necklace.groovy necklace.txt

What happens if you just run:
/users/sf6/data/necklace-1.11/tools/bin/bpipe
Does it print usage information?

Another idea is to provide the full path to necklace.txt. You could also try providing the full path to the files inside your config file if you continue to get errors (although I don't think this is causing the current problem).

Hopefully we can work this out and get Necklace running. Thanks for your patience.

Cheers,
Nadia.

Hi Nadia,

yes, I had run the full command. I re-installed the demo data and started from the beginning with everything in the same directory. I was able to successfully run Necklace on the demo data using ./necklace-1.11/tools...etc.
I've tried the same now with my own config file. The new error is:
WARN: Error evaluating script necklace.txt: No such property: assemblies for class: necklace

Pipeline Failed!

A variable referred to in your script on line 27, 'reads_R2' was not defined.

"assemblies" refers to the de novo assembly fasta's that I joined to "assemblies.fasta"
Nothing has changed in my config file but I get the same error whether I comment or uncomment the de_novo_assembly_file part.

Thank you!

Did you put quotes around the filename, ie. "assemblies.fasta" in the config file? I just realised I had left these off in the wiki instructions, so apologies for that. Quotes are required and your error sounds a bit like they are missing.

I did put quotes around all of the file names. I've looked over my config file and retested. I no longer get the warning about "No such property". I do still get an error stating:
A variable referred to in your script on line 27, 'reads_R2' was not defined.

For reference here is what is in my config file again and it is all on one line:
// sequencing data reads_R1="SRR1138705_1.fastq" reads_R2="SRR1138705_2.fastq” //de_novo_assembly_file="assemblies.fa” //The genome and annotation genome=“GCA__genomic.fa” //The genome and annotation of a related species genome=“GCF__genomic.fa” //

Hmm, the variables all need to be on their own line in the config file (not one line). The "//" means it's a comment, so the code will interpret everything after as a comment and not process it as the location of all the files it needs. What was the reason again that you couldn't have everything on a separate line? What error do you get when you try that?
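To illustrate the layout, here is a minimal shell sketch that uses only the variable names appearing in this thread. `write_example_config` and `find_commented_assignments` are hypothetical helper names made up for this example; they are not part of Necklace or bpipe. The first writes a config with one variable per line, the second lists lines where an assignment sits after a "//" and is therefore ignored as a comment.

```shell
# Write a small config with one variable per line (file names from this thread).
write_example_config() {
    cat > "$1" <<'EOF'
// sequencing data
reads_R1="SRR1138705_1.fastq"
reads_R2="SRR1138705_2.fastq"
// de novo assembly
de_novo_assembly_file="assemblies.fa"
EOF
}

# Print (with line numbers) any line where '=' appears after '//',
# i.e. an assignment that is being treated as part of a comment.
find_commented_assignments() {
    grep -n '//.*=' "$1"
}
```

Running `find_commented_assignments` on a config where everything is on one line reports that line, while the multi-line layout above reports nothing. A variable that was commented out on purpose is reported as well, so the output is a hint rather than a verdict.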

Here is the error I get:
WARN: Error evaluating script necklace2.txt: startup failed:
necklace2.txt: 3: expecting anything but ''\n''; got it anyway @ line 3, column 31.
reads_R2="SRR1138705_2.fastq”
^

1 error

Pipeline Failed!

A variable referred to in your script on line 27, 'reads_R2' was not defined.

Now, my variables are on their own line, the file looks like this:
// sequencing data
reads_R1="SRR1138705_1.fastq"
reads_R2="SRR1138705_2.fastq”
//
de_novo_assembly_file="assemblies.fa”

thanks again for taking the time to help with this!

That's strange, I haven't encountered an error like that before. Is that the full error that was printed? What command do you run? What system are you running on? Are you using bash shell? etc. It looks like it might be something to do with the environment to me.

I am using the bash shell on a Mac. I wrote the config file using TextEdit and made it plain text. I will paste the command and error below:

Command: ./necklace-1.11/tools/bin/bpipe run ./necklace-1.11/necklace.groovy necklace2.txt
Error:
WARN: Error evaluating script necklace2.txt: startup failed:
necklace2.txt: 3: expecting anything but ''\n''; got it anyway @ line 3, column 31.
reads_R2="SRR1138705_2.fastq”
^

1 error

Pipeline Failed!

A variable referred to in your script on line 27, 'reads_R2' was not defined.

Please check that all pipeline stages or other variables you have referenced by this name are defined.

...unbelievable....I think the error was due to the quotations. In this line the last quotation mark is different from the first (reads_R2="SRR1138705_2.fastq”) and I'm not sure how I managed to do that but I was repeatedly getting an error there. So, I had a good laugh before the next error after fixing the quotes.
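For anyone hitting the same error, a quick way to spot this is to search the config file for typographic ("curly") quotes, which some editors substitute automatically for straight ones. This is a minimal sketch; `find_curly_quotes` is a hypothetical helper name made up for this example.

```shell
# Print (with line numbers) every line that contains a curly quote character.
# -F treats each pattern as a fixed string rather than a regular expression.
find_curly_quotes() {
    grep -n -F -e '“' -e '”' -e '‘' -e '’' "$1"
}

# Example call:
# find_curly_quotes necklace2.txt
```

Any line it prints needs its quotes retyped as plain straight quotes (").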

Error: Expected one or more inputs with extension 'gz' but none could be located from pipeline.

Should I zip my fastq files?

Thank you!

Wow... glad you finally found it. How frustrating.
The next error is much simpler. Just gzip your fastq files and it should be happy.
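A minimal sketch of that step, assuming the reads sit in one directory and end in ".fastq"; `gzip_fastqs` is a hypothetical helper name made up for this example. Note that gzip replaces each file with a ".gz" version, so presumably the reads_R1 and reads_R2 entries in the config then need the new ".fastq.gz" names (the thread does not confirm this).

```shell
# Compress every .fastq file in the given directory.
# gzip replaces each file, e.g. SRR_1.fastq becomes SRR_1.fastq.gz.
gzip_fastqs() {
    for f in "$1"/*.fastq; do
        [ -e "$f" ] || continue   # nothing to do if the glob matched no files
        gzip "$f"
    done
}

# Example call for the current directory:
# gzip_fastqs .
```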