Wittelab/orchid

Problem with ./make_database.sh

Closed this issue · 18 comments

N E X T F L O W ~ version 0.23.4
Launching /export/home/craig/orchid/workflow/annotate.nf [elated_banach] - revision: fe1e111a6e
======================== Run Info ==================================================
Database: mysql://root:orchid123@localhost:3306/feature_test
Mutations: 41
Number of chunks / process: 1

[warm up] executor > local
[b7/afe46e] Submitted process > makeTabixes (splitting data)
[36/cb64ee] Submitted process > makeBeds (splitting data)
[73/b446ec] Submitted process > updateMetadata (saving feature info)
WARN: Process makeBeds (splitting data) terminated with an error exit status (1) -- Execution is retried (1)
[76/a6ef5b] Re-submitted process > makeBeds (splitting data)
ERROR ~ Error executing process > 'makeBeds (splitting data)'

Caused by:
Process makeBeds (splitting data) terminated with an error exit status (1)

Command executed:

mysql -u$MYSQL_USER -h$MYSQL_IP -P$MYSQL_PORT -D$MYSQL_DB -NB -e "SELECT CONCAT('chr',chromosome), start-1, end, ssm_id, '' FROM ssm" > variants.bed
sort-bed variants.bed > sorted_variants.bed

Command exit status:
1

Command output:
(empty)

Command error:

*****ERROR: Unrecognized parameter: variants.bed *****

I've tried entering the command by hand and get ERROR 1049 (42000): Unknown database 'orchid_20171215'

Thanks!

It looks like this issue could be caused by a couple of potential problems:

  1. Orchid cannot connect to the database. When you ran make_database.sh did the output of the reset.nf or populate.nf workflow indicate any error? For database-related errors, I like to use SQL Pro (for OSX) to manually check orchid's tables. It takes the same database information as in your config script, and is a great tool if you haven't come across it before!

  2. There could be a problem with the sort-bed command. When you type this command in the terminal, does it work, and if so, what version does it report?

In the config file (config, which points to workflow/nextflow.config), there are several lines that will define the database connection across all orchid-db processes. It looks like this:

// Database parameters
params.database_ip             = 'localhost'           // The URL of the database which to write data
params.database_port           = 3306                  // The database port (3306 for MySQL/MemSQL, 5432 for PostgreSQL)
params.database_username       = 'orchid'              // The username to access the created database
params.database_password       = 'orchid_flower'       // The password to access the created database

Have you tried modifying these to match your mysql setup? Deeper in the config file (around line 100), you'll see that these parameters are used to generate several nextflow environment variables, which set shell environment variables used throughout the orchid-db workflow.

Let me know if properly setting these resolves the problem.

Thanks for the clarification, I think I understand the issue better now.

There is a quirk with mysql where passwords can be specified with either -p or --password= on the command line, but only the latter will work for blank (or nonexistent) passwords in non-interactive settings.
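To make the difference concrete, here is a small sketch of how the two flag styles expand when the password is empty. MYSQL_PASS is a placeholder variable name chosen for illustration; it is an assumption, not necessarily the name orchid uses.

```shell
# MYSQL_PASS is a hypothetical variable name used only for this illustration.
MYSQL_PASS=''

# Short form: with an empty password the shell produces a bare "-p",
# which mysql treats as a request to prompt for a password.
short="-p${MYSQL_PASS}"

# Long form: the "=" survives the expansion, so the value is an
# explicit empty password rather than a missing one.
long="--password=${MYSQL_PASS}"

printf 'short form expands to: %s\n' "$short"
printf 'long form expands to:  %s\n' "$long"
```

With a non-empty password both forms carry the same value, which is why the problem only shows up for blank passwords.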

When creating a database, orchid will attempt to use a root/no password login to create a new user with the credentials specified in the config file for the specified table, so you can use any password you'd like without revealing more privileged account information. If you set up a user for orchid manually with a blank password, it could potentially cause problems due to this quirk.

Moving forward I'd suggest two things:

  1. Try providing a blank password in the config file: params.database_password = ''
  2. Specify a generic password in the config file and let orchid create the database user instead of doing so manually.

If this doesn't work or you'd really prefer a blank password, let me know. I'm thinking it may be better to switch all the mysql flags to their long forms in the code base anyway, both for readability and to avoid strange quirks like this.

Appreciate your patience! It totally makes sense to include a password flag for mysql across the entire codebase. I'll update this now and will post a fix shortly.

OK, my latest commit should help a lot. Earlier I thought you'd removed the password flags from the annotate.nf script for your database situation, so I was confused until I realized they were never in the script to begin with! The complete set of flags really should have been provided to every mysql command originally. My database setup is tolerant of this bug, but I think your situation will be one of the most common, so this issue has been useful for making the orchid code better. Thanks for reporting it!

The 'feature test' is working for me with the new code changes. Let me know if this commit fails to resolve your issue. You may also run into snpEff, hg19, or simulator problems, since I am unable to provide these external code and data dependencies due to size or licensing restrictions. The download.nf script is meant to help with obtaining them, but let me know if you run into trouble.

I wonder if there is an issue with the sort-bed command. If you do sort-bed --version, what version does it respond with? It should be 2.4.26.

Another thing to try, if you haven't already, is to navigate into nextflow's working directory for the failing process to take a peek at the execution environment. The error message should report the working directory when nextflow crashes (looks like work/6f/fdba0ad4fd980d1161cc80e8fe1f22). Once you navigate to it, you can check the variants file, the environment variables, and the executed command:

head variants.bed
cat .command.env
cat .command.sh

The variants file should exist and be non-empty, and the mysql environment variables in .command.env should all be defined and correspond to those in .command.sh.

You can also source the environment file and run the script directly to see if error information is more informative that way:

source .command.env
sh .command.sh
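The "all defined" check can also be scripted. This is only a rough sketch: it assumes the four variable names that appear in the failing makeBeds command (MYSQL_USER, MYSQL_IP, MYSQL_PORT, MYSQL_DB), and missing_mysql_vars is a made-up helper name, not something shipped with orchid.

```shell
# Hypothetical helper: print the name of each MySQL variable that is
# unset or empty in the current shell (prints nothing if all are set).
missing_mysql_vars() {
    for name in MYSQL_USER MYSQL_IP MYSQL_PORT MYSQL_DB; do
        # Indirect lookup: copy the value of the variable named by $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

# Typical use from inside the failing process's work directory:
#   . ./.command.env
#   missing_mysql_vars
```

Any name it prints points at a parameter that never made it from the config file into the process environment.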

Ah, there seems to be an issue with the install instructions, which I'm now re-testing in a new virtual machine. There should be an additional requirement for bedops 2.4.26, which provides sort-bed. Here's how you get it:

wget https://github.com/bedops/bedops/releases/download/v2.4.26/bedops_linux_x86_64-v2.4.26.tar.bz2
tar jxvf bedops_linux_x86_64-v2.4.26.tar.bz2
sudo mv bin/* /usr/local/bin

This assumes a 64-bit Linux system and that /usr/local/bin is in your $PATH.
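After installing, something like the following could confirm that the expected sort-bed is the one being picked up. It is only a sketch: it assumes sort-bed --version prints the version number somewhere in its output, and mentions_expected_version is a helper name made up for this example.

```shell
EXPECTED_VERSION='2.4.26'   # the bedops version named in this thread

# Hypothetical helper: succeeds if the given text contains the expected version.
mentions_expected_version() {
    case "$1" in
        *"$EXPECTED_VERSION"*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v sort-bed >/dev/null 2>&1; then
    # Capture both stdout and stderr, since tools differ in where they print this.
    if mentions_expected_version "$(sort-bed --version 2>&1)"; then
        echo "sort-bed reports $EXPECTED_VERSION"
    else
        echo "sort-bed is installed but reports a different version"
    fi
else
    echo "sort-bed not found on PATH"
fi
```

If a different program named sort-bed appears earlier in $PATH, command -v sort-bed will show which one is actually being run.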

There is also an issue with long Mutation IDs in the version of mysql for Ubuntu 14.04, which can be fixed by running the SQL command:

SET @@global.sql_mode= 'NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

I'll provide another update tomorrow to fix any remaining issues in the completely clean install of orchid in a virtual machine.

It looks like the latest commit has fixed the remaining issues, as demonstrated by a successfully created 'feature_test' database after a clean install. I also tested the bed file generation with the dnase feature found on our lab site.

The install instructions have also been updated to reflect the changes, but I think we already addressed them all in this bug history, so your installation should be ready to go.

Glad to hear. I'll mark this issue as closed, but please let me know if anything else comes up!