cjneely10/EukMetaSanity

Installation instructions

Closed this issue · 4 comments

I thought I would start collating issues as I went through the download. @akrinos might also add to this, as I know she ran into some weirdness with the GeneMark installation (I already had it set up).

I ran the download script as follows: ./download-data.py -t 8 -m 30gb data

And got the following error:

Stderr:  | Error in argument regex --split-memory-limit

The issue appears to be the gb suffix. I think it would help to include the formatting info for the -m flag in the command's help text (e.g. G for gigabytes).
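One way to make the -m flag more forgiving is to normalize the value before it is passed on to mmseqs. The snippet below is only a sketch: `normalize_memory_limit` is a hypothetical helper, not part of download-data.py, and it assumes that --split-memory-limit expects a number followed by a single-letter unit (B, K, M, G or T), which should be checked against the mmseqs help output for the installed version.

```python
import re

# Assumption: mmseqs wants single-letter units; two-letter forms are mapped onto them.
_UNIT_ALIASES = {
    "b": "B",
    "k": "K", "kb": "K",
    "m": "M", "mb": "M",
    "g": "G", "gb": "G",
    "t": "T", "tb": "T",
}


def normalize_memory_limit(value: str) -> str:
    """Turn inputs such as '30gb' or '30 G' into '30G'; raise ValueError otherwise."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"Cannot parse memory limit {value!r}; expected e.g. 30G")
    number, unit = match.groups()
    try:
        suffix = _UNIT_ALIASES[unit.lower()]
    except KeyError:
        raise ValueError(f"Unknown unit {unit!r} in {value!r}; use B, K, M, G or T")
    return f"{number}{suffix}"
```

With a helper like this, `-m 30gb` and `-m 30G` would both end up as `30G`, and anything unparseable fails early with a readable message instead of a regex error from the downstream tool.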

Also, when I re-ran the command, it didn't resume at the mmseqs index build where it had failed, so I went back and ran the install manually.

This makes me think that it would be good to add a test to make sure databases are downloaded / installed properly.

Good idea - I will include formatting info for the -m flag.

Currently the download-data.py script has a -r flag that should overwrite previously downloaded data (complete or failed). But I definitely think tests for proper download are a good idea. I will add some tests to the download-data.py script.
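As a rough idea of what such a test could look like, here is a minimal sketch that reports which expected files are missing or empty after a download. The function name `find_incomplete` and any file names passed to it are hypothetical; the real list would have to come from whatever download-data.py actually writes.

```python
from pathlib import Path


def find_incomplete(data_dir, expected_files):
    """Return the expected files that are missing or empty under data_dir."""
    root = Path(data_dir)
    problems = []
    for name in expected_files:
        path = root / name
        # A failed or interrupted download often leaves no file or a zero-byte file.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

An empty return value would mean every expected file is present and non-empty. This does not prove the contents are valid (a checksum comparison would be stronger), but it would catch the interrupted-build case described above.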

Sorry for taking so long to get back to you on this! A few notes on things that I noticed when naïvely installing it that might be problem areas for users:

  • When you create the conda environment, the errors that I had reported with my GeneMark installation did not prevent the conda environment from being built. The result is that you end up with a conda environment called EukMS, but without all the software you actually need to run the pipeline. If users just try to run INSTALL.sh again, it mostly looks like there are no issues, other than a CondaValueError telling them that the conda environment already exists, even though the first build did not fully succeed. I totally understand how challenging it is to address these kinds of issues though!
  • The fact that in the instructions you're meant to just copy/paste the change directory into EukMetaSanity might be a point of confusion for novice users, since there is also a subdirectory of the same name in the git cloned directory. I think it might help to be a bit more explicit in describing the directory structure, rather than only the copy/paste commands provided for navigating the file structure (which are helpful!)
  • Indeed, the GeneMark install requires downloading both the software and the license. The messages you get about this are slightly cryptic if you've never used GeneMark before, and there aren't explicit instructions on the INSTALL.md page about GeneMark's specific requirements or how its license file works (at least a link would be helpful for that)
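For the first point, one possible guard would be to check whether the environment already exists before the installer tries to create it, so that a half-built environment can be flagged (or removed) rather than silently kept. This is just a sketch: `env_exists` and `list_envs_json` are hypothetical helpers, and the code assumes that `conda env list --json` prints a JSON object with an "envs" list of environment paths.

```python
import json
import subprocess
from pathlib import Path


def list_envs_json():
    """Return the raw JSON text printed by `conda env list --json`."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def env_exists(env_name, conda_env_list_json):
    """True if some environment path in the JSON ends with env_name."""
    envs = json.loads(conda_env_list_json).get("envs", [])
    return any(Path(env_path).name == env_name for env_path in envs)
```

Something like `env_exists("EukMS", list_envs_json())` could then decide whether to warn the user that a previous, possibly incomplete, environment is still around.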

Overall great instructions, though! More soon.

@akrinos Thank you for this, and sorry for the late response:

  • GeneMark unfortunately must be installed outside of the conda environment (and by the user specifically, since they must accept its licensing agreement). I am working on a script for users to run after installing that will validate their installation and let them know which parts failed. Also, addressing your third point, I'll add more instructions about GeneMark installation for new users.

  • That is a good idea, I'll include the directory structure.
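To give a sense of what the post-install validation script could check, here is a minimal sketch. It is not the planned script: the helper names are hypothetical, the program names passed in are up to the caller, and the license check assumes GeneMark's key is stored as a file named .gm_key in the user's home directory, which should be confirmed against the GeneMark instructions.

```python
import shutil
from pathlib import Path


def missing_programs(programs):
    """Return the program names that cannot be found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


def genemark_key_present(home=None):
    """True if a .gm_key file exists in the given (or current user's) home directory."""
    home_dir = Path(home) if home is not None else Path.home()
    # Assumption: the GeneMark license key lives at ~/.gm_key.
    return (home_dir / ".gm_key").is_file()
```

A validation run could call `missing_programs([...])` with the executables the pipeline needs, print any that are missing, and separately report whether the GeneMark key was found, so users see exactly which part of the install failed.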

Thanks for the feedback!