Gaius-Augustus/BRAKER

BRAKER on Conda Cloud

LironShv opened this issue · 32 comments

Hi,

Would it be possible to provide the newest version of BRAKER on the Anaconda Cloud? Currently, BRAKER version 1.7 is available there: [https://anaconda.org/bioconda/braker].

Many thanks,

Hi,

Thanks for your quick reply, I will contact the maintainer.

It's a pity it's not up to date and that you don't have any influence over your BRAKER package on conda. It's an easy way to install, especially for people new to bioinformatics.

Many Thanks and best wishes,

L

As I said, the conda recipe for BRAKER is not under our influence. However, I managed to set up a Perl environment that works for BRAKER and GeneMark with anaconda like this:

wget https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh
bash bin/Anaconda3-2018.12-Linux-x86_64.sh # do not install VS (needs root privileges)
conda install -c anaconda perl
conda install -c bioconda perl-app-cpanminus
conda install -c bioconda perl-hash-merge
conda install -c bioconda perl-parallel-forkmanager
conda install -c bioconda perl-scalar-util-numeric
conda install -c bioconda perl-yaml
conda install -c bioconda perl-class-data-inheritable
conda install -c bioconda perl-exception-class
conda install -c bioconda perl-test-pod
conda install -c anaconda biopython
conda install -c bioconda perl-file-homedir
conda install -c bioconda perl-file-which # for augustus script running eval package only
cpanm Logger::Simple
# modify shebang in genemark scripts
cd gm_et_linux_64/gmes_petap/
for f in bet_to_gff.pl bp_seq_select.pl build_mod.pl calc_introns_from_gtf.pl change_path_in_perl_scripts.pl gc_distr.pl get_sequence_from_GTF.pl gmes_petap.pl histogram.pl hmm_to_gtf.pl make_nt_freq_mat.pl parse_by_introns.pl parse_ET.pl parse_gibbs.pl parse_set.pl predict_genes.pl reformat_fasta.pl reformat_gff.pl rescale_gff.pl rnaseq_introns_to_gff.pl run_es.pl run_hmm_pbs.pl scan_for_bp.pl star_to_gff.pl verify_evidence_gmhmm.pl;
do
perl -pe 's/\/usr\/bin\/perl/\/usr\/bin\/env perl/' "$f" > "$f.tmp"
mv "$f.tmp" "$f"
chmod u+x "$f"
done

I compiled Augustus & Samtools outside of conda. BRAKER seems to run ok with that setting.
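
The shebang loop above can also be written without the explicit file list. The following is only a sketch under stated assumptions: the function name patch_perl_shebangs is made up for illustration, it patches every *.pl file in the given directory rather than the listed ones, and it only rewrites line 1, so other occurrences of /usr/bin/perl inside a script are left alone.

```shell
# Hypothetical helper (not part of BRAKER or GeneMark).
# Rewrites a hard-coded "#!/usr/bin/perl" on line 1 of each *.pl file in a
# directory so that the perl found first on PATH (e.g. the conda one) is used.
patch_perl_shebangs() {
    for f in "$1"/*.pl; do
        [ -f "$f" ] || continue
        # The "1" address restricts the substitution to the first line.
        sed '1s|^#! */usr/bin/perl|#!/usr/bin/env perl|' "$f" > "$f.tmp"
        mv "$f.tmp" "$f"
        chmod u+x "$f"
    done
}
```

It would be called as patch_perl_shebangs gm_et_linux_64/gmes_petap. Running it twice is harmless, because a line that already reads "#!/usr/bin/env perl" no longer matches the pattern.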

That's really great, I will have a try.

@KatharinaHoff , I needed a bioconda recipe for "braker2", so I created one, https://anaconda.org/bioconda/braker2.

You can see the full recipe here, https://github.com/bioconda/bioconda-recipes/tree/b8e30dd6e2fd9c8d44fb5f56e44cc528e5f46859/recipes/braker2. All the dependencies are listed in "meta.yaml". This package contains all mandatory and optional dependencies for "braker2" with the exception of "GeneMark" and "GenomeThreader". Due to license restrictions, a conda package cannot be created for these two tools, but the user can manually download them and install them in the "braker2" environment. A note about this, as well as how to set AUGUSTUS_CONFIG_PATH, is written in "post-link.sh", which is printed after the package is installed.

After installing the "braker2" package (conda create -n braker2-2.1.2 -c conda-forge -c bioconda -c defaults braker2=2.1.2) I ran a few of the provided examples and they finished fine. Due to the many dependencies of "braker2", I would appreciate it if you could test the package and let me know if you encounter any issues with it.
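
For repeated installs, the conda create command quoted above can be wrapped in a small function. This is a sketch only: the function name create_braker2_env is hypothetical, and the -y flag (skip the confirmation prompt) is an addition that is not in the original command. GeneMark and GenomeThreader still have to be installed by hand afterwards.

```shell
# Hypothetical wrapper around the command quoted above.
# Creates an environment named braker2-<version> with the channel order
# conda-forge, bioconda, defaults. "-y" is an addition for unattended use.
create_braker2_env() {
    version="$1"
    conda create -y -n "braker2-${version}" \
        -c conda-forge -c bioconda -c defaults "braker2=${version}"
}
```

Calling create_braker2_env 2.1.2 would run the same command as in the comment, plus -y.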

Natasha, you are awesome. Thank you! I will test it as soon as I find the time. (Teaching season just started for us...)

Hi @npavlovikj,

Thanks for making the conda recipe. I have done a quick bit of testing with UTR prediction enabled + genome threader (RNAseq + closely related proteins).

To get it to work I had to modify the braker.pl script to look for bam2wig in the anaconda bin directory (for me this is /nbi/software/testing/anaconda3/5.2.0/x86_64/envs/braker2-2.1.2/bin/) rather than in auxprogs.

I changed this line:

$bam2wig = "$AUGUSTUS_BIN_PATH/../auxprogs/bam2wig/bam2wig";

To:

$bam2wig = "$AUGUSTUS_BIN_PATH/bam2wig";

Also, the recipe did not include a utrrnaseq binary that comes with the latest versions of augustus. I manually compiled this from the augustus github repo and copied to the anaconda bin dir and it is now working.

It is currently running and I will report back.

Tom.
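
Before editing braker.pl, it may help to see which of the two bam2wig locations mentioned above actually exists in a given installation. A minimal sketch, assuming AUGUSTUS_BIN_PATH is set in the shell; the function name find_bam2wig is made up for illustration and is not part of BRAKER.

```shell
# Hypothetical helper: print the first existing bam2wig candidate.
# Tries the conda layout ($AUGUSTUS_BIN_PATH/bam2wig) first, then the
# source-tree layout ($AUGUSTUS_BIN_PATH/../auxprogs/bam2wig/bam2wig).
find_bam2wig() {
    for candidate in \
        "$AUGUSTUS_BIN_PATH/bam2wig" \
        "$AUGUSTUS_BIN_PATH/../auxprogs/bam2wig/bam2wig"
    do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Neither location holds an executable bam2wig.
    return 1
}
```

If the function prints the path under bin, the one-line change to braker.pl shown above applies; if it prints nothing and fails, bam2wig is missing from both places.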

@tommathers , many thanks for the feedback, I really appreciate it!

Regarding the "utrrnaseq" binary, looks like the Makefiles for "utrrnaseq" and "compileSpliceCands" were not copying these two binaries to the "bin" folder, and therefore they were not found in the environment. I opened a PR, Gaius-Augustus/Augustus#54, that addresses this issue. In the meantime, I will update the "augustus" conda recipe accordingly.

Once the "augustus" recipe is updated, I will fix the "braker2" recipe based on your suggestions. This should be done in a few days, but if you notice any other issues with the "braker2" recipe, please let me know.

@tommathers , I actually had some time tonight, so I updated both "augustus" and "braker2" recipes in bioconda. Please update your "braker2" conda version (should be braker2=2.1.2=2), and let me know whether everything works for you now.

@npavlovikj, After installing braker2 from bioconda using conda create -n braker2 -c bio braker2=2.1.2=2, I get this error:

[gusev ~]$ conda activate braker2
(braker2) [gusev ~]$ braker.pl
"gtf2fasta" is not exported by the helpMod module
"clean_abort" is not exported by the helpMod module

How can I fix this?

@gusevfe , you shouldn't be having this issue... I just tried installing "braker2" in a new environment and everything worked fine. This error happens when "helpMod.pm" is not in the "bin" directory of the environment where the other "perl" scripts are. Can you please check whether "helpMod.pm" is there and that you are not using "perl" from a different environment?

If everything looks good to you, then I would suggest that you try installing "braker2" in a new environment with the proper channel order: conda create -n braker2 -c conda-forge -c bioconda -c defaults braker2.

I hope this helps, and please let me know how it goes.

@npavlovikj Sorry for the delayed response. I've just run conda create -n braker2 -c conda-forge -c bioconda -c defaults braker2, but the error persists.

@gusevfe , I am sorry, but I can not replicate your error - "braker2" works just fine for me. Please make sure that the used "perl" is in the same directory as "braker.pl" and "helpMod.pm", e.g.:

[npavlovikj@login.crane ~]$ conda create -n braker2 -c conda-forge -c bioconda -c defaults braker2
[npavlovikj@login.crane ~]$ source activate braker2
(braker2) [npavlovikj@login.crane ~]$ which perl
~/.conda/envs/braker2/bin/perl
(braker2) [npavlovikj@login.crane ~]$ which braker.pl
~/.conda/envs/braker2/bin/braker.pl
(braker2) [npavlovikj@login.crane ~]$ ls ~/.conda/envs/braker2/bin/helpMod.pm
/home/npavlovikj/.conda/envs/braker2/bin/helpMod.pm

What OS are you using? The commands from above work for me on Linux and OSX.
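
The manual checks above (which perl, which braker.pl, ls helpMod.pm) can be bundled into one function. This is a sketch, not an official diagnostic: the name check_braker_layout and the messages are invented here, and it only checks that the three files sit in the same directory, not that helpMod.pm is the right version.

```shell
# Hypothetical helper: verify that the perl and braker.pl found on PATH live
# in the same directory and that helpMod.pm sits next to them.
check_braker_layout() {
    perl_path=$(command -v perl) || { echo "perl not found on PATH" >&2; return 1; }
    braker_path=$(command -v braker.pl) || { echo "braker.pl not found on PATH" >&2; return 1; }
    perl_dir=$(dirname "$perl_path")
    braker_dir=$(dirname "$braker_path")
    if [ "$perl_dir" != "$braker_dir" ]; then
        echo "perl is in $perl_dir but braker.pl is in $braker_dir" >&2
        return 1
    fi
    if [ ! -f "$braker_dir/helpMod.pm" ]; then
        echo "helpMod.pm is missing from $braker_dir" >&2
        return 1
    fi
    echo "OK: perl, braker.pl and helpMod.pm are all in $braker_dir"
}
```

It is meant to be run after activating the environment, in the same shell session.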

@npavlovikj , I am running ArchLinux.

The helpMod.pm file is in the correct place. But I noticed that there is another file with the same name on the system: /opt/conda/pkgs/augustus-3.3.2-pl526h985c5e9_2/bin/helpMod.pm. Judging by the checksums, it is this AUGUSTUS copy, not the one from braker2, that ended up as ~/.conda/envs/braker2/bin/helpMod.pm:

> md5sum /opt/conda/pkgs/augustus-3.3.2-pl526h985c5e9_2/bin/helpMod.pm /opt/conda/pkgs/braker2-2.1.2-2/bin/helpMod.pm ~/.conda/envs/braker2/bin/helpMod.pm
e31cba4bcb5cea9a1951c22412d32953  /opt/conda/pkgs/augustus-3.3.2-pl526h985c5e9_2/bin/helpMod.pm
ec50ddbfdc445b8a3fccfcaabbbf9a7f  /opt/conda/pkgs/braker2-2.1.2-2/bin/helpMod.pm
e31cba4bcb5cea9a1951c22412d32953  /home/gusev/.conda/envs/braker2/bin/helpMod.pm
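
The same comparison can be done without reading checksums by eye. A minimal sketch, assuming cmp is available; the function name matching_copies is hypothetical. It prints every candidate file that is byte-identical to the installed one.

```shell
# Hypothetical helper: given an installed file and any number of candidate
# files, print the candidates whose content is identical to the installed one.
matching_copies() {
    installed="$1"
    shift
    for candidate in "$@"; do
        # cmp -s is silent and succeeds only when the files are identical.
        if cmp -s "$installed" "$candidate"; then
            printf '%s\n' "$candidate"
        fi
    done
}
```

With the paths from the md5sum call above, the call would be matching_copies ~/.conda/envs/braker2/bin/helpMod.pm /opt/conda/pkgs/augustus-3.3.2-pl526h985c5e9_2/bin/helpMod.pm /opt/conda/pkgs/braker2-2.1.2-2/bin/helpMod.pm. If only the augustus path is printed, the environment contains the AUGUSTUS version of helpMod.pm.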

I had the same issue.
I replaced helpMod.pm with this command:

cp -f /opt/anaconda/pkgs/braker2-2.1.2-2/bin/helpMod.pm /opt/anaconda/envs/braker2/bin/helpMod.pm

It is working now. You need to adjust the paths to match your installation.

Does braker2 require python 3? I'm trying to use it with funannotate and I'm not able to install it.

(funannotate_env) -bash-4.1$ conda install -c bioconda braker2
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abor/
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:



Package python conflicts for:
python=2.7
python=2.7
Package wheel conflicts for:
python=2.7 -> pip -> wheel
braker2 -> python[version='>=3'] -> pip -> wheel
Package biopython conflicts for:
braker2 -> biopython
Package setuptools conflicts for:
python=2.7 -> pip -> setuptools
braker2 -> python[version='>=3'] -> pip -> setuptools
Package msgpack-python conflicts for:
braker2 -> biopython -> mmtf-python -> msgpack-python
Package mmtf-python conflicts for:
braker2 -> biopython -> mmtf-python
Package pip conflicts for:
python=2.7 -> pip
braker2 -> python[version='>=3'] -> pip

@jolespin , braker2 is based on Python 3 and funannotate on Python 2, hence the error you got. Since you cannot have both packages in one environment, it is best to create two separate conda environments.

@npavlovikj is correct. Some components of BRAKER require Python 3, specifically, that's getAnnoFastaJoingenes.py and make_hub.py.

You don't need Python 3 for BRAKER if you use it with the option --skipGetAnnoFromFasta and do not use the option --makehub.
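
The rule above can be turned into a small guard in a wrapper script. This is a sketch under assumptions: the function names have_python3 and braker_python_flags are invented here, and it only reflects the statement above for the BRAKER version discussed at this point in the thread.

```shell
# Hypothetical helpers for a wrapper script around braker.pl.

# Succeeds when a python3 interpreter is on PATH.
have_python3() { command -v python3 >/dev/null 2>&1; }

# Prints the extra braker.pl option needed when Python 3 is missing and
# prints nothing otherwise. As stated above, --makehub must also be left
# out in that case; this function does not check for it.
braker_python_flags() {
    have_python3 || printf '%s\n' '--skipGetAnnoFromFasta'
}
```

A wrapper could then add the output of braker_python_flags to its braker.pl command line.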

Do you have any insight on what could be happening in this augustus issue?

Gaius-Augustus/Augustus#75

I don't think it's a memory issue b/c I've run it on a high memory server on a test data set from funannotate. I'm looking for a way to test my augustus installation.

@npavlovikj I have finally created a new release at https://github.com/Gaius-Augustus/BRAKER/releases/tag/v2.1.4 .

Notes on dependency changes:

  • cdbfasta/cdbyank are a new dependency; they can be avoided with the option --skip_fixing_broken_genes, but that's not a good idea.

  • The dependency on Python3 has increased (this should not pose a problem for Conda, I think you had already addressed that in your previous recipe; I am only mentioning it because of @jolespin's funannotate question): without Python3, genes with spliced stop codons cannot be fixed. There is an option to disable that feature (see cdbyank/cdbfasta).

  • DIAMOND is not strictly required but can replace NCBI BLAST if you like; it's a good idea to decrease runtime.

  • This BRAKER version is not compatible with the current AUGUSTUS release but requires a git clone https://github.com/Gaius-Augustus/Augustus.git . Alternatively, one can use the current AUGUSTUS release but at least update Augustus/scripts to the current GitHub code. There might be an AUGUSTUS release in the near future (@MarioStanke ?).
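
A quick way to see which of the tools named in these notes are present in an environment is to probe PATH for them. A minimal sketch; the function name missing_tools is hypothetical, and the executable names in the example (cdbfasta, cdbyank, python3, diamond) are taken from the notes above on the assumption that the binaries are called that.

```shell
# Hypothetical helper: print each named command that is not on PATH.
# Returns 0 when everything was found and 1 when at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_tools cdbfasta cdbyank python3 diamond prints nothing if all four are installed.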

@npavlovikj A new AUGUSTUS release that is fully compatible with the current BRAKER release appeared at https://github.com/Gaius-Augustus/Augustus/releases/tag/v3.3.3 .

@npavlovikj A new BRAKER release is now available at https://github.com/Gaius-Augustus/BRAKER/releases/tag/v2.1.5. This release contains major changes. Most importantly for the bioconda recipe, BRAKER now heavily depends on ProtHint, available at https://github.com/gatech-genemark/ProtHint .

Thanks @KatharinaHoff ! The BiocondaBot should pick up the new release soon, and I will check the dependencies when that happens.

Hi @KatharinaHoff ,

The bioconda recipe has been updated to the newest release of BRAKER2. However, I was not able to add ProtHint as a dependency. The license of ProtHint doesn't allow redistribution, which is a necessity for creating conda packages. I opened an issue about this in ProtHint's repo, gatech-genemark/ProtHint#6. In the meantime, ProtHint can be added in a similar way as GeneMarkS, once the environment is created. I updated the "braker2" recipe with a note mentioning the installation of ProtHint.

Please give the newest "braker2" package a try, and let me know if there are any issues with it.

Thank you,
Natasha

Dear @npavlovikj and @KatharinaHoff ,
many thanks for maintaining BRAKER and the conda recipe, respectively. I had a lot of trouble understanding why my conda installation of BRAKER2 did not work, but finally found that the shebang in GeneMark's scripts forces the base installation of perl rather than the perl from the conda environment, thereby losing the dependencies. I realised late that Katharina had proposed patching the shebang in GeneMark's scripts at the top of this thread: "# modify shebang in genemark scripts"

But this comment is omitted on the main BRAKER2 site https://github.com/Gaius-Augustus/BRAKER

Furthermore, the conda recipe ( https://bioconda.github.io/recipes/braker2/README.html ) mentions that Genemark and ProtHint must be installed manually, but the need for this patch isn't mentioned as far as I saw.

Adding the hint to change the shebang in Genemark's scripts in both places might save future users a lot of time and worries.

Cheers, Mathias
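
To find out whether a GeneMark installation is affected by this, one can list the scripts whose first line still hard-codes /usr/bin/perl. A minimal sketch; the function name list_hardcoded_perl is made up for illustration, and it only looks at *.pl files directly inside the given directory.

```shell
# Hypothetical helper: list *.pl files in a directory whose first line
# still starts with "#!/usr/bin/perl" (i.e. not yet patched to use env).
list_hardcoded_perl() {
    for f in "$1"/*.pl; do
        [ -f "$f" ] || continue
        case "$(sed -n 1p "$f")" in
            '#!/usr/bin/perl'*) printf '%s\n' "$f" ;;
        esac
    done
}
```

For the directory used at the top of this thread, the call would be list_hardcoded_perl gm_et_linux_64/gmes_petap. An empty result means no script in that directory starts with the hard-coded path.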

Hello @npavlovikj,
and another one: the conda recipe for BRAKER2 is missing a GeneMark dependency, perl-math-utils. It works after installing this package through conda.
Mathias
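
Whether a Perl module such as the one above is visible to the perl of the active environment can be probed directly. A minimal sketch; the function name perl_has_module is hypothetical, and the module name Math::Utils is an assumption derived from the package name perl-math-utils.

```shell
# Hypothetical helper: succeed if the perl on PATH can load the given module.
# "perl -MModule -e 1" loads the module and then runs a no-op program.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

For example, perl_has_module Math::Utils || echo "Math::Utils is missing" reports the gap before GeneMark runs into it.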

Hello @npavlovikj,
and last but not least: the BRAKER2 version currently on conda (2.1.5-1) does not work with ProtHint 2.5.0, but it does work with ProtHint 2.4.0, see: #244

get ProtHint 2.4.0 here: https://github.com/gatech-genemark/ProtHint/releases/tag/v2.4.0

Cheers, Mathias

@alexlomsadze the conda recipe is something that I never managed to maintain. This is something where I could really use your help a lot!

For braker3 we (with @rlibouba) have worked on/used a conda recipe lately, it's been updated this morning actually: bioconda/bioconda-recipes#44296
Of course the non-free components are not included, but if you install them by hand and set some env var, it works, the instructions there should help: https://github.com/genouest/galaxy-tools/blob/master/tools/braker3/README.rst

Do you think the new license of GeneMark-ETP allows for inclusion in conda? License-Creative-Commons-Attribution-NonCommercial-ShareAlike-4.0-International.txt

That was my concern about this choice of license: the NonCommercial clause makes it non-free, which means redistributing it as a conda package will be difficult (impossible, or a lot of legal discussion).

Maybe if the author wrote an explicit paragraph in the license giving permission to package it as a conda package, it would make life easier. But I don't know how open Mark is on this subject.

Ping @bgruening in case you have anything to add/correct :)

We tried to raise this topic, but I guess we failed.

CC is imho not very useful for software. CC actually recommends using other licenses for software here: https://creativecommons.org/about/software/
https://creativecommons.org/faq/#can-i-apply-a-creative-commons-license-to-software

We recommend against using Creative Commons licenses for software. Instead, we strongly encourage you to use one of the very good software licenses that are already available. We recommend considering licenses listed as free by the Free Software Foundation and listed as “open source” by the Open Source Initiative.

The other point is the "non-commercial" aspect. The problem that "downstream" faces with those licenses is that Bioconda, BioContainers, Debian and others (including Galaxy) cannot easily or practically check whether a user is commercial or not. Since we could (?) be liable for that, we prefer not to deploy this software and to advertise alternatives instead.