CRG-CNAG/CalliNGS-NF

Implement support for GATK4

pditommaso opened this issue ยท 17 comments

Callings should be upgraded to support GTAK4. Unfortunately the new GATK version is not command line compatible with the previous version.

Using GATK4 the process 3_rnaseq_gatk_splitNcigar returns the following error:

 org.broadinstitute.hellbender.exceptions.UserException: '-T' is not a valid command.

To replicate the error run the pipeline with the gatk4 profile, eg:

nextflow run CRG-CNAG/CalliNGS-NF -profile gatk4

@anvlasova Have you had any chance to give a look a this? It would be nice to being able to support GATK4 at some point.

I would like to see this update as well. I would be willing to remote pair program with someone from the Nextflow team to get this working. Ping me to schedule an hour to work on this.

@lynnlangit thanks for your support. I can assist you in this update.

Great @pditommaso - would you prefer to use the GATK 4.x .jar file (https://software.broadinstitute.org/gatk/download/) or a docker container which includes the GATK 4.x jar file (https://hub.docker.com/r/broadinstitute/gatk/)?

Also the license info in your README could be updated as GATK 4 is open source under a BSD 3-clause "New" or "Revised" license (https://software.broadinstitute.org/gatk/download/licensing)

The GATK jar should be included in the container used by this pipeline because there are jobs requiring more than one tool in the same script, eg here. Therefore the gatk only container would not work. This has not been done in the past precisely for the previous license limitation.

As for the license, the new GATK license does not impact the project license (MPL). We may need to add a NOTICE file mentioning third-party tool licenses.

To be clear - do you need to
A) reference the GATK 4 jar file directly to work in your library?
-or-
B) reference a container which includes ONLY the GATK 4 tools? Your documentation here seems to prefer the latter case.

=> A ie. GATK installed in the docker container along with the other tools.

@pditommaso - thanks for the clarification. Would it be possible for you to review my PR (documentation update only) and possibly merge it before I start working on the code update for GATK 4?

If you'd prefer to remote pair program on this I'll be online Friday PST time zone.

Hi, sure I can review the docs. Unfortunately, I'm quite packed until Wednesday next week, but I can setup a Gitter chat to discuss the code update if it works for you.

@pditommaso sounds good - just let me know then.

I've created this gitter channel.

I am headed to GCP:Next in SFO today. Is anyone from the Nextflow team attending? If so, we could grab an hour to pair program on this in person.

Thanks to @lucacozzuto we have an implementation using gatk4. I'll merge on master once we are confident tests are OK.

Any update on when this update might be merged? I writing some documentation for The Broad and I'd like to include a link to this Repo w/GATK 4 example when it's ready -- thanks!

I need to check some related projects, hopefully in a couple of weeks. However, you can use it right just specifying the gatk4 branch on the nextflow command line:

nextflow run CRG-CNAG/CalliNGS-NF -r gatk4 -with-docker

Anything I can do to help with this, so that it can be merged?

Sorry for the very late reply. Gatk4 version has been finally merged.