andrewhill157/leiden

Loosen Broad Cluster Dependencies

andrewhill157 opened this issue · 1 comment

Annotation, and therefore all downstream steps, are currently tightly coupled to the Broad Institute's distributed computing cluster, which makes the library difficult to deploy for a broader audience. Furthermore, the current implementation depends on Monkol's custom version of VEP.

  • Base the implementation on standard VEP annotation.
  • Could simply require that a VEP script is on the user's PATH.
  • Makes it possible to integrate VEP plugins (e.g., for frameshift variants).
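Requiring VEP on the PATH could be enforced with a small startup check. The sketch below is only an illustration: the helper names `find_vep` and `require_vep` are hypothetical, and the candidate executable names are an assumption (older VEP releases shipped `variant_effect_predictor.pl`, newer ones ship `vep`).

```python
import shutil
from typing import Iterable, Optional

# Assumed candidate names for the VEP entry point; adjust for the VEP release in use.
DEFAULT_VEP_NAMES = ("vep", "variant_effect_predictor.pl")


def find_vep(names: Iterable[str] = DEFAULT_VEP_NAMES,
             path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first VEP executable found, or None.

    `path` defaults to the user's PATH; passing a directory string restricts the search.
    """
    for name in names:
        found = shutil.which(name, path=path)
        if found is not None:
            return found
    return None


def require_vep(names: Iterable[str] = DEFAULT_VEP_NAMES,
                path: Optional[str] = None) -> str:
    """Like find_vep, but fail early with a readable error if VEP is missing."""
    vep = find_vep(names, path)
    if vep is None:
        raise RuntimeError("VEP was not found on PATH; install VEP and add it to PATH.")
    return vep
```

Failing at startup keeps the error close to its cause, instead of surfacing later as an opaque subprocess failure halfway through annotation.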

This lets people either use the standard annotation or run their own custom VEP annotation. Downstream validation would then work regardless of which they choose, which would be great.
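Whether annotation comes from the standard setup or a user's custom VEP run, validation only needs to read VEP's output. A minimal reader might look like the sketch below, assuming VEP's default tab-delimited output: `##` metadata lines, a `#Uploaded_variation ...` header line, and an `Extra` column of `KEY=value` pairs separated by `;` (where `HGVSp` appears when VEP is run with `--hgvs`). The function names are hypothetical and not taken from leiden.

```python
from typing import Dict, Iterator, TextIO


def parse_extra(extra: str) -> Dict[str, str]:
    """Parse VEP's 'Extra' column ('KEY=value;KEY=value') into a dict."""
    fields: Dict[str, str] = {}
    if extra in ("", "-"):  # VEP writes '-' for empty columns
        return fields
    for item in extra.split(";"):
        key, sep, value = item.partition("=")  # split on the first '=' only
        fields[key] = value if sep else ""     # flag-style keys get an empty value
    return fields


def read_vep_tab(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line, with Extra fields merged into the record."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata
        if line.startswith("#"):
            header = line[1:].split("\t")  # column names without the leading '#'
            continue
        if header is None:
            raise ValueError("data line found before the '#Uploaded_variation' header")
        record = dict(zip(header, line.split("\t")))
        record.update(parse_extra(record.pop("Extra", "")))
        yield record
```

Because the reader depends only on the output format, any VEP run that includes the needed fields (standard or custom, with or without plugins) can feed the same validation code.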

For resources that are specific to our lab, such as 26K, HGMD, dbSNP, etc., we could separate this aspect out entirely (for example, validate first and then annotate only the validated variants). This might actually make a lot of sense. If we want to keep these resources as part of the library, we can add a check for whether the path to the 26K data, etc., actually exists. The workflow would then be: run the standard validation protocol, after which people can do whatever additional annotation they want with the validated variants.
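The "validate first, then optionally annotate" workflow and the path-existence check might be sketched as follows. All names here (`available_resources`, `validate_then_annotate`, and the resource keys) are hypothetical; the validation and annotation steps are passed in as plain callables because the real ones are not specified in this issue.

```python
import os
import warnings
from typing import Any, Callable, Dict, Iterable, List


def available_resources(resources: Dict[str, str]) -> Dict[str, str]:
    """Keep only lab-specific resources (e.g. 26K, HGMD) whose paths exist; warn about the rest."""
    present: Dict[str, str] = {}
    for name, path in resources.items():
        if os.path.exists(path):
            present[name] = path
        else:
            warnings.warn(f"Skipping optional resource {name!r}: {path} not found")
    return present


def validate_then_annotate(variants: Iterable[Any],
                           is_valid: Callable[[Any], bool],
                           annotate: Callable[[Any, Dict[str, str]], Any],
                           resources: Dict[str, str]) -> List[Any]:
    """Run validation first; only validated variants receive the optional annotation."""
    validated = [v for v in variants if is_valid(v)]
    usable = available_resources(resources)
    return [annotate(v, usable) for v in validated]
```

Users without access to the lab resources simply get a warning per missing path, and the standard validation still runs unchanged.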

Standard VEP is now used for remapping and annotation. This not only eliminates the dependency on the cluster but also makes validation dramatically easier. VEP 75 provides an HGVSp annotation, which is the protein-level description in HGVS nomenclature. After converting the LOVD and VEP HGVS protein descriptions to the same format (very easy), we can compare them by simple string equality; no fancy parsing is needed. We still need a way to deal with intronic variants, splice variants, etc.
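Normalizing both descriptions and comparing by string equality could look like the sketch below. The input formats are assumptions for illustration, not taken from the leiden code: VEP's `HGVSp` is assumed to carry a protein accession prefix (`ENSP...:p.Arg123Cys`) and may percent-encode `=` as `%3D`, while LOVD is assumed to write predicted changes with parentheses (`p.(Arg123Cys)`). Variants with no protein description (such as intronic ones) are treated as non-matching rather than handled.

```python
from typing import Optional
from urllib.parse import unquote


def normalize_hgvs_protein(description: Optional[str]) -> Optional[str]:
    """Reduce an HGVS protein description to a bare 'p.' string for comparison."""
    if not description:
        return None  # e.g. intronic variants with an empty HGVSp field
    desc = unquote(description.strip())  # '%3D' -> '='
    if ":" in desc:
        desc = desc.split(":", 1)[1]  # drop an accession prefix such as 'ENSP...:'
    if desc.startswith("p.(") and desc.endswith(")"):
        desc = "p." + desc[3:-1]  # 'p.(Arg123Cys)' -> 'p.Arg123Cys'
    return desc


def protein_changes_match(lovd: Optional[str], vep: Optional[str]) -> bool:
    """True when both descriptions exist and normalize to the same string."""
    a = normalize_hgvs_protein(lovd)
    b = normalize_hgvs_protein(vep)
    return a is not None and a == b
```

For example, `protein_changes_match("p.(Arg123Cys)", "ENSP00000123456.1:p.Arg123Cys")` returns `True`. Returning `False` when either side is missing keeps the special cases (intronic, splice) visible as failures to be handled separately.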

Dropping Counsyl's remapping library will also dramatically simplify installation: all remaining dependencies are easy installs, and users just need to have VEP on their PATH.