hackseq/hackseq_projects_2016

Project 1: Implement an evaluation framework for software that manipulates HGVS-formatted variants


Project:

Background

HGVS is a syntax and set of recommendations used to represent sequence variants. The guidelines are broad and complex, and packages that implement these guidelines may not implement all features or may miss critical cases. This leads to uncertainty among users about which packages to use for particular purposes. The goal of the hgvs-eval project is to provide an objective framework by which HGVS tools may be assessed. It is envisioned that the primary user-visible product will be a web page that facilitates comparing packages (and package versions) on evaluation suite results.

Major tasks

Specific tasks for this project are:

  • Define REST interface for manipulating HGVS variants.
  • Define and implement an initial set of tests by which packages will be assessed.
  • Implement tests. The implementation will enable both language-agnostic unit tests and REST-based evaluation.
  • Implement REST interface for one package (https://bitbucket.org/biocommons/hgvs/).
  • Implement a test database and rudimentary web interface that summarizes results.
  • Document the REST interface using Swagger or RAML, in enough detail to enable package owners to implement a REST interface for their tools.

Note: A REST interface, tests, and design considerations are provided in this draft proposal, which will be refined prior to the HackSeq event.
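As a rough illustration of the "language-agnostic tests" idea in the task list, here is a minimal sketch in which test cases are plain data and each package is wrapped in an adapter that maps operation names to callables. Everything in it is an assumption for illustration, not part of the draft proposal: the `EvalCase` fields, the `evaluate` function, the status strings, the operation names, and the placeholder accessions. The toy adapter is not a real HGVS tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One evaluation case, expressed as data so any tool can be run against it."""
    name: str       # hypothetical test identifier
    operation: str  # hypothetical operation name, e.g. "format" or "project"
    input: str      # variant string handed to the tool
    expected: str   # string the tool is expected to return


def evaluate(adapter: Dict[str, Callable[[str], str]],
             cases: List[EvalCase]) -> Dict[str, str]:
    """Run every case against an adapter and record a status per case."""
    results = {}
    for case in cases:
        func = adapter.get(case.operation)
        if func is None:
            # The package does not claim to implement this feature.
            results[case.name] = "unsupported"
            continue
        try:
            results[case.name] = "pass" if func(case.input) == case.expected else "fail"
        except Exception:
            # The package raised instead of returning an answer.
            results[case.name] = "error"
    return results


# Placeholder accessions; these are not real sequence records.
CASES = [
    EvalCase("trim-whitespace", "format", " NM_000000.0:c.1A>T ", "NM_000000.0:c.1A>T"),
    EvalCase("project-c-to-g", "project", "NM_000000.0:c.1A>T", "NC_000000.0:g.1A>T"),
]

# Toy adapter standing in for a real package: it only strips whitespace.
toy_adapter = {"format": str.strip}

results = evaluate(toy_adapter, CASES)
print(results)
```

Keeping the cases as data means the same list could, in principle, be sent to a REST endpoint instead of a local callable, with only the adapter changing.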

Ideal candidates

The implementation will be in Python, so all contributors should have good Python skills.

In addition, applicants should have experience with at least a few of the following: Python unit testing, REST interface design, HGVS variant nomenclature, simple web interfaces.

Project Lead: Reece Hart / @reece / Industry Professional / Invitae

We're planning to have a Docker image with a bunch of bioinformatics software preinstalled running on machines at the BC Cancer Agency Genome Sciences Centre during the Hackathon. Which bioinformatics software do you plan to use for your project? In particular, is there any software that you plan to use that is not already listed here? http://www.bcgsc.ca/services/orca

reece commented

gcc/g++, git, mercurial, mercurial-git, pyvenv, and virtualenvwrapper would be great.

Thanks for asking!

reece commented

I am open to paring down or adding tasks in order to adjust for team size, as well as to adjusting the scope to meet the needs and interests of participants.

For possible related tasks, browse HGVS issues and UTA issues.

Feel free to ask questions or comment here.

This project is now a 1-day GA4GH event.
@reece Is there a web site for the GA4GH hackathon event?

Isn't the GA4GH October event overlapping with the Hackathon? Was interested in this project so further details are appreciated.

Thanks,
Phil

reece commented

@Phillip-a-richmond - First, I'm glad you're interested.

GA4GH joined the Hackseq hackathon plans late with the intention of having a GA4GH track in Hackseq. I think that's supposed to still be happening, but I don't know details. I'll find them out and reply here.

@sjackman -- I don't know what organizing GA4GH is doing. I'll try to clarify.

-Reece

reece commented

I've not been a part of the GA4GH-Hackseq coordination discussions. However, it appears to me that there's been a misunderstanding.

My intent is to hack on this project on the 17th at the hackathon. I'd love to have partners in crime. Please reply on this issue if you're interested. Also, I've signed up to give an hgvs workshop which might get people aligned about goals/needs.

An hgvs workshop sounds great, and I know a few people who would likely be interested. When/where would that be? During ASHG or hackseq?

reece commented

The hackseq folks are arranging workshops (see hackseq/October_2016#23). I think the plan is that they're interleaved with hacking, but I'm not sure.

They're still getting workshop titles. Here's what I submitted:

Introduction to HGVS Nomenclature and the Python hgvs package
The HGVS sequence variant nomenclature is a set of recommendations for presenting biological sequence variants to humans. Unfortunately, humans aren't very good at distinguishing formats that are convenient for them from representations that are convenient for computers. As a result, humans put HGVS variant strings into databases and web pages and clinical reports, making it difficult to compute on these variants. The Python hgvs package (Apache 2.0 licenced) parses, formats, validates, and shifts/normalizes variants, and projects (maps) variants between aligned sequences. An important distinguishing feature of the hgvs package is that it correctly handles cases in which the genomic and transcript sequences differ by substitutions or indels.

This workshop will introduce the hgvs package and help attendees get started with using it. To get the most out of the workshop, attendees should bring a laptop with Python installed or the ability to run docker containers. Additional preparation instructions will be provided at a later date.

However, it appears to me that there's been a misunderstanding. My intent is to hack on this project on the 17th at the hackathon. I'd love to have partners in crime.

Sorry for the confusion, Reece! These participants have GitHub accounts (there are others for whom I have only e-mail addresses) and expressed interest in HGVS.
@cchng @dandanxu @marciam @Madelinehazel @amanjeev @dfornika @hamzakhanvit @ronaldhause

Yes, the plan is to interleave workshops/talks with the hacking. We're sorting out the details and schedule.

reece commented

Hi- @cchng @dandanxu @marciam @Madelinehazel @amanjeev @dfornika @hamzakhanvit @ronaldhause.

How many of you are still interested and able to work on hgvs-eval? I'm available on the 17th only, unfortunately. Please reply here. We can scope the goals and planning for the project to fit our collective interests.

Hi, I would have liked to work on HGVS, but I'm already on project #9 and will likely have limited availability on Oct. 17. Thanks, Marcia

Hey Reece! I'd be interested in working on this, and the 17th would work for me.

Hi @reece,

I'm already on Project #1 and would have limited availability on the 17th of Oct. Thanks.

Best,
Hamza

I am on project #3 already so I might not have time. Thanks for your reminder.

cchng commented

I've accepted the invitation to join team 4.
But as far as I know, we haven't done anything.
I would be happy to switch to the HGVS project if it's open.

Thanks,
Carolyn

@cchng Hi, Carolyn. If you like, and if it's okay with the project leader of team 4, you could possibly participate in both projects: project 4 on Oct 15–16, and this project on Oct 17.

cchng commented

@sjackman Hi Shaun, that sounds good, if the team leads are okay with it!

@lucapinello Hi, Luca. Project 1 (HGVS) is a one-day hackathon on Oct 17. Is it okay with you if Carolyn (@cchng) works with you on your project on Oct 15–16 and Project 1 on Oct 17?

Hi Shaun,
this is absolutely fine with me!

Priority #1 is to have fun.

Best,
Luca

reece commented

Hi folks-

I just wanted to let you know that the hgvs-eval project is still on. I'm looking forward to it!

There will be many kinds of tasks for this project. Here are a few I can think of (and we may not have all represented, which is okay):

  • test (feature evaluation) design: What features are we testing and how?
  • db: design and implement schema (sqlite likely)
  • REST interface: design interface urls, responses, and status
  • Web UI: tabular layout and selection
  • start implementation of at least 1 interface: mutalyzer (via soap interface), hgvs, or pyhgvs
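To make the "db" and "Web UI" items above a little more concrete, here is one possible starting point for a SQLite schema and the per-package summary query a tabular page might sit on. This is only a sketch under assumptions: the table names (`package`, `test_case`, `result`), the column names, the status values, and the sample rows are all hypothetical and would need to be agreed on by the team.

```python
import sqlite3

# Hypothetical schema: one row per package version, per test, and per outcome.
SCHEMA = """
CREATE TABLE package (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (name, version)
);
CREATE TABLE test_case (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE result (
    package_id INTEGER NOT NULL REFERENCES package(id),
    test_id    INTEGER NOT NULL REFERENCES test_case(id),
    status     TEXT NOT NULL
               CHECK (status IN ('pass', 'fail', 'error', 'unsupported')),
    PRIMARY KEY (package_id, test_id)
);
"""


def summarize(conn):
    """Count results per package version and status, for a summary table."""
    return conn.execute(
        """
        SELECT p.name, p.version, r.status, COUNT(*)
        FROM result AS r
        JOIN package AS p ON p.id = r.package_id
        GROUP BY p.name, p.version, r.status
        ORDER BY p.name, p.version, r.status
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data, only to exercise the schema.
conn.execute("INSERT INTO package (id, name, version) VALUES (1, 'example-tool', '0.0.1')")
conn.executemany(
    "INSERT INTO test_case (id, name) VALUES (?, ?)",
    [(1, "parse-basic"), (2, "format-basic"), (3, "project-c-to-g")],
)
conn.executemany(
    "INSERT INTO result (package_id, test_id, status) VALUES (?, ?, ?)",
    [(1, 1, "pass"), (1, 2, "pass"), (1, 3, "unsupported")],
)

summary = summarize(conn)
for row in summary:
    print(row)
```

Keying `result` on (package, test) keeps one outcome per package version per test, which is what a comparison table needs; storing run history would require an extra run identifier.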

I have some ideas (and some docs) on all of these, and some early docker images that I will try to have available by Thu (earlier, I hope). If you anticipate coding, you may want to install docker (https://docs.docker.com/engine/installation/) in the meantime.

reece commented

The GA4GH and Hackseq organizers have decided that this project should move physically and logistically to the GA4GH hackathon on Oct 17 at the Marriott Hotel, closer to the conference venue. More information is at http://www.ga4gh.org/#/hackathon2016. The discussion on the project will also move to ga4gh/hackathon2016#1.

If you were planning to join this project, I hope that you are still able to do so! Please indicate your interest in the project at ga4gh/hackathon2016#1.

I will still be giving an HGVS Workshop on Saturday at 11am. Details and laptop prep instructions are at https://github.com/hackseq/October_2016/blob/master/workshop_details.md#saturday.

Thanks to the Hackseq and GA4GH organizers for making all of this possible!

@sjackman, @santina: Would you please close this issue?

I look forward to meeting you on Saturday, and best of luck at your new venue, Reece!