/monster

Hub for the Monster team in DSP Data Engineering

Primary Language: Shell

The Monster Team

  • Monster Slack
  • Monster CI Slack

New to the team? Start here.

People

Name                 Role               GitHub
Dan Moran            Tech Lead          @danxmoran
Emily Munro-Ludders  Scrum Master       @emunrolu
Jeff Korte           Product Owner      @JeffKorte
Kathy Reinold        Data Modeler       @kreinold
Raaid Arshad         Software Engineer  @raaidbroad

GitHub Teams

  • DSP Monsters - Team for repositories under the broadinstitute org
  • Emerald Writers - Team for repositories under the DataBiosphere org

Projects

Data Modeling

Linked Data definitions for the DSP Core Data Model, with extensions for unmodeled datasets.

Documentation

GitHub repos

Data Ingest

Pipelines for moving data into the Jade Data Repository.

Documentation

GitHub repos

  • ClinVar - ETL pipeline for the ClinVar dataset
  • ENCODE - ETL pipeline for the ENCODE dataset
  • Dog Aging - ETL pipeline for the Dog Aging Project dataset

Operations

Infrastructure, configuration, and shared code used to manage developing and deploying our services.

GitHub repos

  • sbt plugins - Common build plugins used across Monster projects
  • Helm charts - Custom Helm charts for pieces of Monster infrastructure
  • Core infrastructure - Terraform modules and Helm release definitions for Monster's GCP environments

Semi-Archived

The repositories in this section are still being used, but we're trying to move away from them.

Data Ingest Framework

Our first attempts at data ingest envisioned a framework of dataset-agnostic services. We moved away from that pattern because it introduced significant overhead compared with custom pipelines built from common command-line tools.

GitHub repos

  • Transporter - Bulk file-transfer system
  • Monster ETL - Apache Beam workflows for ingest
  • Extractors - Tools / services for mechanically transforming external metadata into Beam-friendly JSON
  • Ingest Deploy - Terraform and Kubernetes configuration for deploying ingest components into GCP, based on the now-abandoned dsp-k8s-deploy
  • Storage Libs - Utility libraries for I/O against external storage systems