Caleydo/tdp_publicdb

Integrate alembic and add initial database schema

Closed this issue · 2 comments

Coming from PR #63 we should add Alembic for SQLAchemy. The first alembic revision should create the initial Ordino database schema without data.

Tasks

  • Connect to Ordino public DB
  • Export schema pg_dump -U postgres -s postgres > exportFile.dmp (see https://www.postgresql.org/docs/10/app-pgdump.html and https://stackoverflow.com/a/31602990)
  • Initialize alembic
  • Write initial alembic revision that creates the schema in the upgrade method
  • Always add IF NOT EXISTS to prevent errors due to duplicates
  • Write downgrade method that drops tables and views
  • Comment the content of downgrade so that we do not delete data by accident (but if somebody wants to use it, the method is ready to use)
  • Test it

Definition of Done

  • Creating a new Ordino instance and running alembic head should create the schema in the local docker publicdb container
  • Ordino should start (with no data)

Current status :

  • Alembic has been integrated using the extension point tdp-sql-database-migration.
  • We have extracted the schema from the old ordino postgres10-alpine database and the new postgres12.1 ordino database with the new data.
  • We are in the process of extracting a diff from the old and the new schema in order to update the schema on the old ordino database without deleting any data.
  • Currently using a tool called pgCodeKeeper to extract the diff and compose a migration script.
  • In the process of testing said script to make sure it just alters the schema without deleting any data on the target db.

schema

We discovered that a diff/upgrade from hg19 to hg38 would be too complex. That's why we will add only the missing table/columns for the hg19 to run with the newest SQL queries in Python. For now only the integration of HLA Type, MSI is considered (see Caleydo/tdp_bi_bioinfodb#63).

Add a alembic revision with the follwing content:

  • upgrade
    1. create tables microsatelliteinstabilityview, mutationalburden
    2. alter/replace the tdp_cellline view
  • downgrade
    1. alter/replace the with the old tdp_cellline view
    2. remove tables microsatelliteinstabilityview, mutationalburden

Important: We only need to run this revision for hg19 and we must not run it on hg38! Manually create a table public._h19_marker only for hg19 and check before upgrade and downgrade if this table exists.

Note that the table hla_a_type (from h38) is already available as view in hg19, so no table needs to be created during the migration.