The objective of DRH-OE is to get ingestable content from CSV, Excel, and similar formats into an SQL-queryable analyst-friendly format as quickly as possible. Once content is SQL-queryable and analyst-friendly, it can be anonymized, enriched, cleansed, validated, transformed, and pushed to the Doltlab repository.
To quickly facilitate getting ingestable content into an SQL-queryable analyst-friendly format, DRH-OE employs the following architecture strategy:
- SQL-native encourages performing work inside a DuckDB database as early as possible in the ingestion process but all orchestrated resources are currently pushed to the Diabetes Research Hub (DRH) application which is a self hosted Doltlab. Doltlab can be replaced with other databases such as PostgreSQL etc, in future, if required.
- A TypeScript type-safe Runtime (Deno) is used to drive the DuckDB SQL and uses OS-specific execution of DuckDB Shell (CLI) for parallelization and scalability.
- Automatic upgrades of code using
semver
and GitHub tags.
The following instructions are for setting up the code on a developer sandbox ("dev sandbox" or just "sandbox").
For dev sandboxes you should be able to use workstations or laptops that have:
- Modern i5 or i7 class CPUs (circa 2021 or later)
- 32GB RAM
- 25GB of disk space
- Windows 11 if possible (Windows 10 is a possibility, too)
- Windows Subsystem for Linux (WSL)
Instructions for deploying to test servers or production servers will be provided later.
If you have a relatively modern Windows 10/11 system with winget
and scoop
you can use "Windows Terminal (Administrator)" to install Git, Deno, DuckDB,
SQLite and VS Code IDE:
$ winget install Git.Git deno SQLite.SQLite DuckDB.cli Microsoft.VisualStudioCode
$ scoop install sqlpage
IMPORTANT: winget
installations will update your PATH so exit your
terminal, close VS Code, etc. and restart your Windows Terminal (Administrator)
session and VS Code before you try out the code.
If you want a nice, easier to read, CLI prompt install and setup Oh My Posh.
You can also evaluate the code in a Windows Sandbox environment
For Linux or MacOS use pkgx and eget to install dependencies. For guidance see Strategy Coach Workspaces Host and then:
$ pkgx install sqlite.org duckdb.org
$ eget lovasoa/SQLpage --to=$HOME/.local/bin/sqlpage
See support/docs/dependencies.md if you need further details about what the code depends on at runtime.
Once you've installed Git and Deno you can run the code directly from GitHub (the latest version or any specific pinned version) without cloning the GitHub repo or clone the repo and run the code locally.
The instructions below assume c:\workspaces
as your workspaces root but you
should change that to D:\
or /home/user/workspaces
or whatever your
workspaces root happens to be (based on your operating system).
$ md c:\workspaces # create the destination if required
$ cd c:\workspaces # or wherever your sources are stored
$ deno run -A https://raw.githubusercontent.com/diabetes-research/workspaces/main/ws-bootstrap-typical.ts
# after repo cloning command (above) is complete:
$ cd github.com/diabetes-research/ingestion-center
$ deno task # list available tasks in `deno.jsonc`
$ deno task doctor # see if dependencies are installed properly
The instructions below assume `$HOME/workspaces' (//home//workspaces) as your workspaces root but you can clone the code and run locally if you prefer.
$ cd $HOME/workspaces # move to the workspaces folder or or wherever your sources are stored
# after repo cloning command is complete:
$ cd github.com/diabetes-research/ingestion-center
$ deno task # list available tasks in `deno.jsonc`
$ deno task doctor # see if dependencies are installed properly
If deno task doctor
reports dependencies are installed properly:
$ deno task ingestion-center-drh-test-e2e
Use VS Code to open the github.com/diabetes-research/ingestion-center
folder
and open
support\assurance\ingestion-center-elt\drh\results-test-e2e\resource.sqlite.db
(the SQLite3 Editor
extension, if you accepted VS Code's recommendation, will
open it).
If the above works, and you installed SQLPage, you can start a webserver, too:
$ deno task ingestion-center-drh-test-serve
To create a 3NF data model of DRH CGM:
$ deno task ingestion-center-drh-cgm-3nf-data-model
During build (development) in a sandbox you will need all the runtime dependencies mentioned above plus do the following:
- Download Visual Studio Code IDE and
use it for editing or viewing of CSV and other assets. VS Code is available
for all major OS platforms.
- Install the recommended extensions (see
.vscode/extensions.json
). VS Code will usually suggest those automatically.
- Install the recommended extensions (see
This code allows multiple operating models, but these two are the most likely use cases:
- Clinicians and engineers are now using deidentified continuous glucose monitor (CGM) real-world data to develop physiology models of glycemia for 1. constructing better algorithms for closed loop systems and 2. understanding performance of these systems at different levels of glycemia and during different periods of rate change. Eventually, clinicians will want to use CGM data collected by way of population health to predict, prevent, and treat complications of diabetes. Also NIH is now requiring researchers whom they fund to store collected data into a repository and no such product exists for CGM data
All of the components of this repo should be able to run in a Serverless environment like AWS Lamba. See: