MIMIC Code Repository: Code shared by the research community for the MIMIC-III database

MIMIC Code Repository

Join the chat at https://gitter.im/MIT-LCP/mimic-code

This is a repository of code shared by the research community. The repository is intended to be a central hub for sharing, refining, and reusing code used for analysis of the MIMIC critical care database. To find out more about MIMIC, please see: https://mimic.physionet.org. Source code for the website is in the mimic-website GitHub repository.

You can read more about the code repository in the following open access paper: The MIMIC Code Repository: enabling reproducibility in critical care research.

Brief introduction

The repository consists of Structured Query Language (SQL) scripts that build the MIMIC-III database in several database systems and extract useful concepts from the raw data. Jupyter notebooks detailing analyses performed on MIMIC-III are also provided.

The repository is organized as follows:

  • benchmark - Various speed tests for indices
  • buildmimic* - Scripts to build MIMIC-III in a relational database management system (RDBMS); PostgreSQL is our RDBMS of choice
  • concepts - Useful views/summaries of the data in MIMIC-III, e.g. demographics, organ failure scores, severity of illness scores, durations of treatment, easier to analyze views, etc. The paper above describes these in detail, and a README in the subfolder lists concepts generated.
  • notebooks - A collection of R markdown and Jupyter notebooks which give examples of how to extract and analyze data
  • notebooks/aline - An entire study reproduced in the MIMIC-III database - from cohort generation to hypothesis testing
  • notebooks/aline-aws - As above, launchable immediately on AWS
  • tests - You should always have tests!
  • tutorials - Similar to the notebooks folder, but focuses on explaining concepts to new users

* A Makefile build system has been created to facilitate the building of the MIMIC database, and optionally contributed views from the community. Please refer to the Makefile guide for more details.

Cloud access

The MIMIC-III database is now available on two major cloud platforms: Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data on the cloud, simply add the relevant cloud identifier to your PhysioNet profile. Further instructions are available on the MIMIC-III website.

Derived concepts can be immediately accessed by querying them directly on BigQuery under the mimiciii_derived dataset in the physionet-data project (see cloud instructions for accessing MIMIC-III on the cloud).
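
As a rough illustration of querying the derived concepts, the Python sketch below builds a standard SQL query against the mimiciii_derived dataset in the physionet-data project. The table name (ventdurations), the helper function names, and the billing project argument are assumptions made for illustration; check the dataset itself for the tables that are actually available. Running the query additionally assumes that the google-cloud-bigquery package is installed and that your Google account has been granted access.

```python
def derived_table(table, project="physionet-data", dataset="mimiciii_derived"):
    """Return a backtick-quoted, fully qualified BigQuery table reference."""
    return f"`{project}.{dataset}.{table}`"


def build_preview_query(table, limit=10):
    """Build a standard SQL query that previews the first rows of a derived table."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {derived_table(table)} LIMIT {limit}"


def run_query(sql, billing_project):
    """Run the query and return the result as a pandas DataFrame.

    Assumes google-cloud-bigquery (with pandas support) is installed and that
    billing_project is a GCP project of your own to which the query is billed.
    """
    from google.cloud import bigquery  # imported lazily; only needed to run the query

    client = bigquery.Client(project=billing_project)
    return client.query(sql).to_dataframe()


# "ventdurations" is an assumed table name used purely for illustration.
sql = build_preview_query("ventdurations", limit=5)
print(sql)
```

Only the query construction runs without credentials; calling run_query(sql, "your-project-id") would send the query to BigQuery, where "your-project-id" is a placeholder for your own project.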

Launch MIMIC-III in AWS

Use the Launch Stack button below to deploy access to the MIMIC-III dataset into your AWS account. This gives you real-time access to the MIMIC-III data in your AWS account without having to download a copy of the dataset. It also deploys a Jupyter Notebook with access to the content of this GitHub repository in your AWS account. Before launching, please log in to the MIMIC PhysioNet website, input your AWS account number, and request access to the MIMIC-III Clinical Database on AWS.

To start this deployment, click the Launch Stack button. On the first screen, the template link has already been specified, so just click next. On the second screen, provide a Stack name (letters and numbers) and click next. On the third screen, just click next. On the fourth screen, at the bottom, there is a box that says "I acknowledge that AWS CloudFormation might create IAM resources." Check that box, and then click Create. Once the Stack has finished deploying, look at the Outputs tab of the AWS CloudFormation console for links to your Jupyter Notebooks instance.

cloudformation-launch-stack
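
The stack outputs shown in the console's Outputs tab can also be read programmatically. The sketch below is one possible way to do this with boto3; it assumes boto3 is installed and AWS credentials are configured. The stack name, output key, and URL in the example are made up for illustration, and the helper names are hypothetical.

```python
def stack_outputs(describe_stacks_response):
    """Map OutputKey -> OutputValue for the first stack in a describe_stacks response."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return {}
    return {o["OutputKey"]: o["OutputValue"] for o in stacks[0].get("Outputs", [])}


def fetch_stack_outputs(stack_name, region_name=None):
    """Fetch the outputs of a deployed stack from CloudFormation.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily; only needed when calling AWS

    client = boto3.client("cloudformation", region_name=region_name)
    return stack_outputs(client.describe_stacks(StackName=stack_name))


# Hand-written response in the shape returned by describe_stacks.
# The stack name, output key, and URL are invented for this example.
example_response = {
    "Stacks": [
        {
            "StackName": "mimicstack",
            "StackStatus": "CREATE_COMPLETE",
            "Outputs": [
                {"OutputKey": "NotebookURL", "OutputValue": "https://example.com/notebook"}
            ],
        }
    ]
}
outputs = stack_outputs(example_response)
print(outputs)
```

For a real deployment, fetch_stack_outputs would be called with the Stack name you entered on the second screen.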

Other useful tools

  • Bloatectomy (paper) - A Python-based package for removing duplicate text in clinical notes
  • Medication categories - A Python script for extracting medications from free-text notes
  • MIMIC Extract (paper) - A Python-based package for transforming MIMIC-III data into a machine-learning-friendly format
  • FIDDLE (paper (PDF)) - A Python-based package for a FlexIble Data-Driven pipeLinE (FIDDLE), transforming structured EHR data into a machine-learning-friendly format

Acknowledgement

If you use code or concepts available in this repository, we would be grateful if you would cite the above paper as follows:

Johnson, Alistair EW, David J. Stone, Leo A. Celi, and Tom J. Pollard. "The MIMIC Code Repository: enabling reproducibility in critical care research." Journal of the American Medical Informatics Association (2017): ocx084.

If including a hyperlink to the code, we recommend you use the DOI from Zenodo rather than a GitHub URL: https://doi.org/10.5281/zenodo.821872

Contributing

Our team has worked hard to create and share the MIMIC dataset. We encourage you to share the code that you use for data processing and analysis. Sharing code helps to make studies reproducible and promotes collaborative research. To contribute, please:

We encourage users to share concepts they have extracted by writing code which generates a materialized view. These materialized views can then be used by researchers around the world to speed up data extraction. For example, ventilation durations can be acquired by creating the ventdurations view in concepts/durations/ventilation_durations.sql.
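
To make the idea of a concept as a materialized view concrete, the sketch below wraps a SELECT statement in PostgreSQL DROP and CREATE MATERIALIZED VIEW statements. It is not the repository's ventilation durations script. The view name icustay_los_hours and the helper function are hypothetical, and the toy query (ICU length of stay in hours from the icustays table) assumes the MIMIC-III schema is on the search path.

```python
import re


def materialized_view_sql(view_name, select_sql):
    """Wrap a SELECT statement in PostgreSQL statements that (re)create a materialized view."""
    # Only allow simple lowercase identifiers, since the name is interpolated into SQL.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", view_name):
        raise ValueError(f"unsafe view name: {view_name!r}")
    body = select_sql.strip().rstrip(";")
    return (
        f"DROP MATERIALIZED VIEW IF EXISTS {view_name} CASCADE;\n"
        f"CREATE MATERIALIZED VIEW {view_name} AS\n{body};"
    )


# Toy concept for illustration: ICU length of stay in hours.
select_sql = """
SELECT icustay_id,
       EXTRACT(EPOCH FROM (outtime - intime)) / 3600.0 AS los_hours
FROM icustays
"""

statement = materialized_view_sql("icustay_los_hours", select_sql)
print(statement)
```

The printed statements could be saved to a .sql file and run with psql, after which other researchers can query the view directly instead of repeating the extraction.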

License

By committing your code to the MIMIC Code Repository you agree to release the code under the MIT License attached to the repository.

Coding style

Please refer to the style guide for guidelines on formatting your code for the repository.