cohort-extractor

A cohort extractor tool that can generate dummy data, or extract real data from OpenSAFELY-compliant research databases.


cohort-extractor is in maintenance mode

Active development is focused on ehrQL, the electronic health record query language that is cohort-extractor's successor.

Refer to the ehrQL documentation to learn more.

OpenSAFELY cohort extractor tool

This tool supports the authoring of OpenSAFELY-compliant research, by:

  • Allowing developers to generate random data based on their study expectations. They can then use this as input data when developing analytic models.
  • Supporting the download of codelist CSVs from the OpenSAFELY codelists repository, for incorporation into the study definition.
  • Providing tools to understand and visualise the properties of real data, without having direct access to it
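To illustrate the first point, here is a minimal sketch of what generating random data from study expectations can look like. It is not cohort-extractor's implementation or API: the function `generate_dummy_ages` and its parameters (`incidence`, `min_age`, `max_age`, `seed`) are hypothetical names chosen for this example, and it uses only the Python standard library.

```python
import random


def generate_dummy_ages(n_patients, incidence, min_age, max_age, seed=0):
    """Return one row per dummy patient.

    Hypothetical helper, not part of cohort-extractor. `incidence` is the
    expected fraction of patients with a recorded value; the remaining
    patients get None to mimic missing data.
    """
    rng = random.Random(seed)  # seeded so the dummy data is reproducible
    rows = []
    for patient_id in range(1, n_patients + 1):
        has_value = rng.random() < incidence
        age = rng.randint(min_age, max_age) if has_value else None
        rows.append({"patient_id": patient_id, "age": age})
    return rows


rows = generate_dummy_ages(n_patients=5, incidence=0.8, min_age=18, max_age=90, seed=42)
for row in rows:
    print(row)
```

Rows shaped like this can stand in for extracted data while analysis code is being developed, which is the role the dummy data plays in a study repository.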

It is also the mechanism by which cohorts are extracted from live database backends within the OpenSAFELY framework.

It is designed to be run within an OpenSAFELY-compliant research repository, via Docker. You can find a template repository here and a Getting Started guide in the OpenSAFELY documentation to help you get your study repository set up.

Normally it will be invoked via the OpenSAFELY command line tool, as described in the documentation.

If you run it directly, do so from within the research repository. To run the latest version via Docker and see its full help:

docker run --rm ghcr.io/opensafely-core/cohortextractor --help
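If you prefer to script this call, the sketch below wraps the same command using Python's standard library. The helper names `build_command` and `run_help` are made up for this example; only the image name and the `--rm` and `--help` flags come from the command above. It assumes Docker is installed and on your `PATH`, and returns `None` from `run_help` when it is not.

```python
import shutil
import subprocess

# Image name taken from the command shown above.
IMAGE = "ghcr.io/opensafely-core/cohortextractor"


def build_command(*args):
    """Build the docker invocation as an argument list (hypothetical helper)."""
    return ["docker", "run", "--rm", IMAGE, *args]


def run_help():
    """Run the tool's --help via Docker, or return None if docker is missing."""
    if shutil.which("docker") is None:
        return None
    return subprocess.run(
        build_command("--help"), capture_output=True, text=True, check=False
    )


# Show the command that would be executed, without running it.
print(" ".join(build_command("--help")))
```

Call `run_help()` from within the research repository and read the help text from the `stdout` attribute of the returned result.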

For developers

Please see the additional information.

About the OpenSAFELY framework

The OpenSAFELY framework is a Trusted Research Environment (TRE) for electronic health records research in the NHS, with a focus on public accountability and research quality.

Read more at OpenSAFELY.org.