healthcareai

The aim of healthcareai is to streamline machine learning in healthcare. The package has two main goals:

Allow one to easily create models based on tabular data, and deploy a best model that pushes predictions to SQL Server.
Provide tools related to data cleaning, manipulation, and imputation.

Installation

Windows

If you haven't, install 64-bit Python 3.5 via the Anaconda distribution
- Important When prompted for the Installation Type, select Just Me (recommended). This makes permissions later in the process much simpler.
Open the terminal (i.e., CMD or PowerShell, if using Windows)
Run conda install pyodbc
Upgrade to latest scipy (note that upgrade command took forever)
Run conda remove scipy
Run conda install scipy
Run conda install scikit-learn
Install healthcareai using one and only one of these three methods (ordered from easiest to hardest).
1. Recommended: Install the latest release with conda by running conda install -c catalyst healthcareai
2. Install the latest release with pip run pip install healthcareai
3. If you know what you're doing, and instead want the bleeding-edge version direct from our github repo, run pip install https://github.com/HealthCatalystSLC/healthcareai-py/zipball/master

Why Anaconda?

We recommend using the Anaconda python distribution when working on Windows. There are a number of reasons:

When running anaconda and installing packages using the conda command, you don't need to worry about dependency hell, particularly because packages aren't compiled on your machine; conda installs pre-compiled binaries.
A great example of the pain the using conda saves you is with the python package scipy, which, by their own admission "is difficult".

Linux

You may need to install the following dependencies:

sudo apt-get install python-tk
sudo pip install pyodbc
- Note you'll might run into trouble with the pyodbc dependency. You may first need to run sudo apt-get install unixodbc-dev then retry sudo pip install pyodbc. Credit stackoverflow

Once you have the dependencies satisfied run pip install healthcareai or sudo pip install healthcareai

macOS

pip install healthcareai or sudo pip install healthcareai

Linux and macOS (via docker)

Install docker
Clone this repo (look for the green button on the repo main page)
cd into the cloned directory
run docker build -t healthcareai .
run the docker instance with docker run -p 8888:8888 healthcareai
You should then have a jupyter notebook available on http://localhost:8888.

Verify Installation

To verify that healthcareai installed correctly, open a terminal and run python. This opens an interactive python console (also known as a REPL). Then enter this command: from healthcareai import develop_supervised_model and hit enter. If no error is thrown, you are ready to rock.

If you did get an error, or run into other installation issues, please let us know or better yet post on Stack Overflow (with the healthcare-ai tag) so we can help others along this process.

Getting started

Visit healthcare.ai to read the docs and find examples.
- Including this notebook
Open Sphinx (which installed with Anaconda) and copy the examples into a new file
Modify the queries and parameters to match your data

If you plan on deploying a model (ie, pushing predictions to SQL Server), run this in SSMS beforehand:

CREATE TABLE [SAM].[dbo].[HCPyDeployClassificationBASE] (
   [BindingID] [int] ,
   [BindingNM] [varchar] (255),
   [LastLoadDTS] [datetime2] (7),
   [PatientEncounterID] [decimal] (38, 0), --< change to your grain col
   [PredictedProbNBR] [decimal] (38, 2),
   [Factor1TXT] [varchar] (255),
   [Factor2TXT] [varchar] (255),
   [Factor3TXT] [varchar] (255))

CREATE TABLE [SAM].[dbo].[HCPyDeployRegressionBASE] (
   [BindingID] [int],
   [BindingNM] [varchar] (255),
   [LastLoadDTS] [datetime2] (7),
   [PatientEncounterID] [decimal] (38, 0), --< change to your grain col
   [PredictedValueNBR] [decimal] (38, 2),
   [Factor1TXT] [varchar] (255),
   [Factor2TXT] [varchar] (255),
   [Factor3TXT] [varchar] (255))

Note that we're currently working on easy connections to other types of databases.

For Issues

Double check that the code follows the examples here
If you're still seeing an error, create a post in our Google Group that contains
- Details on your environment (OS, database type, R vs Py)
- Goals (ie, what are you trying to accomplish)
- Crystal clear steps for reproducing the error

Contributing

You want to help? Woohoo! We welcome that and are willing to help newbies get started.

Please see our contribution guidelines for instructions on setting up your development environment

Workflow

Identify an issue that suits your skill level
- Only look for issues in the Backlog category
- If you're new to open source, please look for issues with the bug low, help wanted, or docs tags
- Please reach out with questions on details and where to start
Create a topic branch to work in; here are instructions
Create a throwaway file on the Desktop (or somewhere outside the repo), based on an example
Make changes and use the throwaway file to validate that your packages changes work
- Make small commits after getting a small piece working
- Push often so your changes are backed up. See this for more details.
Early on, create a pull request such that Levi and team can discuss the changes that you're making. Conversation is good.
When you have resolved the issue you chose, do the following:
- Check that the unit tests are passing
- Check that pyflakes and pylint don't show any issues
- Merge the master branch into your topic branch (so that you have the latest changes from master)
```
git checkout LeviBugFix
git fetch
git merge --no-ff origin/master
```
- Again, check that the unit tests are passing
Now that your changes are working, communicate that to Levi in the pull request, such that he knows to do the code review associated with the PR. Please don't do tons of work and then start a PR. Early is good.

PyPI Package Creation and Updating

Note these instructions are for maintainers only.

First, read this Packaging and Distributing Projects guide.

It's also worth noting that while this should be done on the pypi test site, I've run into a great deal of trouble with conflicting guides authenticating to the test site. So be smart about this.

Build a source distribution: from python3 (ran in windows anaconda python 3) run python setup.py sdist
Register the package by using the form on pypi. Upload your PKG-INFO that was generated inside the .egg file.
Upload the package using twine
- twine upload dist/healthcareai-<version>.tar.gz
- NOTE You can only ever upload a file name once. To get around this I was adding a rc number to the version in setup.py. However, this will break the appveyor build, so you'll need to remove the .rc before you push to github.
Verify install on all three platforms (linux, macOS, windows) by first pip uninstall healthcareai and then pip install healthcareai, followed by a from healthcareai import develop_supervised_model in a python REPL.

Release process (Including Read The Docs)

update all version numbers
- setup.py
update CHANGELOG
- Move all items under unreleased to a new release number
- Leave the template under unreleased
merge in the PR
create release on github releases (making sure this matches the release number in setup.py)
Create and upload the new pypi release (see above)
update readthedocs settings
- Admin > Versions
- Ensure that the new release number is checked for public
Manually build new read the docs
- Builds > Build version
verify the new version builds and is viewable at the public url

Conda Packaging and Distribution

Creating a conda package is much easier if you have already built the PyPI package.

Install prerequisites (only needed once)
- Install conda build conda install conda-build
- Install anaconda cli conda install anaconda-client
- Login to anaconda.org with anaconda login
Configure conda
- conda config --set always_yes true
- conda config --set anaconda_upload no
Create the skeleton conda recipe from the existing PyPI package
- conda skeleton pypi healthcareai
Build the conda package for the main python versions
- conda build --python 2.7 healthcareai
- conda build --python 3.4 healthcareai
- conda build --python 3.5 healthcareai
- conda build --python 3.6 healthcareai
Convert the existing builds to work on all platforms (win32, win64, osx62, linux32, linux64). Note this can take a while.
- conda convert --platform all win-64/healthcareai-*-py*.tar.bz2 -o <PATH_TO_BUILD_DIRECTORY>
Upload to anaconda using the anaconda cli
- Note that you'll have to keep track of where the builds are put!
- anaconda upload <PATH_TO_BUILD_DIRECTORY>/**/healthcareai*.tar.bz2
Clean up the mess
- conda build purge

Helpful Resources

Conda Building Packages
Anaconda.org dashboard
Taken from the excellent conda.io docs
Also, some taken from this Travis CI build#

yoshimaa/healthcareai-py