License: GNU General Public License v3.0 (GPL-3.0)

cmprsk - Competing Risks Regression

Regression modeling of sub-distribution functions in competing risks.

A Python wrapper around the cmprsk R package.

Description: Estimation, testing and regression modeling of subdistribution functions in competing risks, as described in Gray (1988), A class of K-sample tests for comparing the cumulative incidence of a competing risk, Ann. Stat. 16:1141-1154, and Fine JP and Gray RJ (1999), A proportional hazards model for the subdistribution of a competing risk, JASA, 94:496-509.

Original Package documentation

Requirements

Python

  • Only Python 3 is supported. The recommended Python version is 3.8 or later.

  • The original version of this package was written against rpy2 version 2.9.4. rpy2 has had many breaking changes since then, so cmprsk version 0.X.Y only works with rpy2 version 2.9.X.
  • cmprsk version 1.X.Y is up to date and uses rpy2 3.4.5.

Installation steps

  • install R
  • install the cmprsk R package: open an R terminal and run install.packages("cmprsk")
  • create a virtual environment (recommended)
  • install rpy2. If you use conda to create the virtual environment on macOS with an M1 (Apple silicon) chip, install rpy2 using pip (tested on version 3.5.9)
  • install pandas (tested on version 1.5.3)
  • install scipy (tested on version 1.10.1)
  • install pytest and pytest-cov (development only, for running the unit tests)

This package uses rpy2 to import the cmprsk R package, so the requirements for rpy2 must also be met.
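Before running the examples, it can help to confirm that the Python-side dependencies from the list above are importable. The following is a minimal sketch, not part of cmprsk: `missing_modules` and `r_package_installed` are hypothetical helper names, and the R check assumes that `rpy2.robjects.packages.isinstalled` is available in your rpy2 version.

```python
import importlib.util

# Python-side dependencies named in the installation steps.
REQUIRED_MODULES = ["rpy2", "pandas", "scipy"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def r_package_installed(name="cmprsk"):
    """Ask R (through rpy2) whether an R package is installed.

    Returns None when rpy2 is not importable, so the caller can tell
    "not checked" apart from "checked and missing".
    """
    if importlib.util.find_spec("rpy2") is None:
        return None
    # Assumption: isinstalled() exists in the installed rpy2 version.
    from rpy2.robjects.packages import isinstalled
    return bool(isinstalled(name))


# An empty list means every required Python module was found.
print("Missing Python modules:", missing_modules(REQUIRED_MODULES))
```

Calling `r_package_installed()` afterwards reports whether R can see the cmprsk package. It needs a working R installation, which is why it is not called automatically here.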

TL;DR

Quickstart

For example usage, see the tutorial notebook in this repo: package_usage.ipynb

Example: crr

import pandas as pd

from cmprsk import cmprsk, utils

data = pd.read_csv('my_data_file.csv')
# Assuming that x1, x2, x3 and x4 are covariates.
# x1 and x4 are categorical, with baseline 'd' for x1 and 5 for x4.
static_covariates = utils.as_indicators(data[['x1', 'x2', 'x3', 'x4']], ['x1', 'x4'], bases=['d', 5])

crr_result = cmprsk.crr(data['ftime'], data['fstatus'], static_covariates)
report = crr_result.summary

print(report)

ftime and fstatus can be NumPy arrays or pandas Series, and static_covariates is a pandas DataFrame. The report is also a pandas DataFrame.
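To make the idea of indicator covariates with a baseline level concrete, here is a rough sketch using plain pandas. It is an illustration only and is not how `utils.as_indicators` is implemented; the helper name `indicators_with_baseline`, the toy data, and the resulting column names (which follow `pd.get_dummies`, not cmprsk) are all assumptions.

```python
import pandas as pd


def indicators_with_baseline(frame, categorical, baselines):
    """Dummy-code the `categorical` columns and drop each baseline level.

    Hypothetical helper for illustration; it is not part of cmprsk.
    """
    encoded = pd.get_dummies(frame, columns=categorical, dtype=int)
    # get_dummies names each new column "<column>_<level>".
    baseline_columns = [f"{col}_{base}" for col, base in zip(categorical, baselines)]
    return encoded.drop(columns=baseline_columns)


# Toy data: x1 and x4 are categorical, x2 is numeric.
toy = pd.DataFrame({
    "x1": ["a", "d", "b", "d"],
    "x2": [0.5, 1.2, 0.3, 2.0],
    "x4": [5, 7, 5, 9],
})
encoded_toy = indicators_with_baseline(toy, ["x1", "x4"], ["d", 5])
print(encoded_toy)
```

Rows at the baseline level (`'d'` for x1, `5` for x4) have zeros in every indicator column for that variable, so each regression coefficient is interpreted relative to the baseline.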

Example: cuminc

import matplotlib.pyplot as plt
import pandas as pd

from cmprsk import cmprsk

data = pd.read_csv('cmprsk/cmprsk/tests/example_dataset.csv')
cuminc_res = cmprsk.cuminc(data.ss, data.cc, group=data.gg, strata=data.strt)

# print
cuminc_res.print

# plot using matplotlib

_, ax = plt.subplots()
for name, group in cuminc_res.groups.items():
    ax.plot(group.time, group.est, label=name)
    ax.fill_between(group.time, group.low_ci, group.high_ci, alpha=0.4)
    
ax.set_ylim([0, 1])
ax.legend()
ax.set_title('foo bar')
plt.show()
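For intuition about what a cumulative incidence curve represents, here is a small pure-Python sketch of the standard nonparametric estimator for a single group. It is not the code this package runs (the package delegates to the R implementation), it computes no confidence intervals or tests, and the function name and the convention that status 0 means censored are assumptions made for the illustration.

```python
from collections import Counter


def cumulative_incidence(times, statuses, cause, censor_code=0):
    """Nonparametric cumulative incidence estimate for one cause.

    Returns (time, estimate) pairs, one per distinct time at which an
    event of any cause occurred. `censor_code` marks censored subjects.
    """
    at_risk = len(times)
    leaving = Counter(times)  # subjects leaving the risk set at each time
    any_event = Counter(t for t, s in zip(times, statuses) if s != censor_code)
    cause_event = Counter(t for t, s in zip(times, statuses) if s == cause)

    survival = 1.0   # probability of no event of any cause just before t
    incidence = 0.0  # running cumulative incidence for `cause`
    curve = []
    for t in sorted(leaving):
        if any_event[t]:
            # Probability of reaching t event-free, then failing from `cause`.
            incidence += survival * cause_event[t] / at_risk
            survival *= 1.0 - any_event[t] / at_risk
            curve.append((t, incidence))
        at_risk -= leaving[t]
    return curve


# Four subjects: causes 1 and 2 are competing events, 0 is censoring.
example_times = [1, 2, 3, 4]
example_statuses = [1, 2, 0, 1]
for t, est in cumulative_incidence(example_times, example_statuses, cause=1):
    print(t, round(est, 3))
```

Unlike one minus a Kaplan-Meier estimate that treats competing events as censoring, the estimates for all causes here sum to at most 1 at any time point.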

Development

To run the unit tests, run

pytest --cov=cmprsk cmprsk/tests/

from the project root. Note: you'll need to install pytest-cov.

Current coverage

---------- coverage: platform darwin, python 3.9.7-final-0 -----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
cmprsk/__init__.py                   0      0   100%
cmprsk/cmprsk.py                   128     22    83%
cmprsk/rpy_utils.py                 44     10    77%
cmprsk/tests/__init__.py             0      0   100%
cmprsk/tests/test_cmprsk.py         30      0   100%
cmprsk/tests/test_rpy_utils.py      27      1    96%
cmprsk/tests/test_utils.py          37      0   100%
cmprsk/utils.py                     23      1    96%
----------------------------------------------------
TOTAL                              289     34    88%

How to update the package:

  1. update the version in setup.py
  2. remove the dist directory: rm -fr dist
  3. build the distributions: python setup.py sdist bdist_wheel
  4. upload them: twine upload dist/* --verbose
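Steps 2 to 4 above could be scripted roughly as follows. This is a sketch under the assumption that it runs from an environment where `setuptools`, `wheel`, and `twine` are installed; the function names are hypothetical, and the version bump in setup.py (step 1) is still done by hand.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_dist(project_root="."):
    """Step 2: delete the dist directory if it exists."""
    shutil.rmtree(Path(project_root) / "dist", ignore_errors=True)


def build_and_upload(project_root="."):
    """Steps 2 to 4: clean, build sdist and wheel, then upload with twine."""
    root = Path(project_root)
    clean_dist(root)
    # Step 3: build the source distribution and the wheel.
    subprocess.run(
        [sys.executable, "setup.py", "sdist", "bdist_wheel"],
        cwd=root,
        check=True,
    )
    # Step 4: upload everything that the build placed in dist/.
    artifacts = sorted(str(path) for path in (root / "dist").iterdir())
    subprocess.run(["twine", "upload", "--verbose", *artifacts], check=True)
```

Calling `build_and_upload()` from the project root performs a real upload, so it is worth running `clean_dist()` and the build step on their own first.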