cmprsk - Competing Risks Regression

Regression modeling of sub-distribution functions in competing risks.

A Python wrapper around the cmprsk R package.

Description: Estimation, testing and regression modeling of subdistribution functions in competing risks, as described in Gray (1988), A class of K-sample tests for comparing the cumulative incidence of a competing risk, Ann. Stat. 16:1141-1154, and Fine JP and Gray RJ (1999), A proportional hazards model for the subdistribution of a competing risk, JASA, 94:496-509.

Original Package documentation

Requirements

python

Only Python 3 is supported. Recommended Python version: >= 3.8.

The original version of this package was written with rpy2 version 2.9.4. Since then, rpy2 has had many breaking changes, so cmprsk version 0.X.Y only works with rpy2 version 2.9.X.
cmprsk version 1.X.Y is up to date and uses rpy2 3.4.5.

Installation steps

install R
install cmprsk R package: open R terminal and run install.packages("cmprsk")
create a virtual environment (recommended)
install rpy2. If you use conda to create the virtual environment on macOS M1 (Apple silicon), install rpy2 using pip (tested on version 3.5.9)
install pandas (tested on version 1.5.3)
install scipy (tested on version 1.10.1)
install pytest and pytest-cov for running unit tests (dev) only

This package uses rpy2 to import the cmprsk R package, so the requirements for rpy2 must be met.

TL;DR

Unix-like OS: Linux, macOS, BSD. (May work on Windows; see the [rpy2 binaries](https://rpy2.readthedocs.io/en/version_2.8.x/overview.html#microsoft-s-windows-precompiled-binaries).)
Python >= 3.5
R >= 3.3 (how to install R)
readline 7.0. Should be installed as part of rpy2 (how to install on macOS; see also the following issue)
The cmprsk R library (open the R console and run install.packages('cmprsk'))
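
As a quick sanity check of the requirements above, the following minimal sketch reports which pieces can be found on the current machine. It uses only the Python standard library; the helper name `check_environment` is hypothetical and not part of this package. It only checks that the Python modules are importable and that an `R` executable is on the PATH. It does not verify versions or that the cmprsk R library itself is installed.

```python
import importlib.util
import shutil


def check_environment(modules=("rpy2", "pandas", "scipy")):
    """Return a dict mapping each requirement name to True if it was found.

    Hypothetical helper for illustration only; not part of cmprsk.
    """
    # find_spec returns None when a top-level module cannot be located.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # R is an external program, so look for its executable on PATH.
    status["R"] = shutil.which("R") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If anything is reported as missing, revisit the corresponding installation step before importing the package.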

Quickstart

For example usage consult the tutorial notebook in this repo: package_usage.ipynb

Example: crr

import pandas as pd

import cmprsk.cmprsk as cmprsk

from cmprsk import utils

data = pd.read_csv('my_data_file.csv')
# assuming that x1, x2, x3, x4 are covariates.
# x1 and x4 are categorical, with baseline 'd' for x1 and 5 for x4
static_covariates = utils.as_indicators(data[['x1', 'x2', 'x3', 'x4']], ['x1', 'x4'], bases=['d', 5])

crr_result = cmprsk.crr(data['ftime'], data['fstatus'], static_covariates)
report = crr_result.summary

print(report)

ftime and fstatus can be NumPy arrays or pandas Series, and static_covariates is a pandas DataFrame. The report is a pandas DataFrame as well.
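
To illustrate what indicator coding with a baseline means in the example above, here is a minimal sketch built on `pandas.get_dummies`. The function `indicator_sketch` and the toy data are assumptions made for illustration. This is not the implementation of `utils.as_indicators`, and the column names the package produces may differ.

```python
import pandas as pd


def indicator_sketch(df, categorical, bases):
    """Replace each categorical column with 0/1 columns, omitting its baseline.

    Hypothetical helper for illustration; not the package's as_indicators.
    """
    out = df.drop(columns=categorical)
    for col, base in zip(categorical, bases):
        # One 0/1 column per level, named "<col>_<level>".
        dummies = pd.get_dummies(df[col], prefix=col, dtype=int)
        # Dropping the baseline level makes it the reference category.
        out = out.join(dummies.drop(columns=f"{col}_{base}"))
    return out


# Toy data (made up): x1 and x4 are categorical, x2 and x3 are left as-is.
toy = pd.DataFrame({
    "x1": ["a", "d", "b", "d"],
    "x2": [0.5, 1.2, 0.3, 2.0],
    "x3": [1, 0, 1, 1],
    "x4": [5, 3, 7, 5],
})
encoded = indicator_sketch(toy, ["x1", "x4"], bases=["d", 5])
print(encoded)
```

A row whose indicator columns for a variable are all zero belongs to that variable's baseline level, so each regression coefficient for an indicator is interpreted relative to the baseline.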

Example: cuminc

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


from cmprsk import cmprsk

data = pd.read_csv('cmprsk/cmprsk/tests/example_dataset.csv')
cuminc_res = cmprsk.cuminc(data.ss, data.cc, group=data.gg, strata=data.strt)

# print
cuminc_res.print

# plot using matplotlib

_, ax = plt.subplots()
for name, group in cuminc_res.groups.items():
    ax.plot(group.time, group.est, label=name)
    ax.fill_between(group.time, group.low_ci, group.high_ci, alpha=0.4)
    
ax.set_ylim([0, 1])
ax.legend()
ax.set_title('foo bar')
plt.show()
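
For intuition about what a cumulative incidence curve represents, the sketch below computes the standard nonparametric estimate in pure Python: at each event time, the cause-specific increment is the overall event-free probability just before that time multiplied by the fraction of at-risk subjects failing from the cause of interest. The function name `cumulative_incidence`, the censoring code 0, and the toy data are assumptions for illustration. This is a textbook sketch, not the code that cmprsk runs, and it omits variances, confidence bands, groups, and strata.

```python
from collections import Counter


def cumulative_incidence(ftime, fstatus, cause, cencode=0):
    """Return [(time, estimate)] of the cumulative incidence for one cause.

    Hypothetical helper for illustration; not part of cmprsk.
    """
    leaving = Counter()        # subjects leaving the risk set at each time
    all_events = Counter()     # failures from any cause at each time
    cause_events = Counter()   # failures from the cause of interest
    for t, s in zip(ftime, fstatus):
        leaving[t] += 1
        if s != cencode:
            all_events[t] += 1
            if s == cause:
                cause_events[t] += 1

    at_risk = len(ftime)
    surv = 1.0   # probability of being event-free just before the current time
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        if all_events[t]:
            cif += surv * cause_events[t] / at_risk
            surv *= 1 - all_events[t] / at_risk
            curve.append((t, cif))
        at_risk -= leaving[t]
    return curve


# Toy data (made up): status 0 = censored, 1 and 2 = competing causes.
times = [1, 2, 3, 4]
status = [1, 2, 0, 1]
cif_cause1 = cumulative_incidence(times, status, cause=1)
cif_cause2 = cumulative_incidence(times, status, cause=2)
print(cif_cause1)
print(cif_cause2)
```

Because every failure is attributed to exactly one cause, the estimates for all causes at a given time add up to one minus the overall event-free probability.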

Development

To run the unit tests, run

pytest --cov=cmprsk cmprsk/tests/

from the project root. Note: you'll need to install pytest-cov.

Current coverage

---------- coverage: platform darwin, python 3.9.7-final-0 -----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
cmprsk/__init__.py                   0      0   100%
cmprsk/cmprsk.py                   128     22    83%
cmprsk/rpy_utils.py                 44     10    77%
cmprsk/tests/__init__.py             0      0   100%
cmprsk/tests/test_cmprsk.py         30      0   100%
cmprsk/tests/test_rpy_utils.py      27      1    96%
cmprsk/tests/test_utils.py          37      0   100%
cmprsk/utils.py                     23      1    96%
----------------------------------------------------
TOTAL                              289     34    88%

How to update the package:

update version in setup.py
remove the dist directory: rm -fr dist
python setup.py sdist bdist_wheel
twine upload dist/* --verbose

Vincent-Maladiere/cmprsk