Regression modeling of sub-distribution functions in competing risks.
A Python wrapper around the cmprsk R package.
Description: Estimation, testing and regression modeling of subdistribution functions in competing risks, as described in Gray (1988), A class of K-sample tests for comparing the cumulative incidence of a competing risk, Ann. Stat. 16:1141-1154, and Fine JP and Gray RJ (1999), A proportional hazards model for the subdistribution of a competing risk, JASA, 94:496-509.
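For reference, these are the quantities the methods above target, written in standard notation. This summary of Fine and Gray (1999) is not taken from the package documentation. $T$ is the event time, $\varepsilon$ the cause of failure, and $Z$ the covariates. The block gives the cumulative incidence (subdistribution) function of cause $k$, its subdistribution hazard, and the Fine-Gray proportional hazards model. Gray's (1988) test compares $F_k$ across $K$ groups.

$$
\begin{aligned}
F_k(t \mid Z) &= \Pr(T \le t,\ \varepsilon = k \mid Z),\\
\lambda_k(t \mid Z) &= \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,\Pr\{t \le T < t + \Delta t,\ \varepsilon = k \mid T \ge t \text{ or } (T < t,\ \varepsilon \ne k),\ Z\} = -\frac{d}{dt}\log\{1 - F_k(t \mid Z)\},\\
\lambda_k(t \mid Z) &= \lambda_{k,0}(t)\exp(Z^\top \beta).
\end{aligned}
$$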
Original Package documentation
- Only Python 3 is now supported. The recommended Python version is >= 3.8.
- The original version of this package was written with `rpy2` version 2.9.4. Since then, `rpy2` has had many breaking changes, so `cmprsk` version 0.X.Y only works with `rpy2` version 2.9.X.
- The `cmprsk` package v1.X.Y is now up to date and uses `rpy2` 3.4.5.
- Install R.
- Install the `cmprsk` R package: open an R terminal and run `install.packages("cmprsk")`.
- Create a virtual environment (recommended).
- Install `rpy2`.
  - If you use `conda` to create the virtual environment on macOS M1 (Apple silicon), install `rpy2` with pip (tested on version 3.5.9).
- Install `pandas` (tested on version 1.5.3).
- Install `scipy` (tested on version 1.10.1).
- Install `pytest` and `pytest-cov` to run the unit tests (dev only).
This package uses `rpy2` to import the cmprsk R package, so the requirements for `rpy2` must be met.
TL;DR
- Unix-like OS: Linux, macOS, BSD. (May work on Windows; see the [rpy2 binaries](https://rpy2.readthedocs.io/en/version_2.8.x/overview.html#microsoft-s-windows-precompiled-binaries).)
- Python >= 3.5
- R >= 3.3 (how to install R)
- readline 7.0: should be installed as part of `rpy2`. For how to install it on macOS, see also the following issue.
- The `cmprsk` R library (open the R console and run `install.packages('cmprsk')`).
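You can check the requirements above from Python before importing the package. The stdlib-only sketch below uses a hypothetical helper, `check_prerequisites`, which is not part of `cmprsk`. It assumes R and `Rscript` are on `PATH`. It only checks that modules can be found, not their versions. It needs Python 3.7+ for `subprocess.run(..., capture_output=True)`.

```python
import importlib.util
import shutil
import subprocess
import sys


def check_prerequisites():
    """Report which prerequisites appear to be available (hypothetical helper)."""
    report = {
        # Minimum Python version from the requirements list
        "python>=3.5": sys.version_info >= (3, 5),
        # Assumption: rpy2 finds R through a standard install on PATH
        "R": shutil.which("R") is not None,
    }
    # Python packages: importability only, versions are not checked
    for module in ("rpy2", "pandas", "scipy"):
        report[module] = importlib.util.find_spec(module) is not None

    # Ask R whether the cmprsk R package can be loaded
    report["cmprsk (R)"] = False
    rscript = shutil.which("Rscript")
    if rscript is not None:
        result = subprocess.run(
            [rscript, "-e", 'cat(requireNamespace("cmprsk", quietly = TRUE))'],
            capture_output=True,
            text=True,
            check=False,
        )
        report["cmprsk (R)"] = result.stdout.strip().endswith("TRUE")
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12} {'ok' if ok else 'MISSING'}")
```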
For example usage, consult the tutorial notebook in this repo: `package_usage.ipynb`.
```python
import pandas as pd
import cmprsk.cmprsk as cmprsk
from cmprsk import utils

data = pd.read_csv('my_data_file.csv')
# assuming that x1, x2, x3, x4 are covariates.
# x1 and x4 are categorical, with baseline 'd' for x1 and 5 for x4
static_covariates = utils.as_indicators(data[['x1', 'x2', 'x3', 'x4']], ['x1', 'x4'], bases=['d', 5])
crr_result = cmprsk.crr(data['ftime'], data['fstatus'], static_covariates)
report = crr_result.summary
print(report)
```
`ftime` and `fstatus` can be numpy arrays or pandas Series, and `static_covariates` is a pandas DataFrame.
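Judging from its name and its `bases` argument, `utils.as_indicators` turns the listed categorical columns into indicator (dummy) variables relative to the given baseline levels. The package defines the exact column names and behavior. The stdlib sketch below only illustrates the general idea of treatment coding against a baseline. `indicator_columns` and its `name_level` naming scheme are hypothetical. Dropping the baseline level stops the indicators from summing to a constant, which the intercept-free Fine-Gray model cannot identify. Each coefficient is then a contrast against the baseline.

```python
def indicator_columns(values, name, base):
    """Treatment-code one categorical column against a baseline level.

    Hypothetical helper for illustration; the column naming is an assumption.
    """
    levels = sorted({v for v in values if v != base}, key=str)
    # One 0/1 column per non-baseline level; the baseline gets no column
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


columns = indicator_columns(["a", "d", "b", "d"], "x1", base="d")
print(columns)  # {'x1_a': [1, 0, 0, 0], 'x1_b': [0, 0, 1, 0]}
```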
The `report` returned by `crr_result.summary` is a pandas DataFrame.
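In the Fine-Gray model, exponentiating a coefficient gives a subdistribution hazard ratio. The sketch below computes that ratio, a Wald 95% confidence interval, and a two-sided p-value from a coefficient and its standard error. This README does not say which columns of `report` hold these values, so the inputs are plain floats. `shr_summary` is a hypothetical helper, and it needs Python 3.8+ for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def shr_summary(coef, se, level=0.95):
    """Subdistribution hazard ratio with Wald CI and p-value (hypothetical helper)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    z = coef / se  # Wald statistic
    return {
        "shr": math.exp(coef),
        "ci_low": math.exp(coef - z_crit * se),
        "ci_high": math.exp(coef + z_crit * se),
        "p_value": 2 * (1 - NormalDist().cdf(abs(z))),
    }


print(shr_summary(0.5, 0.1))
```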
```python
import matplotlib.pyplot as plt
import pandas as pd
from cmprsk import cmprsk

data = pd.read_csv('cmprsk/cmprsk/tests/example_dataset.csv')
cuminc_res = cmprsk.cuminc(data.ss, data.cc, group=data.gg, strata=data.strt)

# print the results
cuminc_res.print

# plot using matplotlib: one curve and confidence band per group
_, ax = plt.subplots()
for name, group in cuminc_res.groups.items():
    ax.plot(group.time, group.est, label=name)
    ax.fill_between(group.time, group.low_ci, group.high_ci, alpha=0.4)
ax.set_ylim([0, 1])
ax.legend()
ax.set_title('foo bar')
plt.show()
```
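`cuminc` estimates the cumulative incidence function of each cause within each group. Through the R package, it can also test for differences between groups with Gray's test. The sketch below shows what the `est` curves plotted above represent. It is a minimal pure-Python version of the standard nonparametric (Aalen-Johansen type) estimator for one cause in one group: $\hat F_k(t) = \sum_{t_j \le t} \hat S(t_{j-1})\, d_{kj} / n_j$, where $\hat S$ is the all-cause Kaplan-Meier survival. It is not the package's implementation. It returns point estimates only (no variances, confidence bands, or tests). `cumulative_incidence` is a hypothetical helper, and it assumes `0` marks censoring, mirroring the R package's default `cencode = 0`.

```python
from collections import defaultdict


def cumulative_incidence(ftime, fstatus, cause, cencode=0):
    """Nonparametric cumulative incidence for one cause in one group.

    Illustrative sketch, not the cmprsk implementation: returns a list of
    (time, estimate) pairs at each time where any event occurs.
    """
    # Count observations of every status at each distinct time
    counts = defaultdict(lambda: defaultdict(int))
    for t, s in zip(ftime, fstatus):
        counts[t][s] += 1

    n_at_risk = len(ftime)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    curve = []
    for t in sorted(counts):
        by_status = counts[t]
        n_here = sum(by_status.values())
        d_all = n_here - by_status.get(cencode, 0)  # events of any cause at t
        if d_all > 0:
            # P(still event-free before t) * P(failing from `cause` at t)
            cif += surv * by_status.get(cause, 0) / n_at_risk
            surv *= 1 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here  # events and censorings at t leave the risk set
    return curve


ftime = [1, 2, 3, 4, 5]
fstatus = [1, 2, 0, 1, 2]  # 0 = censored, 1 and 2 = competing causes
print([(t, round(c, 3)) for t, c in cumulative_incidence(ftime, fstatus, cause=1)])
# [(1, 0.2), (2, 0.2), (4, 0.5), (5, 0.5)]
```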
To run the unit tests, run `pytest --cov=cmprsk cmprsk/tests/` from the project root. Note: you'll need to install `pytest-cov`.
Current coverage
```text
---------- coverage: platform darwin, python 3.9.7-final-0 -----------
Name                              Stmts   Miss  Cover
----------------------------------------------------
cmprsk/__init__.py                    0      0   100%
cmprsk/cmprsk.py                    128     22    83%
cmprsk/rpy_utils.py                  44     10    77%
cmprsk/tests/__init__.py              0      0   100%
cmprsk/tests/test_cmprsk.py          30      0   100%
cmprsk/tests/test_rpy_utils.py       27      1    96%
cmprsk/tests/test_utils.py           37      0   100%
cmprsk/utils.py                      23      1    96%
----------------------------------------------------
TOTAL                               289     34    88%
```
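The Cover column is statement coverage, (Stmts - Miss) / Stmts, rounded to a whole percent. The sketch below recomputes it from the numbers in the report. It assumes plain rounding and counts empty files as 100%. coverage.py itself special-cases values very close to 0% and 100%, which does not affect these rows.

```python
# (Stmts, Miss) per file, copied from the coverage report
report = {
    "cmprsk/__init__.py": (0, 0),
    "cmprsk/cmprsk.py": (128, 22),
    "cmprsk/rpy_utils.py": (44, 10),
    "cmprsk/tests/__init__.py": (0, 0),
    "cmprsk/tests/test_cmprsk.py": (30, 0),
    "cmprsk/tests/test_rpy_utils.py": (27, 1),
    "cmprsk/tests/test_utils.py": (37, 0),
    "cmprsk/utils.py": (23, 1),
}


def percent_covered(stmts, miss):
    """Statement coverage in whole percent; an empty file counts as fully covered."""
    if stmts == 0:
        return 100
    return round(100 * (stmts - miss) / stmts)


total_stmts = sum(s for s, _ in report.values())
total_miss = sum(m for _, m in report.values())
for name, (stmts, miss) in report.items():
    print(f"{name:32} {percent_covered(stmts, miss):3d}%")
print(f"{'TOTAL':32} {percent_covered(total_stmts, total_miss):3d}%")
```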
Releasing a new version:
- Update the version in `setup.py`.
- Remove the dist directory: `rm -fr dist`
- Build: `python setup.py sdist bdist_wheel`
- Upload: `twine upload dist/* --verbose`