/phenotype-cover

A package for biomarker selection based on multiset multicover and the cross-entropy-method.

Primary LanguagePythonMIT LicenseMIT

This repository provides two algorithms for the phenotype cover (PC) biomarker selection problem introduced in the paper: "Multiset multicover methods for discriminative marker selection". GreedyPC is based on the extended greedy algorithm to set cover, and CEM-PC is based on the cross-entropy-method.

Install via

pip install multiset-multicover
pip install phenotype-cover

Other packages that phenotype-cover depends on are numpy, matplotlib, and scikit-learn.

Import GreedyPC or CEMPC from phenotype_cover.

Example

>>> from phenotype_cover import GreedyPC
>>> from sklearn.datasets import make_classification
>>> # You may need to log-transform X if working with raw counts
>>> X, y = make_classification(1000, 200, n_informative=5, n_classes=5, scale=100)
>>> gpc = GreedyPC()
>>> gpc.fit(X, y)
>>> features = gpc.select(100)  # coverage of 100

Some other functionality implemented in GreedyPC

>>> # Number of elements reamining and coverage attained after every iteration
>>> gpc.plot_progress()
>>> gpc.n_elements_remaining_per_iter_
>>> gpc.coverage_per_iter_
>>> # Heatmap of the coverage provided by some feature i
>>> gpc.feature_coverage(i)
>>> # Maximum possible coverage for evey class pair
>>> gpc.max_coverage()
>>> # Pairs that could not be covered to the desired `coverage`
>>> gpc.pairs_with_incomplete_cover_