/wisconson_biopsy

A quick analysis of the wisconson biopsy database. This analysis determines that the pathological assessment of malignancy in tissue samples is constructed from a linear weighting of quantitative cytological features.

Primary LanguageJupyter Notebook

wisconson_biopsy

Making a disease diagnosis from a sample of tissue (Pathology) is part art/part science. It relies on quantitative metrics such as counts of scores of miss-formed cell morphology, but also on a final subjective decision by the physician. This decision may or may not be formed from a objective weighting of the quantitative metrics. To gain insight into this decision, I anylized data from the wisconsin breast cancer database and compared the conclusions of the pathologist with the results of machine learning algorithims: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original). By inspecting the feature space, and visually comparing this to the diagnosis labels, I found that the pathological assessment appears to be formed from a roughly linear combination of the recorded cytological features. To test this I built a machine learning classifier and compared its performance to the results of the pathologist. The accuracy, precision and recall of this the logistic model were all very high. To gain further insight I identify the misclassified data and suggest they be re-inspected by the pathologist to determine the traits that distinguish them.