Primary language: Jupyter Notebook

Haberman

  1. The Kernel Density Distribution in Section 3 Distribution Plots shows that the sample contains more younger patients than older ones. Its two bumps, a global maximum and an almost flat local maximum, indicate that most ages are concentrated around the older and the younger middle ages, respectively.

  2. Section 3 Distribution Plots also shows that most patients have fewer than 5 axillary nodes, and that most patients survived after the nodes were removed.

  3. The hexagonal density plot in Section 4.1 Viewing Density in Age VS Nodes indicates that most patients in every age group had fewer than 10 nodes, and that the number of nodes increased between ages 50 and 60.

  4. The Age VS Nodes relationship in Section 6.1 Strip Plot shows that it is mostly middle-aged patients who undergo this procedure; extremely young or old patients are rare. However, the colouration of the scatter suggests a slightly strong correlation. To check whether outliers are weakening the correlation, a box plot is drawn in Section 6.2 Box Plot. Its outliers indicate that a combination of factors is involved in a non-linear relationship, rather than a simple nodes-and-age relationship.

  5. According to Section 6.2 Relational Plot of Age VS Nodes with Colouration of Survival Status, early detection, at or preferably before the early 40s, improves survival.

  6. The heat map in Section 7 Again, Looking at Whole Dataset shows a medium correlation between survival and nodes. This can be explained in two ways:

a. The dataset is small.

b. Medical intervention is positively correlated with survival, while the damage caused by the affected nodes is negatively correlated with it. Hence, the relationship between survival and nodes is not a straightforward two-dimensional one.
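The proportions in point 2, the age pattern in point 5, and the correlations behind the heat map in point 6 can also be read off as plain numbers. The sketch below is not taken from the notebook: it assumes the data sits in a pandas DataFrame with the hypothetical column names `age`, `year`, `nodes` and `status`, that `status` follows the UCI coding (1 = survived 5 years or longer, 2 = died within 5 years), and that the heat map was drawn from pandas' default Pearson `DataFrame.corr()`.

```python
import pandas as pd

# Hypothetical column names; the notebook may use different ones.
# status uses the UCI coding: 1 = survived 5 years or longer, 2 = died within 5 years.
COLUMNS = ["age", "year", "nodes", "status"]


def node_and_survival_shares(df: pd.DataFrame) -> dict:
    """Fractions behind point 2: patients with fewer than 5 nodes, and survivors."""
    return {
        "fewer_than_5_nodes": float((df["nodes"] < 5).mean()),
        "survived": float((df["status"] == 1).mean()),
    }


def survival_rate_by_age_band(df: pd.DataFrame, width: int = 10) -> pd.Series:
    """Share of survivors per age band (band 30 covers ages 30-39), a numeric view of point 5."""
    band = (df["age"] // width) * width
    return (df["status"] == 1).groupby(band).mean()


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations between all four columns, the values a heat map would colour."""
    return df[COLUMNS].corr()
```

If the raw UCI file is used, it has no header row, so it could be loaded with `pd.read_csv("haberman.data", header=None, names=COLUMNS)` (the path is hypothetical). Because `status` is 2 for patients who died, a positive `nodes`/`status` entry in the matrix means that more nodes go with lower survival.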

NOTE: So far, no strong demarcation suitable for linear classification has been observed, because the dataset includes only a limited number of factors as columns. The demarcations found show only a slightly strong or medium correlation.
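One way to probe the NOTE's claim is to fit a linear classifier and compare it with a baseline that always predicts the majority class: if the linear boundary barely beats that baseline, the classes are not well separated by a straight line on these columns. This is an assumed approach, not the notebook's method; the sketch hand-rolls logistic regression with NumPy gradient descent to stay self-contained, and every function name in it is hypothetical.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Rescale each column to zero mean and unit variance (assumes no constant column)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def add_bias(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 2000) -> np.ndarray:
    """Batch gradient descent on the mean log-loss; y must hold 0/1 labels."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # predicted probability of class 1
        w -= lr * Xb.T @ (p - y) / len(y)  # step along the negative gradient
    return w


def linear_vs_baseline(X: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Training accuracy of the fitted linear boundary and of a majority-class guess."""
    Xs = standardize(X)
    w = fit_logistic(Xs, y)
    pred = (add_bias(Xs) @ w > 0).astype(int)  # probability > 0.5 exactly when score > 0
    linear_acc = float((pred == y).mean())
    positive_rate = float(y.mean())
    baseline_acc = max(positive_rate, 1.0 - positive_rate)
    return linear_acc, baseline_acc
```

For the Haberman columns, `X` could be `df[["age", "year", "nodes"]].to_numpy(dtype=float)` and `y` could be `(df["status"] == 2).to_numpy(dtype=int)`, with `df` and the column names again assumed. The accuracy here is measured on the training data itself, so a held-out split or cross-validation would give a fairer estimate.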