For this project I used Seaborn to discover and explore the relationships in the Breast Cancer Wisconsin (Diagnostic) data set. Below are the tasks completed for this project:
Import the data into a data frame using Pandas. We can also look at all the columns using data.columns.
Separate the target from the features into a new dataframe.
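The loading and separation steps can be sketched roughly as follows. This is only an illustration: the small in-memory frame stands in for the real file, and the column names (`id`, `diagnosis`, `radius_mean`, `texture_mean`) and all values are assumptions made up for the example, not taken from the project itself.

```python
import pandas as pd

# Toy stand-in for the real data set (values are invented for illustration).
# With the actual file this would be a pd.read_csv(...) call instead.
data = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [18.0, 12.5, 11.9, 20.1],
    "texture_mean": [21.3, 15.2, 14.8, 23.0],
})

# Inspect the available columns.
print(list(data.columns))

# Keep the target in its own object and drop non-feature columns.
y = data["diagnosis"]
x = data.drop(columns=["id", "diagnosis"])
```

Dropping the identifier column along with the target keeps `x` limited to numeric features, which is what the later plots operate on.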
Let's use seaborn countplot() to look at the diagnosis distribution between benign and malignant tumors.
We can use dataframe.describe() to get a summary of our data. dataframe.describe() provides descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
Use a violin plot to visualize standardized data. We can look at the separation between the median values of the benign and malignant groups to judge whether a feature would be a good one to use for classification.
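A minimal sketch of the standardize-then-plot step is shown here, assuming z-score standardization (subtract the mean, divide by the standard deviation) and a long-format reshape with `melt`. The feature names and values are hypothetical placeholders, and the exact plotting options used in the project may differ.

```python
import pandas as pd
import seaborn as sns

# Hypothetical features and labels (invented values, three rows per class).
features = pd.DataFrame({
    "radius_mean": [18.0, 20.1, 19.2, 12.5, 11.9, 13.1],
    "texture_mean": [21.3, 23.0, 22.1, 15.2, 14.8, 16.4],
})
diagnosis = pd.Series(["M", "M", "M", "B", "B", "B"], name="diagnosis")

# Z-score standardization so features on different scales share one axis.
standardized = (features - features.mean()) / features.std()

# Reshape to long form: one row per (sample, feature) pair.
long_form = pd.concat([diagnosis, standardized], axis=1).melt(
    id_vars="diagnosis", var_name="feature", value_name="value"
)

# Split violins put the two diagnoses side by side for each feature;
# inner="quart" draws the quartiles, including the median.
ax = sns.violinplot(
    data=long_form, x="feature", y="value",
    hue="diagnosis", split=True, inner="quart",
)
```

When the median lines of the two halves sit far apart for a feature, that feature separates the classes well; overlapping medians suggest it carries less signal on its own.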
A boxplot is useful for detecting outliers among the features in the data set.
A joint plot is an effective way of visualizing correlation between two features in a data set.
A swarm plot does a good job of showing which features may have better predictive power than others.
A correlation plot shows the relationship between all features in the dataset within one table. The color-coded results enable analysts to quickly identify where strong correlation exists.
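One way such a correlation plot could be produced is with `DataFrame.corr()` and seaborn's `heatmap`. The sketch below is an assumption about the approach rather than the project's exact code, and the three feature columns hold invented values.

```python
import pandas as pd
import seaborn as sns

# Hypothetical numeric features (invented values).
features = pd.DataFrame({
    "radius_mean": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "perimeter_mean": [65.0, 78.0, 90.0, 105.0, 118.0, 131.0],
    "smoothness_mean": [0.11, 0.08, 0.10, 0.09, 0.12, 0.10],
})

# Pairwise Pearson correlations between all feature columns.
corr = features.corr()

# Annotated heatmap; fixing the color scale to [-1, 1] keeps colors comparable.
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
```

Cells close to 1 or -1 flag pairs of features that carry largely redundant information, which is useful to know before selecting features for a model.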
Note: I completed this project on Coursera
Course Certificate