Tumor Diagnosis With Seaborn

Exploratory data analysis with seaborn

For this project I used Seaborn to discover and explore relationships in the Breast Cancer Wisconsin (Diagnostic) data set. Below are the tasks completed for this project:

Task 1: Introduction and Importing the Data

Import the data into a DataFrame using pandas. We can also list all of the columns with data.columns.
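A minimal sketch of the import step. It assumes the data lives in a CSV file. To keep the example runnable without that file, a few made-up rows (hypothetical values, not taken from the data set) are read from an in-memory string with io.StringIO; in practice you would pass the path to your own CSV to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: made-up rows with an id column,
# the diagnosis label (M = malignant, B = benign) and two feature columns.
csv_text = """id,diagnosis,radius_mean,texture_mean
1,M,18.0,10.4
2,B,13.5,14.4
3,M,20.5,17.8
4,B,12.5,15.7
"""

# Replace io.StringIO(csv_text) with the path to your own file.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())
print(list(data.columns))  # ['id', 'diagnosis', 'radius_mean', 'texture_mean']
```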

Task 2: Separate Target from Features

Separate the target (the diagnosis column) from the features, and store the features in a new DataFrame.
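One way to do the split, sketched on a tiny made-up DataFrame. The column names id, diagnosis, radius_mean and texture_mean are assumptions for illustration; the idea is simply that the label goes into its own Series and every remaining non-identifier column becomes a feature.

```python
import pandas as pd

# Hypothetical miniature of the data set (made-up values).
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "diagnosis": ["M", "B", "M", "B"],
    "radius_mean": [18.0, 13.5, 20.5, 12.5],
    "texture_mean": [10.4, 14.4, 17.8, 15.7],
})

# The target is the diagnosis label; keep it as its own Series.
y = data["diagnosis"]

# Everything except the identifier and the label is a feature.
x = data.drop(columns=["id", "diagnosis"])

print(list(x.columns))  # ['radius_mean', 'texture_mean']
```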

Task 3: Diagnosis Distribution Visualization

Let's use seaborn's countplot() to look at the distribution of diagnoses between benign and malignant tumors.
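A minimal countplot sketch, assuming the labels are in a pandas Series named y. The labels below are made up; with the real data you would pass the diagnosis column instead. value_counts() gives the same counts as numbers, which is handy for checking the bar heights.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical diagnosis labels (made-up), standing in for the real column.
y = pd.Series(["M", "B", "B", "M", "B", "B"], name="diagnosis")

plt.figure(figsize=(4, 4))
ax = sns.countplot(x=y)  # one bar per label, height = number of rows
ax.set_title("Diagnosis distribution")

# The same counts as numbers.
counts = y.value_counts()
print(counts["B"], counts["M"])  # 4 2
```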

We can use dataframe.describe() to get a summary of our data. dataframe.describe() provides descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
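A small example of describe() on made-up feature values. The NaN in texture_mean is deliberate: it shows that the count row reports only non-missing values and that the other statistics are computed without it.

```python
import pandas as pd

# Hypothetical feature values (made-up); one texture value is missing.
x = pd.DataFrame({
    "radius_mean": [18.0, 13.5, 20.5, 12.5],
    "texture_mean": [10.4, 14.4, 17.8, float("nan")],
})

summary = x.describe()  # count, mean, std, min, quartiles and max per column
print(summary)

# The NaN is skipped, so texture_mean has a count of 3.
print(summary.loc["count", "texture_mean"])  # 3.0
```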

Task 4: Visualizing Standardized Data with Seaborn

Use a violin plot to visualize the standardized data. For each feature, we can look at how far apart the benign and malignant medians are to judge whether it would be a good feature to use for classification.
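A minimal sketch of the standardize, reshape and plot steps. It assumes the features are in a DataFrame x and the labels in a Series y; the data here is synthetic, with hypothetical columns feature_a (classes shifted apart) and feature_b (classes overlapping). Standardizing as (x - mean) / std puts every feature on the same scale, so one plot can hold them all, and melting to long format gives seaborn one row per observation. The median_gap line puts a number on the "separation between the medians" described above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 50
# Synthetic stand-in data: hypothetical feature_a differs by class,
# hypothetical feature_b does not. Not the real data set.
y = pd.Series(["M"] * n + ["B"] * n, name="diagnosis")
x = pd.DataFrame({
    "feature_a": np.r_[rng.normal(17, 3, n), rng.normal(12, 2, n)],
    "feature_b": np.r_[rng.normal(20, 4, n), rng.normal(20, 4, n)],
})

# Standardize each feature: subtract its mean, divide by its standard deviation.
x_std = (x - x.mean()) / x.std()

# Long format: one row per (diagnosis, feature, value) triple.
long = pd.concat([y, x_std], axis=1).melt(
    id_vars="diagnosis", var_name="features", value_name="value"
)

plt.figure(figsize=(6, 5))
# split=True draws each class on one half of the violin; inner="quart" marks quartiles.
ax = sns.violinplot(x="features", y="value", hue="diagnosis", data=long,
                    split=True, inner="quart")

# Gap between the class medians, in standard deviations, for each feature.
median_gap = (x_std[y == "M"].median() - x_std[y == "B"].median()).abs()
print(median_gap)
```

A feature whose two halves barely overlap (a large median gap) is a more promising classification input than one whose halves mirror each other.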

Task 5: Violin Plots and Box Plots

A box plot is useful for detecting outliers among the features in the data set.
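A small sketch of why a box plot flags outliers. seaborn's boxplot draws whiskers at 1.5 times the interquartile range (its default whis=1.5) and plots any point beyond them individually. The loop below applies that same rule per box, by hand, to made-up long-format data where 4.0 is planted as an obvious outlier; the feature name feature_a is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical standardized values in long format (made-up); 4.0 is planted.
long = pd.DataFrame({
    "diagnosis": ["M"] * 6 + ["B"] * 6,
    "features": ["feature_a"] * 12,
    "value": [0.8, 1.0, 1.1, 1.2, 0.9, 4.0, -0.9, -1.0, -1.1, -0.8, -1.2, -1.0],
})

ax = sns.boxplot(x="features", y="value", hue="diagnosis", data=long)

# Same rule as the whiskers: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is an outlier.
outliers = []
for (feature, diagnosis), values in long.groupby(["features", "diagnosis"])["value"]:
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    outliers += [(feature, diagnosis, v) for v in values[mask]]

print(outliers)  # [('feature_a', 'M', 4.0)]
```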

Task 6: Using Joint Plots for Feature Comparison

A joint plot is an effective way of visualizing the correlation between two features in a data set.

Task 7: Observing the Distribution of Values and their Variance with Swarm Plots

A swarm plot draws every observation without overlapping points, which makes it easy to see which features separate benign from malignant tumors and may therefore have better predictive power than others.
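A minimal swarmplot sketch, assuming standardized features in long format. The data is synthetic with hypothetical features: on feature_a the two classes sit apart, on feature_b they overlap. In the plot, feature_a shows two clearly separated clouds, and the group means printed at the end summarize the same difference numerically.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
n = 40
# Synthetic standardized values (not the real data): the classes separate
# on hypothetical feature_a and overlap on hypothetical feature_b.
long = pd.DataFrame({
    "diagnosis": (["M"] * n + ["B"] * n) * 2,
    "features": ["feature_a"] * (2 * n) + ["feature_b"] * (2 * n),
    "value": np.r_[rng.normal(1, 0.5, n), rng.normal(-1, 0.5, n),
                   rng.normal(0, 1, n), rng.normal(0, 1, n)],
})

plt.figure(figsize=(6, 5))
# Each point is one observation, nudged sideways so none overlap.
ax = sns.swarmplot(x="features", y="value", hue="diagnosis", data=long, size=3)

means = long.groupby(["features", "diagnosis"])["value"].mean()
print(means.unstack())
```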

Task 8: Observing all Pairwise Correlations

A correlation plot shows the pairwise relationships between all features in the dataset in a single table. The color coding lets analysts quickly identify where strong correlations exist.
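A minimal heatmap sketch of a correlation plot, using synthetic data with hypothetical columns: feature_a and feature_b are nearly collinear and feature_c is independent noise. DataFrame.corr() computes Pearson correlations by default, and fixing vmin=-1, vmax=1 on a diverging colormap keeps the colors comparable across plots.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
base = rng.normal(size=100)
# Synthetic features (not the real data): a and b nearly collinear, c independent.
x = pd.DataFrame({
    "feature_a": base,
    "feature_b": 2 * base + rng.normal(0, 0.1, 100),
    "feature_c": rng.normal(size=100),
})

corr = x.corr()  # pairwise Pearson correlations, one row/column per feature

plt.figure(figsize=(5, 4))
# annot=True writes each coefficient in its cell; colors run from -1 to 1.
ax = sns.heatmap(corr, annot=True, fmt=".1f", linewidths=0.5,
                 cmap="coolwarm", vmin=-1, vmax=1)
```

Highly correlated pairs such as feature_a and feature_b carry largely redundant information, so one of them is often dropped before training a classifier.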

Note: I completed this project on Coursera.

Course Certificate

ntoscano01/tumor_diag_seaborn