Breast Cancer Diagnostic Data Analysis- This was a class assignment for an Advanced Analytics class. The assignment consists of an analysis of the Wisconsin breast cancer diagnostic data. The dataset contains 569 instances, each containing features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The study was performed in Wisconsin, and the data is freely available from the UCI Machine Learning Repository. You can find the data using the link mentioned in this notebook. The features correspond to 10 characteristics of the cell nuclei present in the image, such as radius, texture, perimeter, smoothness, etc. Each instance contains a feature vector of length 30, where the first 10 entries are the means of the aforementioned characteristics, the next 10 entries are the standard errors, and the last 10 are the largest ("worst") values for each characteristic. Each instance also has a label, where 1 corresponds to malignant tissue and 0 to benign tissue. The dataset is split into training, validation, and test sets.
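The layout of the 30-entry feature vector can be sketched in plain Python. This is only an illustration, not code from the notebook: the helper names `feature_layout` and `describe` are hypothetical, the full list of ten characteristics is taken from the UCI dataset description, and the middle group is labeled "standard error" as in that description.

```python
# The ten cell-nucleus characteristics, as listed in the UCI dataset description.
CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# Each characteristic is summarized three ways, in this order.
STATISTICS = ["mean", "standard error", "worst"]


def feature_layout():
    """Return (statistic, characteristic) pairs in feature-vector order."""
    return [(stat, name) for stat in STATISTICS for name in CHARACTERISTICS]


def describe(index):
    """Describe the feature stored at a 0-based position (hypothetical helper)."""
    stat, name = feature_layout()[index]
    return f"{stat} {name}"
```

For example, `describe(0)` refers to the mean radius, and `describe(29)` to the worst fractal dimension.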
I have used the following Python libraries for the analysis-
- Data Wrangling- Pandas
- Data Visualization- Matplotlib, Seaborn, and Graphviz
- Data Modeling- scikit-learn (sklearn) and SciPy
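As a rough sketch of how these libraries fit together, the snippet below loads the copy of the dataset bundled with scikit-learn and splits it into training, validation, and test sets. It is not the notebook's own code. The 60/20/20 proportions, the stratification, and `random_state=0` are assumptions chosen for illustration, since the split sizes are not stated here. Note that scikit-learn encodes the target as 0 = malignant and 1 = benign, so the labels are flipped to match the convention used above.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the bundled copy of the Wisconsin diagnostic dataset (no download needed).
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)

# scikit-learn uses 0 = malignant, 1 = benign; flip so that 1 = malignant.
y = pd.Series(1 - data.target, name="malignant")

# Assumed 60/20/20 split: hold out 40%, then halve it into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

print(X.shape)  # (number of instances, number of features)
print(len(X_train), len(X_val), len(X_test))
```

Stratifying on the label keeps the proportion of malignant cases similar across the three sets, which matters when one class is less common than the other.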