/matplotlib-challenge

Analyze pharmaceutical testing data, plot and summarize findings

Primary LanguageJupyter Notebook

matplotlib-challenge

Analyze pharmaceutical testing data, plot and summarize findings

Background

Pymaceuticals Inc. is screening for potential treatments for squamous cell carcinoma (SCC), a commonly occurring form of skin cancer. In its latest study, 249 mice who were identified with SCC tumors received treatment with a range of drug regimens. Over the course of 45 days, tumor development was observed and measured. The purpose of this study was to compare the performance of Pymaceuticals’ drug of interest, Capomulin, against the other treatment regimens.

Data Description

  • 9 treament drugs and 1 placebo were test in 249 mice over 45 days.
  • One mouse was removed from the data due to duplicate measurements.
  • In the final data set of 248 mice, the ratio of male to female mice was nearly even at 50.4% vs 49.6%, respectively.
  • The final number of observations in the study totaled 1,880 and the observations per drug ranged from 148 to 230, indicating robust sample sizes for each drug tested.
  • The starting measurement for each tumor was ~45 mm3 across all tested treatments.

Approach

  • Read in two csv files and merged into one dataframe
  • Called pandas functions to perform various calculations using groupby including: mean, median, variance, standard deviation, and SEM of tumor volume by drug regimen.
  • Generated bar charts showing the total number of measurements for all mice tested by drug regimen using Pandas method and pyplot method.
  • Generated pie charts showing the distribution of female versus male mice in the study using Pandas method and pyplot method.
  • Calculated the final tumor volume of each mouse across the four top drugs of interest: Capomulin, Ramicane, Infubinol, and Ceftamin.
  • Calculated the quartiles, inter-quartile range (IQR) and determined potential outliers using upper and lower bounds method of the top four drugs.
  • Generated a box plot showing the distribution of the final tumor size for each mouse tested across the top four drugs.
  • Created a line plot and a scatter plot for a select mouse treated with Capomulin depicting the tumor volume over time for the mouse.
  • For the Capomulin treatment the following were calculated:
    • The Pearson correlation coefficient between mouse weight and average tumor volume
    • The linear regression model between mouse weight and average tumor volume
    • Plot of the linear regression model on top of a scatter plot

Files Used for Data Analysis

  • Mouse_metadata.csv
  • Study_results.csv

References & Resources

https://www.machinelearningplus.com/pandas/pandas-duplicated/

https://sparkbyexamples.com/pandas/pandas-get-list-of-all-duplicate-rows/

https://sparkbyexamples.com/pandas/pandas-delete-rows-based-on-column-value/

https://www.geeksforgeeks.org/find-maximum-values-position-in-columns-and-rows-of-a-dataframe-in-pandas/

https://stackoverflow.com/questions/55423048/how-to-sort-values-in-descending-order-bar-plot-from-pandas-dataframe

https://www.w3schools.com/python/pandas/ref_df_nunique.asp

https://www.w3schools.com/python/pandas/ref_df_isin.asp