An analysis of prescription drugs.
Medicare is a national health insurance program in the United States, begun in 1966 under the Social Security Administration and now administered by the Centers for Medicare and Medicaid Services (CMS).
The Centers for Medicare & Medicaid Services (CMS) is a federal agency within the United States Department of Health and Human Services (HHS) that administers the Medicare program and works in partnership with state governments to administer Medicaid, the Children's Health Insurance Program (CHIP), and health insurance portability standards.
In 2014, CMS recorded 1.2 billion claims for medication prescriptions.
Part D is a federally created program to help you lower the cost of your retail prescription drugs. Unlike Medicare Part A & B, you will not enroll in Part D through the Social Security office. Instead, you will select one of the Part D plans available in your county from private insurance carriers.
By signing up for that plan, you will have enrolled in Part D.
Medicare drug plans are optional. You’ll have a monthly premium that you will pay to the insurance carrier. In return, they give you significantly lower copays on your medicines than you would pay if you had no Part D insurance.
The majority of my work experience is in health care, and I am fascinated by the complexity of the industry. This is a business of saving lives. Is there any more valuable service? I decided to focus on prescription drugs because they let me view treatments and diseases simultaneously (i.e., the treatment indicates the disease).
- Discover what affects drug prescription claims
- Discover if population demographics are a factor
  - Gender
  - Race
  - Age
- Discover if geographical location is a factor
  - Coastal vs. South
  - Population density
Google Cloud Platform: Medications Table
Census.gov: Population Table
Census.gov: Demographic Table
```python
# Import all necessary packages and modules
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import scipy.stats as stats
plt.style.use('ggplot')

# Give credentials for API access to GCP
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '/Users/phillliprashaad/Google_API_Key.json'

# Connect to the Medicare data set
import bq_helper
from bq_helper import BigQueryHelper
medicare = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                    dataset_name="cms_medicare")
bq_assistant = BigQueryHelper("bigquery-public-data", "cms_medicare")
```
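As a rough sketch of how the features of interest could be pulled from this data set, the snippet below builds a query for the most-claimed drugs in one year. The table name `part_d_prescriber_2014` and the columns `drug_name` and `total_claim_count` are assumptions about the public `cms_medicare` schema, and `top_drugs_query` is a hypothetical helper, so check both against `bq_assistant.list_tables()` and `bq_assistant.table_schema(...)` before relying on them.

```python
def top_drugs_query(year=2014, limit=10):
    """Build a Standard SQL query for the most-claimed drugs in one year.

    Assumption: the table and column names below match the public
    cms_medicare schema; they are not confirmed by this README.
    """
    table = f"bigquery-public-data.cms_medicare.part_d_prescriber_{int(year)}"
    return f"""
    SELECT drug_name, SUM(total_claim_count) AS total_claims
    FROM `{table}`
    GROUP BY drug_name
    ORDER BY total_claims DESC
    LIMIT {int(limit)}
    """


query = top_drugs_query()

# Running the query needs credentials and network access, so it is left
# commented out here. Both methods are part of bq_helper.
# size_gb = bq_assistant.estimate_query_size(query)
# top_drugs = bq_assistant.query_to_pandas_safe(query, max_gb_scanned=1)
```

Estimating the query size first is a cheap way to avoid scanning more data than intended.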
- 23 Tables
- 200+ Features
- 9.5 million datapoints
- Features of interest: State, Drug Name, Total Claims
- 2 tables
- 70+ Features
- 5k+ datapoints
- Features of interest: State, Estimated Population Size, Gender, Race
```python
# Import population data into dataframe
population_df = pd.read_csv('/Users/phillliprashaad/Desktop/Galvanzie_DSI/Capstone_1/Health Care/import_tables/region_pop_2.csv')

# Import demographics table
demographic_df = pd.read_csv('/Users/phillliprashaad/Desktop/Galvanzie_DSI/Capstone_1/Where_Are_The_Drugs/import_tables/demographic_by_state.csv')
```
```python
def left_df_merge(df1, df2, col1, col2):
    '''Helps left merge data frames
    df1: Left Dataframe
    df2: Right Dataframe
    col1: Left column to merge on
    col2: Right column to merge on
    Returns: Merged Dataframe
    '''
    df_merge = df1.merge(df2, how='left',
                         left_on=col1,
                         right_on=col2)
    return df_merge
```
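A left merge keeps every row of the left table, so states that are missing from the right table end up with empty values rather than being dropped. The sketch below shows one way to spot those rows using pandas' `indicator=True` option. The two tables, their column names, and every number in them are made up for illustration.

```python
import pandas as pd

# Hypothetical toy tables; names and values are invented for this example.
claims = pd.DataFrame({"state": ["CA", "TX", "PR"],
                       "total_claims": [120, 95, 7]})
population = pd.DataFrame({"NAME": ["CA", "TX"],
                           "population": [1000, 800]})

# Same kind of left merge as left_df_merge, plus a "_merge" column that
# records whether each row found a match in the right table.
merged = claims.merge(population, how="left",
                      left_on="state", right_on="NAME",
                      indicator=True)

# Rows marked "left_only" had no match in the population table.
unmatched = merged.loc[merged["_merge"] == "left_only", "state"].tolist()
```

Checking `unmatched` before any per-capita calculation helps avoid silently dividing by missing population values.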
```python
def filter_DF(df, colname, string):
    '''Searches a specific column to see if it contains a given value
    df: The dataframe to search
    colname: Name of column to search
    string: String to search
    Returns: Filtered Dataframe
    '''
    # na=False treats missing values as non-matches so the mask is purely boolean
    mask = df[colname].str.contains(string, na=False)
    return df[mask]
```
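One detail worth knowing about this kind of filter: `str.contains` interprets its argument as a regular expression by default, and missing values in the column produce missing values in the mask. A minimal sketch of a literal, null-safe search is shown below. The rows are made up for illustration, and `regex=False` is a suggested variation, not something the original function uses.

```python
import pandas as pd

# Hypothetical rows, including a missing drug name.
drugs = pd.DataFrame({"drug_name": ["LISINOPRIL",
                                    "LEVOTHYROXINE SODIUM",
                                    None,
                                    "LISINOPRIL-HYDROCHLOROTHIAZIDE"]})

# regex=False matches the text literally; na=False counts missing names
# as non-matches so the mask can be used directly for indexing.
mask = drugs["drug_name"].str.contains("LISINOPRIL", regex=False, na=False)
matches = drugs.loc[mask, "drug_name"].tolist()
```

Note that a substring search also picks up combination drugs, which may or may not be wanted when comparing claims for a single drug.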
- Define Null and Alternative Hypotheses
- State Alpha
- Calculate Degrees of Freedom
- State Decision Rule
- Calculate Test Statistic
- State Results
- State Conclusion
- Null: There is NO significant difference in claims percentage between the drugs LEVOTHYROXINE SODIUM and LISINOPRIL. Alternative: There is a significant difference.
- Alpha = .01
- n/a
- If pval < alpha, then REJECT Null
- pval = .0007
- pval = .0007 < .01; REJECT NULL
- At the 99% confidence level (alpha = .01), there is a significant difference in claims percentage between LEVOTHYROXINE SODIUM and LISINOPRIL
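The steps above can be sketched in code. This README does not say which test produced pval = .0007, so the example below assumes a two-sided two-proportion z-test, which has no degrees of freedom. The function name `two_proportion_z_test` and all counts are made up for illustration and will not reproduce the reported p-value.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled proportion under the null hypothesis of no difference
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution:
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ALPHA = 0.01  # significance level stated above

# Hypothetical counts: claims for each drug out of all claims sampled.
z, p_value = two_proportion_z_test(2200, 100_000, 2000, 100_000)

# Decision rule: reject the null when the p-value is below alpha.
reject_null = p_value < ALPHA
```

With real data, the counts would come from the claims totals for each drug, and the choice of test should match how the claims percentages were actually computed.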
Due to time constraints, I was unable to dive deeper into this data. Different people have different cultures and habits, which lead to different health levels. Certain environments also affect the human body differently, and certain groups tend to cluster in different areas for various reasons. If I had more time, I would like to look at race and geographical location as variables for drug prescription type and use.