Stats 506 F20 Group Project -- Group 7
- Group7.Rmd, Group.html: the write-up for this project
- master.csv: the dataset downloaded from Kaggle
- The source code for:
  - Yan Chen: Stata
  - Yingyi Yang: Python
Regression models describe the effect of each predictor variable on the response variable. To estimate the effect of a specific comparison of predictors (for example, one age group versus another), we need to form linear combinations of the estimated coefficients and draw inferences about them.
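As an illustration of inference on a linear combination, the sketch below computes the estimate of c'β. Its variance is c'Σc, where Σ is the coefficient covariance matrix, and the sketch also returns a two-sided Wald z-test. It is a minimal sketch, not the project's code. The helper `linear_combination` and the coefficient and covariance values are hypothetical.

```python
import math

def linear_combination(beta, cov, c):
    """Estimate, standard error, z-statistic and two-sided p-value of c' beta."""
    k = len(beta)
    est = sum(c[i] * beta[i] for i in range(k))
    # Var(c' beta) = c' Sigma c, where Sigma = cov is the covariance of beta
    var = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    z = est / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return est, se, z, p

# Hypothetical fitted model: intercept, age group A, age group B
beta = [1.0, 0.5, 0.2]
cov = [[0.04, 0.0,   0.0],
       [0.0,  0.01,  0.002],
       [0.0,  0.002, 0.01]]

# Compare age group A with age group B: weights c = (0, 1, -1) give beta_A - beta_B
est, se, z, p = linear_combination(beta, cov, [0, 1, -1])
```

The covariance term matters here. Ignoring the positive covariance between the two age coefficients would overstate the standard error of their difference.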
We use the Kaggle suicide rate data covering 1985-2016 for 101 countries and 6 age groups.
Key variables:
| Variable | Description |
|---|---|
| country | 101 unique countries |
| year | 1985-2016 |
| sex | female, male |
| age | 5-14 years, 15-24 years, 25-34 years, 35-54 years, 55-74 years, 75+ years |
| suicides_no | number of suicides |
| population | population of each country-year-sex-age subgroup |
| gdp_per_capita | GDP per capita |
We used linear and non-linear combinations of the estimated regression coefficients to draw inferences about the following questions.
- Are people less likely to die by suicide in the most recent decade?
- Within the same age group, do males or females tend to have higher suicide counts?
- Are teenage females more likely to die by suicide than retired females?
To answer these questions, we fit a Poisson regression model for the suicide count. We then compare age groups, sexes, and years in pairs using linear combinations of the estimated coefficients.
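Below is a minimal sketch of a pairwise comparison in a Poisson model. It is not the project's actual code and assumes only that numpy is available. It fits a log-link Poisson GLM by iteratively reweighted least squares (IRLS) on simulated data. It then tests one contrast on the log scale: males aged 35-54 versus females aged 75+. The helper `fit_poisson_irls`, the three-level age coding, and the "true" coefficients are all hypothetical. In practice a packaged routine would normally be used, such as R's `glm(family = poisson)`, Stata's `poisson`, or statsmodels' GLM with a Poisson family.

```python
import numpy as np

def fit_poisson_irls(X, y, tol=1e-8, max_iter=100):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares.

    Returns the coefficients and their estimated covariance matrix,
    the inverse of the Fisher information X' W X at the MLE.
    """
    mu = y + 0.5                     # start from the data; +0.5 avoids log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # X' W with W = diag(mu) for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    cov = np.linalg.inv((X.T * mu) @ X)
    return beta, cov

# Hypothetical toy data: sex (female = baseline) and three age groups
# (15-24 = baseline, 35-54, 75+), 50 replicate rows per cell.
rng = np.random.default_rng(0)
sex = np.repeat([0, 1], 150)                  # 0 = female, 1 = male
age = np.tile(np.repeat([0, 1, 2], 50), 2)    # 0 = 15-24, 1 = 35-54, 2 = 75+
X = np.column_stack([
    np.ones_like(sex, dtype=float),           # intercept
    sex,                                      # male indicator
    age == 1,                                 # 35-54 years indicator
    age == 2,                                 # 75+ years indicator
]).astype(float)
true_beta = np.array([3.0, 0.8, 0.5, -0.3])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta_hat, cov_hat = fit_poisson_irls(X, y)

# log mu(male, 35-54) - log mu(female, 75+) = b_male + b_35_54 - b_75plus
c = np.array([0.0, 1.0, 1.0, -1.0])
est = c @ beta_hat
se = np.sqrt(c @ cov_hat @ c)
ci = (est - 1.96 * se, est + 1.96 * se)       # approximate 95% Wald interval
```

Exponentiating `est` and the interval endpoints turns the log-scale contrast into a ratio of expected counts.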
- Is the expected suicide count in one year larger than in another year?
- What are the expected suicide counts when the interaction between two subgroups is considered?
To answer these questions, we use the delta method to estimate the effects of age, sex, and year on the suicide count through non-linear combinations of the coefficients, namely the ratio and the product of the pair being compared.
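The sketch below applies the delta method to two non-linear combinations of coefficients. It approximates Var(g(β̂)) by ∇g'Σ∇g, with the gradient taken by central finite differences. It is a minimal sketch under stated assumptions, not the project's implementation. The helper `delta_method` and the two coefficients and their covariance are hypothetical. The ratio exp(b1)/exp(b2) compares expected counts between two years. The product exp(b1)·exp(b2) combines two multiplicative effects, for example of two subgroups.

```python
import math

def delta_method(g, beta, cov, h=1e-6):
    """Delta-method estimate and standard error of a smooth function g(beta).

    The gradient of g is approximated by central finite differences, and
    Var(g(beta_hat)) is approximated by grad' Sigma grad.
    """
    k = len(beta)
    grad = []
    for i in range(k):
        up = list(beta)
        dn = list(beta)
        up[i] += h
        dn[i] -= h
        grad.append((g(up) - g(dn)) / (2 * h))
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return g(beta), math.sqrt(var)

# Hypothetical log-scale coefficients b1, b2 and their covariance matrix
beta = [0.2, -0.1]
cov = [[0.0025, 0.0010],
       [0.0010, 0.0036]]

# Ratio of expected counts: exp(b1) / exp(b2)
ratio, ratio_se = delta_method(lambda b: math.exp(b[0]) / math.exp(b[1]), beta, cov)
# Product of the two multiplicative effects: exp(b1) * exp(b2)
product, product_se = delta_method(lambda b: math.exp(b[0]) * math.exp(b[1]), beta, cov)

# Wald statistic for H0: ratio = 1 (equal expected counts)
z_ratio = (ratio - 1) / ratio_se
```

For these two functions the gradient is also available in closed form. For the ratio r it is (r, -r), so Var ≈ r²(Var b1 + Var b2 - 2 Cov). The finite-difference version simply avoids deriving it by hand for each new combination.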