
Regression analysis is a powerful statistical method for examining the relationship between two or more variables of interest. The dataset has several columns describing each student, and the aim of this project is to predict the GPA score.


Freshmen-data-regression

Predicting and understanding key outcomes in a student’s academic trajectory, such as grade point average, academic retention, and degree completion, would allow targeted intervention programs in higher education. Most predictive models developed for these outcomes have been based on traditional methodological approaches. However, these models assume linear relationships between variables and do not always yield accurate predictive classifications. Machine-learning approaches, on the other hand, have been very effective in classifying various educational outcomes, overcoming the limitations of traditional approaches. The dataset contains details of student admissions to colleges in the university, divided across ethnicity and joining year, with added identifiers such as the average GPA of the batch and the coordinates of colleges and schools.

For this study we have information on students of different ages. The data has 8 columns.

GPA - A commonly used indicator of an individual's academic achievement in school. This is our target.

Miles from Home - Distance from the student's home, in miles.

College - The student's area of study. The dataset has 5 different areas.

Accommodations - Where the student lives: a dorm, off-campus, or another type of housing.

Years Off - Each student has a different number of years in college.

Part-Time Work Hours - The student's part-time work hours during a week.

Attends Office Hours - The student's office-hours attendance during a week.

High School GPA - The student's GPA in high school.

A pairplot plots the pairwise relationships in a dataset. The pairplot function creates a grid of Axes such that each variable in the data is shared on the y-axis across a single row and on the x-axis across a single column. That creates the plots shown above.

Corr

Our dataset has no missing values. Some columns are of object type, so I encoded them as dummy variables. Visualizing the correlation matrix of the dataset shows that no pair of features is highly correlated, which is why I keep all features.
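The preprocessing described here could look roughly like the following pandas sketch. The toy rows are made up, only four of the eight columns are included, and `drop_first=True` is an assumption; the notebook may encode the dummies differently.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real data (values are made up).
df = pd.DataFrame({
    "GPA": [3.1, 2.7, 3.8, 3.3, 2.9, 3.5],
    "High School GPA": [3.4, 3.0, 3.9, 3.5, 3.1, 3.6],
    "Part-Time Work Hours": [10, 20, 0, 5, 15, 8],
    "Accommodations": ["Dorm", "Off-campus", "Dorm", "Other", "Off-campus", "Dorm"],
})

# Count missing values per column; all zeros means nothing needs imputing.
missing = df.isna().sum()

# Replace each text column with 0/1 dummy columns.
# drop_first=True (an assumption) drops one level to avoid a redundant column.
encoded = pd.get_dummies(df, drop_first=True, dtype=float)

# Pairwise Pearson correlation between all encoded columns.
corr = encoded.corr()

print(missing)
print(corr.round(2))
```

A heatmap of `corr` then makes it easy to spot any pair of features with a correlation close to 1 or -1.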


Finally

I used 3 regression metrics to estimate the error on the freshmen dataset. A linear model was fitted, and its performance was measured and visualized.

The mean absolute error and the squared error are close to each other: the two values are 0.750 and 0.756.

R-squared is 0.294.
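For reference, the three metrics can be computed by hand as in the sketch below. It assumes that "squared error" means the mean squared error, fits the linear model with NumPy least squares rather than whatever library the notebook uses, and runs on made-up toy values, so it will not reproduce the numbers above.

```python
import numpy as np

# Hypothetical toy data: one feature (high school GPA) and the target (GPA).
X = np.array([[3.4], [3.0], [3.9], [3.5], [3.1], [3.6]])
y = np.array([3.1, 2.7, 3.8, 3.3, 2.9, 3.5])

# Ordinary least squares with an intercept: prepend a column of ones.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_pred = A @ coef

residuals = y - y_pred

# Mean absolute error: average size of the errors.
mae = np.mean(np.abs(residuals))

# Mean squared error: average of the squared errors.
mse = np.mean(residuals ** 2)

# R-squared: share of the target's variance explained by the model.
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")
```

An R-squared of 0.294 on the real data means the linear model explains roughly 29% of the variance in GPA.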