Data Science: Logistic Regression
This project explores machine learning by implementing a linear classification model, as a continuation of the 42 module ft_linear_regression.
- Logistic regression is a statistical method that models the relationship between one binary dependent variable and one or more independent variables.
- An example of a question that logistic regression aims to answer: do body weight, calorie intake, fat intake, and age influence the probability of having a heart attack (yes vs. no)?
- Logistic regression output can be difficult to interpret; tools exist that run the analysis and then explain the output in plain English.
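To make the idea of "probability of a yes/no outcome" concrete, here is a minimal sketch of how a logistic regression model turns a set of independent variables into a probability. The coefficients, the bias, and the sample values below are made up purely for illustration; they are not fitted to any data and do not come from this project.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a value in (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Return P(outcome = yes) as the sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for: body weight, calorie intake, fat intake, age
weights = [0.03, 0.001, 0.02, 0.05]
bias = -8.0

# Hypothetical person: 80 kg, 2500 kcal/day, 70 g fat/day, 55 years old
person = [80.0, 2500.0, 70.0, 55.0]

probability = predict_probability(person, weights, bias)
print(f"Estimated probability of 'yes': {probability:.3f}")
```

The weighted sum is a linear function of the inputs, which is why the model counts as a linear classifier even though its output is a probability.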
This project aims to implement a linear classification model, as a continuation of ft_linear_regression. The steps include:
- Read a dataset
- Visualise the dataset in various ways (to create insights and develop an intuition of what your data looks like)
- Select the relevant features and clean unnecessary information from your data.
- Train a logistic regression model that solves the classification problem
- Ensure the algorithm chosen has a minimum precision of 98%
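The training step can be sketched as batch gradient descent on a binary logistic regression. This is a minimal illustration and not necessarily how this project's scripts are written: the function names (`train`, `predict`, `accuracy`), the hyperparameters, and the toy data are all assumptions. A problem with more than two classes would need an extension such as training one binary model per class.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, learning_rate=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = sigmoid(z) - label  # derivative of the log loss w.r.t. z
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        # Step against the mean gradient
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    """Label a row 1 when its predicted probability is at least 0.5."""
    return [
        1 if sigmoid(bias + sum(w * x for w, x in zip(weights, row))) >= 0.5 else 0
        for row in X
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up data: two features, two linearly separable classes
X = [[-2.0, -1.5], [-1.0, -1.0], [-1.5, -0.5], [1.0, 1.5], [2.0, 1.0], [1.5, 0.5]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
print("accuracy:", accuracy(y, predict(X, weights, bias)))
```

In practice the score would be measured on data held out from training, and features are usually standardised first so that a single learning rate suits all of them.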
- Five-number summary
python .\describe.py .\dataset\dataset_train.csv
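For reference, a five-number summary consists of the minimum, first quartile, median, third quartile, and maximum of a column. The sketch below shows one way to compute it by hand; it is not the content of `describe.py`. The helper names are hypothetical, the quartiles use linear interpolation between neighbouring values (one common convention among several), and the input is assumed to be a non-empty list of numbers with missing values already removed.

```python
def percentile(sorted_values, fraction):
    """Percentile of pre-sorted data using linear interpolation."""
    position = (len(sorted_values) - 1) * fraction
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def five_number_summary(values):
    """Return min, quartiles, and max for a non-empty list of numbers."""
    data = sorted(values)
    return {
        "min": data[0],
        "25%": percentile(data, 0.25),
        "50%": percentile(data, 0.50),
        "75%": percentile(data, 0.75),
        "max": data[-1],
    }


# Made-up scores, for illustration only
summary = five_number_summary([7.0, 1.0, 3.0, 9.0, 5.0])
for name, value in summary.items():
    print(f"{name:>4}: {value}")
```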
- Histogram
python .\histogram.py .\dataset\dataset_train.csv
From the above, both 'Arithmancy' and 'Care of Magical Creatures' have homogeneous score distributions.
- Scatter Plot
python .\scatter_plot.py .\dataset\dataset_train.csv
From the above, the two features that are similar are 'Defense Against the Dark Arts' and 'Astronomy'. These will be the features selected to perform the logistic regression.
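A scatter plot shows similarity visually; one way to put a number on it is the Pearson correlation coefficient, which is close to 1 or -1 when two features follow almost the same linear trend. The sketch below is an optional check and not part of `scatter_plot.py`. The function name and the sample values are made up for illustration and are not taken from the dataset.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up values: feature_b is an exact negative multiple of feature_a
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [-2.0, -4.0, -6.0, -8.0]

r = pearson_correlation(feature_a, feature_b)
print(f"correlation: {r:.2f}")
```

The function assumes neither list is constant, since a zero spread would cause a division by zero.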
- Statistics
- Predictive Analysis
- Regression Modelling
- Data Visualisation