Student-Percentage-based-on-study-hours: A Jupyter Notebook repository from MEDHAT-ALHADDAD

1-Importing Data And Initial Exploring

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# importing data and exploring
df = pd.read_csv('./student_scores.csv')
print(df.shape)
print(df.describe())
df.head()

(25, 2)
           Hours     Scores
count  25.000000  25.000000
mean    5.012000  51.480000
std     2.525094  25.286887
min     1.100000  17.000000
25%     2.700000  30.000000
50%     4.800000  47.000000
75%     7.400000  75.000000
max     9.200000  95.000000

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Hours	Scores
0	2.5	21
1	5.1	47
2	3.2	27
3	8.5	75
4	3.5	30

# Plotting variables and checking correlation
df.plot(
    x='Hours',
    y='Scores',
    style='x',
    xlabel='Score %',
    ylabel='hours studied',
    title='Hours vs Percentage'
)

print(df.corr())

           Hours    Scores
Hours   1.000000  0.976191
Scores  0.976191  1.000000

There is a strong positive correlation of 97.6% between the 2 variables

2- Creating Train and Test Dastasets

X = df.iloc[:,:-1].values
y = df.iloc[:,-1].values

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error as MSE

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()

# ploting Regression line and test data

plt.scatter(X,y)
plt.plot(X, ((model.coef_ * X) + model.intercept_)) # y=mx+c

[<matplotlib.lines.Line2D at 0x7fd4e911c730>]

3-Predictions

y_pred = model.predict(X_test)

pd.DataFrame({'Actual':y_test, 'predicted':y_pred})

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Actual	predicted
0	81	83.188141
1	30	27.032088
2	21	27.032088
3	76	69.633232
4	62	59.951153

MSE(y_test, y_pred)

3.9207511902099244

4- Predicting Score For 9.25 Hours

prediction = model.predict(np.array(9.25).reshape(1,-1))

print('for studying for 9.25 hours it is predicted to score: ' + str(round(*prediction,2)) + '%')

for studying for 9.25 hours it is predicted to score: 92.39%

MEDHAT-ALHADDAD/Student-Percentage-based-on-study-hours

1-Importing Data And Initial Exploring

2- Creating Train and Test Dastasets

3-Predictions

4- Predicting Score For 9.25 Hours