sklearn-regressors

A bit of EDA -> Training -> Eval -> Vis


Intro

Install

pip3 install -r requirements.txt

Files

eda.ipynb

My own exploratory data analysis; it might be a bit chaotic.

results.ipynb

Train / Evaluate / Visualise predictions made by the following models (a comparison sketch follows the list):

  • NNLS
  • Lasso
  • Random Forest
  • DummyRegressor
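
Roughly, the notebook fits each of these models and compares them on a held-out test set. A minimal, self-contained sketch of that kind of comparison (the synthetic data and hyperparameters below are placeholders, and a plain LinearRegression(positive=True) stands in for the custom NNLS estimator from nnls_regressor.py):

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the notebook uses the real dataset instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 2.0, 1.5]) + rng.normal(scale=0.1, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

models = {
    "NNLS": LinearRegression(positive=True),  # stand-in for nnls_regressor.py
    "Lasso": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Dummy baseline": DummyRegressor(strategy="mean"),
}
for name, model in models.items():
    preds = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: MAE={mean_absolute_error(y_te, preds):.3f}, R2={r2_score(y_te, preds):.3f}")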

prepare_data.py

prepare_data() method

Use it to prepare the data (a rough sketch follows the step list):

  1. Remove duplicate rows (there are none, by the way).
  2. Do some gymnastics to turn Learning_Disabilities into a one-hot encoding.
  3. Fill NA values with the column median for numeric dtypes.
  4. Fill NA values with the string 'unknown' for string dtypes.
  5. One-hot encode the string (categorical) columns.
  6. TODO: redundant columns that are strongly correlated with each other could probably be removed.
  7. I didn't have time for a deeper dive.
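
A rough, hedged sketch of what prepare_data() might look like for the steps above (column names and the handling of Learning_Disabilities are assumptions; the real code likely does something more involved with that column):

import pandas as pd

def prepare_data(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop duplicate rows (a no-op on this particular dataset).
    df = df.drop_duplicates()

    # 3./4. Fill missing values: median for numeric columns, 'unknown' for strings.
    numeric_cols = df.select_dtypes(include="number").columns
    string_cols = df.select_dtypes(include="object").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    df[string_cols] = df[string_cols].fillna("unknown")

    # 2./5. One-hot encode the categorical columns, Learning_Disabilities included
    # (the real implementation probably needs extra massaging for that column).
    df = pd.get_dummies(df, columns=list(string_cols))
    return df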

select_highly_correlated_columns(df: pd.DataFrame, how_many: int = 10) method

I select the 12 features most strongly correlated with the Final_Grade column.
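
A plausible sketch of this helper, assuming it ranks columns of the already-prepared (numeric) DataFrame by absolute correlation with Final_Grade:

import pandas as pd

def select_highly_correlated_columns(df: pd.DataFrame, how_many: int = 10) -> list:
    # Absolute correlation of every numeric column with the target,
    # excluding the target itself, highest first.
    corr = df.corr(numeric_only=True)["Final_Grade"].abs()
    corr = corr.drop(labels=["Final_Grade"]).sort_values(ascending=False)
    return corr.head(how_many).index.tolist()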

train.py

train_test_split(
df: pd.DataFrame, target_column_name: str=None, test_size=0.15, random_state=1<<15, **kv
)

Create a train / test split
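
Presumably a thin wrapper around scikit-learn's splitter that also separates the target column; a minimal sketch under that assumption (the real version may return the split in a different shape):

import pandas as pd
from sklearn import model_selection

def train_test_split(df: pd.DataFrame, target_column_name: str = None,
                     test_size=0.15, random_state=1 << 15, **kv):
    # Separate features and target, then defer to sklearn's splitter.
    X = df.drop(columns=[target_column_name])
    y = df[target_column_name]
    return model_selection.train_test_split(
        X, y, test_size=test_size, random_state=random_state, **kv)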

train_estimator(
    model=None,
    **kv
)

Training helper
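
A minimal sketch of such a helper, assuming it simply fits whichever estimator it is given (the extra argument names here are placeholders, not the repo's actual signature):

def train_estimator(model=None, X_train=None, y_train=None, **kv):
    # Generic training helper: fit the supplied estimator and return it.
    model.fit(X_train, y_train)
    return model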

train_lasso()
train_NNLS()
train_random_forest()
train_baseline_dummy()
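
These are presumably thin wrappers that plug a concrete estimator into train_estimator(); a hedged sketch (hyperparameters are illustrative, not the repo's actual choices, and train_NNLS() would use the custom estimator from nnls_regressor.py described below):

from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

def train_lasso(X_train, y_train, alpha=0.1, **kv):
    return train_estimator(model=Lasso(alpha=alpha),
                           X_train=X_train, y_train=y_train, **kv)

def train_random_forest(X_train, y_train, **kv):
    return train_estimator(model=RandomForestRegressor(random_state=0),
                           X_train=X_train, y_train=y_train, **kv)

def train_baseline_dummy(X_train, y_train, **kv):
    return train_estimator(model=DummyRegressor(strategy="mean"),
                           X_train=X_train, y_train=y_train, **kv)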

nnls_regressor.py

NNLS: a linear regressor with a non-negative coefficients constraint, implemented as a scikit-learn BaseEstimator.
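
A sketch of how such an estimator could look, using scipy.optimize.nnls under the hood (the class name and the intercept handling are assumptions, not necessarily what the repo does):

import numpy as np
from scipy.optimize import nnls
from sklearn.base import BaseEstimator, RegressorMixin

class NNLSRegressor(BaseEstimator, RegressorMixin):
    # Least squares with coefficients constrained to be non-negative.
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.fit_intercept:
            # Append a constant column so an intercept is fitted as well
            # (note that it is then also constrained to be >= 0).
            X = np.hstack([X, np.ones((X.shape[0], 1))])
        coef, _ = nnls(X, y)
        if self.fit_intercept:
            self.coef_, self.intercept_ = coef[:-1], coef[-1]
        else:
            self.coef_, self.intercept_ = coef, 0.0
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_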

vis.py

Some utilities for visualisation.
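
One typical helper here would be a predicted-vs-actual scatter plot; a hypothetical sketch (the function name is made up, not necessarily what vis.py contains):

import matplotlib.pyplot as plt
import numpy as np

def plot_predictions(y_true, y_pred, title="Predicted vs. actual"):
    # Scatter predictions against ground truth with a y = x reference line.
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, alpha=0.6)
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="grey")
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.set_title(title)
    return ax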