pip3 install -r requirements.txt
My own EDA, might be chaotic
Train / Evaluate / Visualise predictions made by:
NNLS
Lasso
Random Forest
DummyRegressor
prepare_data() method
Use it to prepare the data:
- remove duplicates (there are none btw)
- Learning_Disabilities gymnastics into one-hot encoding
- complete na values with median for numeric dtypes
- complete na values with 'unknown' string for string dtypes
- one-hot encode string columns (categorical)
- Todo: I can probably remove redundant variables (columns) that are strongly correlated with each other.
- Didn't have much time to deep dive
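The preparation steps listed above can be sketched roughly as follows. This is only an illustration, not the repository's prepare_data(): the function name prepare_data_sketch, the toy column Hours, and all toy values are made up, and the sketch assumes every non-numeric column should be treated as a categorical string column. The special handling of Learning_Disabilities is not reproduced here.

```python
import pandas as pd


def prepare_data_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for prepare_data(); not the repository's code."""
    df = df.drop_duplicates().copy()

    numeric_cols = list(df.select_dtypes(include="number").columns)
    # Assumption: everything that is not numeric is a categorical string column.
    string_cols = [c for c in df.columns if c not in numeric_cols]

    # Fill missing numeric values with the per-column median.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    # Fill missing string values with the literal 'unknown'.
    df[string_cols] = df[string_cols].fillna("unknown")

    # One-hot encode the string (categorical) columns.
    return pd.get_dummies(df, columns=string_cols)


# Toy frame with made-up values, for illustration only.
toy = pd.DataFrame(
    {
        "Hours": [1.0, None, 3.0, 3.0],
        "Learning_Disabilities": ["Yes", None, "No", "No"],
        "Final_Grade": [50.0, 60.0, 70.0, 70.0],
    }
)
prepared = prepare_data_sketch(toy)
print(prepared)
```

Filling with 'unknown' before encoding means missing categories end up in their own indicator column instead of being silently dropped.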
select_highly_correlated_columns(df: DataFrame, how_many: int = 10) method
I select the 12 features most correlated with the Final_Grade column
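One way such a selection could work is to rank columns by absolute Pearson correlation with the target. The sketch below is an assumption about the approach, not the repository's implementation: the name select_top_correlated_sketch, the target parameter, and the toy columns are hypothetical, and it assumes every column is already numeric.

```python
import pandas as pd


def select_top_correlated_sketch(df: pd.DataFrame, target: str, how_many: int = 10) -> list:
    """Return the how_many columns with the largest |Pearson r| against target."""
    # Correlation of every column with the target, excluding the target itself.
    correlations = df.corr()[target].drop(target)
    # Rank by absolute value so strong negative correlations count too.
    ranked = correlations.abs().sort_values(ascending=False)
    return ranked.head(how_many).index.tolist()


# Toy data with made-up values, for illustration only.
toy = pd.DataFrame(
    {
        "Final_Grade": [1, 2, 3, 4, 5],
        "strong": [1, 2, 3, 4, 6],
        "inverse": [5, 3, 4, 2, 1],
        "noise": [2, 1, 2, 1, 2],
    }
)
top_two = select_top_correlated_sketch(toy, "Final_Grade", how_many=2)
print(top_two)
```

Ranking by absolute value is a design choice in this sketch: a feature that is strongly negatively correlated with the grade is just as informative as a positively correlated one.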
train_test_split(
df: pd.DataFrame, target_column_name: str=None, test_size=0.15, random_state=1<<15, **kv
)
Creates a train / test split. By default 15% of the rows are held out for testing, with a fixed random_state of 1<<15 (32768).
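A wrapper with this signature could plausibly delegate to scikit-learn's own train_test_split. The sketch below assumes exactly that; the name split_frame_sketch and the return order (X_train, X_test, y_train, y_test) are assumptions, not something the repository confirms.

```python
import pandas as pd
from sklearn.model_selection import train_test_split as sk_train_test_split


def split_frame_sketch(df, target_column_name, test_size=0.15, random_state=1 << 15, **kv):
    """Hypothetical wrapper: separate the target column, then split."""
    X = df.drop(columns=[target_column_name])
    y = df[target_column_name]
    # Extra keyword arguments (e.g. shuffle) are passed straight to scikit-learn.
    return sk_train_test_split(X, y, test_size=test_size, random_state=random_state, **kv)


# Toy frame with made-up values, for illustration only.
toy = pd.DataFrame({"x": range(20), "Final_Grade": range(20)})
X_train, X_test, y_train, y_test = split_frame_sketch(toy, "Final_Grade")
print(len(X_train), len(X_test))
```

Fixing random_state makes the split reproducible, so different models are evaluated on the same held-out rows.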
train_estimator(
model=None,
**kv
):
Training helper
train_lasso()
train_NNLS()
train_random_forest()
train_baseline_dummy()
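A generic training helper of this kind might fit any scikit-learn regressor and report a test metric, so that each model can be compared against the DummyRegressor baseline. The sketch below is a guess at that pattern: train_estimator_sketch, its parameters, and the choice of mean absolute error are all assumptions, and the data is made up.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error


def train_estimator_sketch(model, X_train, y_train, X_test, y_test, **kv):
    """Hypothetical helper: set hyperparameters, fit, and return (model, test MAE)."""
    model.set_params(**kv)  # extra keyword arguments are treated as hyperparameters
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae


# Toy data following y = 2x + 1, for illustration only.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])
X_test = np.array([[4.0], [5.0]])
y_test = np.array([9.0, 11.0])

_, dummy_mae = train_estimator_sketch(DummyRegressor(strategy="mean"), X_train, y_train, X_test, y_test)
_, lasso_mae = train_estimator_sketch(Lasso(), X_train, y_train, X_test, y_test, alpha=0.01)
print(dummy_mae, lasso_mae)
```

The dummy baseline always predicts the training mean, so any model worth keeping should score a clearly lower error than it.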
NNLS
Linear regressor constrained to non-negative coefficients. Implemented as a sklearn BaseEstimator
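A minimal sketch of such an estimator, assuming it is built on scipy.optimize.nnls (the repository may solve the problem differently). The class name NNLSSketch is hypothetical. The sketch centers the data so that the intercept stays unconstrained while the coefficients are forced to be non-negative.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.base import BaseEstimator, RegressorMixin


class NNLSSketch(RegressorMixin, BaseEstimator):
    """Hypothetical non-negative least squares regressor with a free intercept."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        x_mean, y_mean = X.mean(axis=0), y.mean()
        # Solve min ||(X - x_mean) w - (y - y_mean)|| subject to w >= 0.
        self.coef_, _ = nnls(X - x_mean, y - y_mean)
        # Recover the unconstrained intercept from the means.
        self.intercept_ = y_mean - x_mean @ self.coef_
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_


# Toy data generated as y = 1 + 2*x0 - x1, for illustration only.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = np.array([1.0, 3.0, 0.0, 2.0, 4.0])
model = NNLSSketch().fit(X, y)
print(model.coef_, model.intercept_)
```

Because the true weight of the second feature is negative, the constraint pins its coefficient at zero instead of letting it go below zero. Inheriting from BaseEstimator and RegressorMixin gives the class get_params, set_params, and score, so it can be used like any other scikit-learn regressor.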
Some utils for visualizations