Predict Student Performance (PSP) from Game Play
Dataset link: https://www.kaggle.com/competitions/predict-student-performance-from-game-play/
EDA:
- Events plot for each level
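
As a rough illustration of the counts behind such a plot, the sketch below tabulates events per level on a tiny hand-made table. The column names (`session_id`, `level`, `event_name`) and the sample rows are assumptions for illustration, not values taken from these notes.

```python
import pandas as pd

# Toy stand-in for the event log. Column names and rows are assumed
# for illustration; the real data has many more rows and columns.
events = pd.DataFrame({
    "session_id": [1, 1, 1, 1, 2, 2, 2],
    "level": [0, 0, 4, 4, 0, 4, 4],
    "event_name": ["navigate_click", "person_click", "navigate_click",
                   "checkpoint", "navigate_click", "navigate_click",
                   "checkpoint"],
})

# Rows are levels, columns are event types, cells are event counts.
counts_per_level = pd.crosstab(events["level"], events["event_name"])
print(counts_per_level)
```

If matplotlib is installed, `counts_per_level.plot(kind="bar", stacked=True)` should turn this table into a per-level bar chart.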
Plan of implementation:
- Observed the value counts of the various events
- There are only 3 checkpoint events, one in each of the three level groups.
- Thus, aggregate the events per session to simplify the problem
- Feature engineering plan:
  - Get value counts for events in each level group
  - Get the elapsed time at each checkpoint
  - (OPTIONAL) Get the elapsed time for each level group
- Target manipulation
  - Get correctness for each of the 18 questions
- This yields an aggregated dataset
  - Each data instance would represent
    - session_id
    - aggregate event values
    - question correctness
- Modelling idea
  - Take the aggregate values
  - Predict the correctness of 18 questions
  - Type of problem: Multi-label Classification (MLC)
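
The plan above could be sketched roughly as follows, using toy data in place of the competition files. The column names (`session_id`, `level_group`, `event_name`, `elapsed_time`, `correct`), the `<session>_q<n>` label id format, and the feature names are assumptions for illustration. The majority-answer baseline at the end is only a placeholder and is not a model proposed in these notes.

```python
import pandas as pd

# Toy event log (assumed columns; elapsed_time in arbitrary units).
events = pd.DataFrame({
    "session_id":  [1, 1, 1, 1, 2, 2, 2],
    "level_group": ["0-4", "0-4", "0-4", "5-12", "0-4", "0-4", "5-12"],
    "event_name":  ["navigate_click", "person_click", "checkpoint",
                    "checkpoint", "navigate_click", "checkpoint",
                    "checkpoint"],
    "elapsed_time": [100, 250, 900, 2000, 50, 700, 1500],
})

# 1) Event counts per session, one column per (level group, event) pair.
counts = (events.groupby(["session_id", "level_group", "event_name"])
          .size()
          .unstack(["level_group", "event_name"], fill_value=0))
counts.columns = [f"{ev}_count_{lg}" for lg, ev in counts.columns]

# 2) Elapsed time at each checkpoint. pivot() assumes exactly one
#    checkpoint per (session, level group) and raises on duplicates.
checkpoints = events[events["event_name"] == "checkpoint"]
checkpoint_time = checkpoints.pivot(
    index="session_id", columns="level_group", values="elapsed_time"
).add_prefix("checkpoint_time_")

# One row per session: the aggregated feature table.
features = counts.join(checkpoint_time)

# Toy labels in long form, with ids assumed to look like "<session>_q<n>".
labels = pd.DataFrame({
    "session_id": ["1_q1", "1_q2", "2_q1", "2_q2"],
    "correct": [1, 0, 1, 0],
})
parts = labels["session_id"].str.split("_q", expand=True)
labels["session"] = parts[0].astype(int)
labels["question"] = parts[1].astype(int)

# 3) One 0/1 column per question (18 in the real task, 2 in this toy).
targets = labels.pivot(
    index="session", columns="question", values="correct"
).add_prefix("q")
targets = targets.loc[features.index]  # align rows with the features

# Placeholder baseline: predict each question's majority answer.
baseline = (targets.mean() >= 0.5).astype(int)
print(features)
print(targets)
print(baseline)
```

Here `features` and `targets` share the same session index, so they can be passed as X and Y to any multi-label learner, for example scikit-learn's `MultiOutputClassifier` wrapped around a binary classifier.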