Test data missing labels
Closed this issue · 3 comments
The data/test.csv file is missing labels, so we cannot verify the model's test accuracy.
Hi Alice,
After training, once the model goes to production, you won't have labels to measure accuracy on the incoming data. The purpose of the testing data here is a little different from the conventional pipeline. I have already computed the metrics on the validation set.
So, as you can see, the original training set is divided into two smaller subsets, the real training set and a validation/testing set (for DEV), and the original testing set serves as real production-level testing data.
Very good points! That said, for this exercise I think it may be useful for the students to compare the model's performance on unseen labeled data, e.g. to check for overfitting.
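A minimal sketch of the overfitting check suggested above, assuming a pandas DataFrame with a hypothetical `label` column and a deliberately overfit "model" (a lookup table that memorizes the training rows and otherwise predicts the majority class). The data and the model are illustrative only. Training accuracy is perfect by construction, so only the held-out accuracy says anything about generalization.

```python
import pandas as pd

# Hypothetical labeled data: the labels are arbitrary and unrelated to the feature.
df = pd.DataFrame({
    "feature": range(20),
    "label": [1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1],
})

# Hold out 20% of the labeled rows as unseen data.
train_df = df.sample(frac=0.8, random_state=0)
test_df = df.drop(train_df.index)

# Overfit "model": memorize each training row's label, fall back to the majority class.
memory = dict(zip(train_df["feature"], train_df["label"]))
majority = train_df["label"].mode().iloc[0]

def predict(features):
    return [memory.get(f, majority) for f in features]

def accuracy(frame):
    preds = predict(frame["feature"])
    return sum(p == y for p, y in zip(preds, frame["label"])) / len(frame)

train_acc = accuracy(train_df)
test_acc = accuracy(test_df)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

A large gap between the two numbers is the usual sign of overfitting; this comparison is only possible because the held-out rows still carry labels.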
Yup, I think that would be easier for the students to digest. Can you split the training data into training and testing sets in an 80:20 ratio?
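The command is as follows.
For the training data:
train_df = df.sample(frac=0.8, random_state=42)
For the testing data, take the remaining rows so the two sets do not overlap:
test_df = df.drop(train_df.index)
Then we can delete the original testing data from the repository.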
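Calling `df.sample` twice independently would draw the 80% and 20% samples separately, so they could share rows and leak training data into the test set. A minimal sketch of a non-overlapping 80:20 split, assuming a pandas DataFrame; the helper name `split_80_20`, the toy data, and `seed=42` are illustrative choices, not from the thread:

```python
import pandas as pd

def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Split df into disjoint 80% training and 20% testing subsets (hypothetical helper)."""
    if not df.index.is_unique:
        # drop() below removes rows by index label, so labels must be unique
        df = df.reset_index(drop=True)
    train_df = df.sample(frac=0.8, random_state=seed)
    test_df = df.drop(train_df.index)  # every row not sampled for training
    return train_df, test_df

# Hypothetical labeled data standing in for the original training set.
data = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})
train_df, test_df = split_80_20(data)

assert train_df.index.intersection(test_df.index).empty  # no shared rows
print(len(train_df), len(test_df))  # 80 20
```

Fixing `random_state` makes the split reproducible for every student. scikit-learn's `train_test_split(df, test_size=0.2, random_state=42)` is an alternative that returns the same kind of disjoint pair.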