Test data missing labels

Question

Closed this issue 5 years ago · 3 comments

The data/test.csv file is missing labels so we cannot verify the model's test accuracy.

Answer 1 · 2020-07-01T17:34:32.000Z

Hi Alice,
Once the model is trained and goes to production, you won't have labels to measure accuracy against, so the testing data here serves a slightly different purpose than in a conventional pipeline. I have already computed the metrics on the validation set.
In other words, the original training set is divided into two smaller subsets: the real training set and a validation set (the testing set for DEV). The original testing set stands in for real production-level data.

Answer 2 · 2020-07-01T19:51:59.000Z

Very good points! For this exercise, though, I think it would be useful for the students to compare the model's performance against unseen data, e.g. to check for overfitting.

Answer 3 · 2020-07-01T19:57:19.000Z

Yup, I think that would be easier for the students to digest. Can you split the training data into training and testing sets in an 80:20 ratio?
The commands are:
For the training data:
train = df.sample(frac=0.8)
For the testing data (the remaining rows, so the two sets do not overlap):
test = df.drop(train.index)
Then we can delete the original testing data from the repository.
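
For reference, here is a minimal sketch of how the 80:20 split could look end to end. Note that calling df.sample twice independently can put the same row in both sets, so the sketch takes the testing rows as whatever the training sample left out. The toy DataFrame, its column names, the helper name split_train_test, and the seed value are all assumptions for illustration and are not taken from the repository; it also assumes the DataFrame index is unique.

```python
import pandas as pd


def split_train_test(df, train_frac=0.8, seed=42):
    """Split df into non-overlapping training and testing subsets.

    Hypothetical helper: assumes df has a unique index.
    """
    # Randomly pick the training rows; a fixed seed makes the split reproducible.
    train = df.sample(frac=train_frac, random_state=seed)
    # Everything that was not sampled becomes the testing set.
    test = df.drop(train.index)
    return train, test


# Toy stand-in for the labeled training file (column names are made up).
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

train_df, test_df = split_train_test(df)
print(len(train_df), len(test_df))  # 8 2
```

Because the testing rows keep their labels, the students can compute accuracy on them directly.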