Problem with final task dataset
agrigoriev opened this issue · 1 comment
agrigoriev commented
After loading the final project data:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
training_data = pd.read_csv('../data/final_project/training.csv')
test_data = pd.read_csv('../data/final_project/test_features.csv')
y_test = pd.read_csv('../data/final_project/y_test.csv')
print('The shape of the training dataset is:', training_data.shape)
print('The shape of the test dataset is:', test_data.shape)
print('The shape of the y_test is:', y_test.shape)
The shape of the training dataset is: (71538, 13)
The shape of the test dataset is: (23846, 12)
The shape of the y_test is: (23845, 1)
The number of samples in the test features (23846) differs from the number in y_test (23845).
Is this expected?
cemsaz commented
Hi @agrigoriev. Thanks for going over the final project.
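It looks like the first row of y_test.csv was read as a header. The file has no header row,
and pd.read_csv treats the first line as column names by default, so one sample is dropped.
Reading y_test with header=None keeps that row as data: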
y_test = pd.read_csv('../data/final_project/y_test.csv', header=None)
After that, y_test.shape is (23846, 1), which matches the number of rows in the test features.
This is not a problem with training.csv and test_features.csv, as both have a header row.
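To see the effect in isolation, here is a minimal sketch that uses a small made-up in-memory CSV (the `csv_text` string is a hypothetical stand-in, not the actual y_test.csv). It only illustrates how the `header` argument of `pd.read_csv` changes the row count for a file without a header line:

```python
import io

import pandas as pd

# Hypothetical stand-in for a header-less label file: three rows, one column.
csv_text = "1\n0\n1\n"

# Default behaviour: the first line is taken as the column name,
# so only two of the three rows end up as data.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is treated as data and the column is named 0.
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_default.shape)      # (2, 1)
print(with_header_none.shape)  # (3, 1)
```

A quick check such as `assert len(test_data) == len(y_test)` right after loading would likely catch this kind of off-by-one early.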