Assignment 1, kNN classifier
truehines opened this issue · 1 comments
Mahan,
I think I have found an error in the k-fold cross-validation snippet of the kNN Jupyter notebook. The second segment of this snippet contains the line:
train_set = np.concatenate((X_train_folds[:i] + X_train_folds[i+1:]))
I believe that using the "+" operator on these two arrays (X_train_folds[:i] & X_train_folds[i+1:]) will actually add together the array elements instead of concatenating them as you intended. Do you agree with this?
In my own implementation I have the following (the reason for the if-elif-else is that np.concatenate gives an error when it is handed an empty sequence):
if i == 0:
    X_train_fold = np.concatenate(X_train_folds[(i + 1):num_folds])
    y_train_fold = np.concatenate(y_train_folds[(i + 1):num_folds])
elif i == (num_folds - 1):
    X_train_fold = np.concatenate(X_train_folds[0:i])
    y_train_fold = np.concatenate(y_train_folds[0:i])
else:
    X_train_fold = np.concatenate((np.concatenate(X_train_folds[0:i]), np.concatenate(X_train_folds[(i + 1):num_folds])))
    y_train_fold = np.concatenate((np.concatenate(y_train_folds[0:i]), np.concatenate(y_train_folds[(i + 1):num_folds])))
classifier.train(X_train_fold, y_train_fold)
I am open to suggestions on a cleaner way to implement this...
Your feedback is greatly appreciated -- I don't have anyone to discuss this type of thing with.
Regards,
True
Hey,
Sorry for the delayed response.
When you slice a list, i.e. index it with the ":" operator, you get another list back. Try this:
myList = [0, 1, 2, 3]
myList[:2] # gives [0, 1]
myList[:1] # gives [0]
myList[:0] # gives []
Also, the "+" operator, when applied to lists, acts as a concatenator. Try this:
myList[:2] + myList[:2] # gives [0, 1, 0, 1]
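The same "+" means something different on NumPy arrays, which is probably where the confusion comes from. Here is a small sketch of the contrast, assuming the folds come from something like np.array_split (my assumption for this example; it returns a plain Python list of arrays). The toy data and variable names are made up for illustration:

```python
import numpy as np

# Assumption: folds are a plain Python list of arrays, as np.array_split returns.
X = np.arange(12).reshape(6, 2)
folds = np.array_split(X, 3)          # list of 3 arrays, each of shape (2, 2)

# On lists, + joins the two lists; nothing is added element-wise.
rest = folds[:1] + folds[2:]          # [fold 0, fold 2], fold 1 is left out
joined = np.concatenate(rest)         # stacked along axis 0, shape (4, 2)

# On ndarrays, + is element-wise addition, which is what the question describes.
as_array = np.array(folds)            # shape (3, 2, 2)
summed = as_array[:1] + as_array[2:]  # shape (1, 2, 2): fold 0 + fold 2
```

So the line only misbehaves if the folds have been converted to a single ndarray before slicing.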
Since X_train_folds is a list of numpy arrays rather than a single array, the "+" joins the two sliced lists. The numpy concatenate function then takes that list of arrays as input and stacks them into a single array along the first axis, which is treated as the new training/test set. Also, notice how I deliberately omit fold i by slicing around it. So I guess this is the cleanest way to go. Correct me if I'm wrong.
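To make the edge cases concrete, here is a rough sketch of the whole loop on toy data. The data, num_folds value, and the splits list are hypothetical stand-ins, and I am again assuming the folds are built with np.array_split; the notebook's classifier is left out so the snippet runs on its own:

```python
import numpy as np

num_folds = 5
# Hypothetical toy data standing in for the notebook's training set.
X_train = np.arange(20, dtype=float).reshape(10, 2)
y_train = np.arange(10)

# np.array_split returns Python lists of arrays.
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

splits = []
for i in range(num_folds):
    # For i == 0 or i == num_folds - 1 one of the slices is an empty list,
    # which simply contributes nothing to the joined list.
    X_tr = np.concatenate(X_train_folds[:i] + X_train_folds[i + 1:])
    y_tr = np.concatenate(y_train_folds[:i] + y_train_folds[i + 1:])
    X_val, y_val = X_train_folds[i], y_train_folds[i]
    splits.append((X_tr, y_tr, X_val, y_val))
    # classifier.train(X_tr, y_tr) would go here.
```

The if-elif-else is not needed because np.concatenate only complains when the joined list itself is empty, which can only happen with num_folds == 1.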
Best,
Mahan