SilvioGiancola/SoccerNet-code

PCA code.

Closed this issue · 11 comments

Hi,
I saw in the paper that you use PCA to reduce the dimensionality of the features. In your code, I only found the code for feature extraction, but nothing for the PCA. Can you provide the code for the PCA step?
Thanks.

Hi @TienDang9966,
I am afraid I don't have a clean copy of my PCA code available to share, since the project is almost 2 years old now. However, I recall I was using the PCA function from sklearn. Nothing fancier than estimating a normal PCA from all the features, and using it to reduce the dimensionality to 512.
Hope that helps!

I have tried performing the PCA transform on every kind of feature and checked the values against the corresponding *_PCA512.npy file. They differ by a large margin. Here is my PCA code, implemented with sklearn:

import numpy as np
from sklearn.decomposition import PCA


# Features of one video half, shape (num_frames, 2048)
orig_file_name = '/ssd_scratch/cvit/avijit/soccer/data/france_ligue-1/2016-2017/2016-09-20 - 22-00 Paris SG 3 - 0 Dijon/1_ResNET.npy'
orig_feature = np.load(orig_file_name, allow_pickle=True)

# Fit a PCA on this single half and reduce the 2048-d features to 512-d
pca = PCA(n_components=512)
principalComponents = pca.fit_transform(orig_feature)

# Load the pre-computed 512-d features provided with the dataset
dest_file_name = orig_file_name.rsplit('.', 1)[0] + '_PCA512.npy'
orig_pca_file = np.load(dest_file_name, allow_pickle=True)

# Compare the provided features against my own PCA output
diff = (orig_pca_file - principalComponents).sum()

They differ by a large margin.

You should not extract the principal components from the features of a single video but from the features of all the videos available.
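I no longer have the exact script, but a minimal sketch of that idea could look like the following (the glob pattern is only a placeholder for wherever your *_ResNET.npy files live, and for the full dataset this naive stacking may not fit in memory, as discussed below):

import glob

import numpy as np
from sklearn.decomposition import PCA

# Collect the per-half ResNET feature files of every available video
# (adjust the pattern to your local SoccerNet directory layout).
feature_files = sorted(glob.glob('/path/to/SoccerNet/*/*/*/*_ResNET.npy'))

# Stack all frames of all videos into a single (total_frames, 2048) matrix
all_features = np.vstack([np.load(f) for f in feature_files])

# Estimate one PCA on the whole collection and reuse it for every video
pca = PCA(n_components=512)
pca.fit(all_features)

for f in feature_files:
    reduced = pca.transform(np.load(f))
    np.save(f.replace('_ResNET.npy', '_ResNET_PCA512.npy'), reduced)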

Hi Silvio, first of all, thanks a lot for your contribution. I'm trying to replicate it.

For the training of the PCA, did you use the features of the 300 matches in the Train dataset?
If I stack all the features of the Train dataset, that's a (3301406, 2048) matrix.
If I try to run sklearn's PCA on that matrix, it just crashes, and I'm using an 8-CPU, 60 GB memory machine.

Did you do that, or did you train the PCA on a smaller set?
Thanks!

I was using the IncrementalPCA from sklearn.
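Roughly along these lines, I believe (a sketch from memory rather than the original script; the paths and the per-file batching are assumptions):

import glob

import numpy as np
from sklearn.decomposition import IncrementalPCA

feature_files = sorted(glob.glob('/path/to/SoccerNet/*/*/*/*_ResNET.npy'))

# IncrementalPCA sees the data in mini-batches, so the full
# (3301406, 2048) matrix never has to be held in memory at once.
ipca = IncrementalPCA(n_components=512)

# One partial_fit per video half; each batch must contain at least
# n_components (512) rows, which a full half easily satisfies.
for f in feature_files:
    ipca.partial_fit(np.load(f))

# Apply the fitted projection video by video
for f in feature_files:
    np.save(f.replace('_ResNET.npy', '_ResNET_PCA512.npy'),
            ipca.transform(np.load(f)))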

Yes, I just found out about it yesterday! I trained with it and managed to finish correctly on the entire training dataset. The computed features are different from the ones you stored, but I guess that's not a problem if I re-train the detector with them.

Silvio, I have trained the IncrementalPCA on the whole training dataset. The training goes perfectly on the training dataset, but the precision on the validation and testing datasets just doesn't increase during training.
So I think it's related to the trained PCA. I'm now training the PCA with the features from all three splits, but I find this strange. Given the huge number of matches, shouldn't a PCA fit on training alone reach roughly the same result as one fit on training + validation + testing? Did you fit the PCA on all the matches?

Thanks

PCA is an unsupervised method; you are not actually using any labels, so in my opinion it is fine to use training, validation and testing to estimate the PCA matrix. I might have done the same, I honestly do not remember. Also, are you normalizing/whitening the features before the PCA?

What do you mean by "the precision on the validation and testing dataset just doesn't increase during training"? There is no precision related to the PCA, so I guess you are talking about the final spotting task. What kind of performance are you reaching? Your validation performance should at least increase; otherwise there is a problem with your code or data.

What I mean is that the training accuracy reaches very good values during training, but the validation accuracy stays at the same level.
For example, at epoch 60:

Training:
[[23705 35 52 23]
[ 11 1175 8 4]
[ 10 31 1432 1]
[ 3 20 15 888]]
Loss: 0.668 Accuracy: 0.963 mAP: 0.991

Validation 0:
[[7609 172 166 139]
[ 98 190 7 2]
[ 109 23 308 4]
[ 80 4 8 201]]
auc: 0.654 (auc_PR_0: 0.978 auc_PR_1: 0.587 auc_PR_2: 0.727 auc_PR_3: 0.647)
Loss: 1.43e+02 Accuracy: 0.666 mAP: 0.654

And the same bad accuracy for testing.
I wasn't doing anything to the features before the PCA; I will try adding both scaling and whitening, thanks for pointing it out!
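For reference, this is roughly what I plan to try, building on the IncrementalPCA above (the StandardScaler step and the whiten=True option are my own guesses, not something taken from the paper):

import glob

import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.preprocessing import StandardScaler

feature_files = sorted(glob.glob('/path/to/SoccerNet/*/*/*/*_ResNET.npy'))

# Pass 1: estimate per-dimension mean and variance incrementally
scaler = StandardScaler()
for f in feature_files:
    scaler.partial_fit(np.load(f))

# Pass 2: fit a whitened IncrementalPCA on the standardized features
ipca = IncrementalPCA(n_components=512, whiten=True)
for f in feature_files:
    ipca.partial_fit(scaler.transform(np.load(f)))

# Pass 3: write out the reduced 512-d features
for f in feature_files:
    reduced = ipca.transform(scaler.transform(np.load(f)))
    np.save(f.replace('_ResNET.npy', '_ResNET_PCA512.npy'), reduced)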

Why is that bad? There is a discrepancy between training and validation, probably indicating overfitting, but see the SoccerNet paper: those numbers are in the same order of magnitude as the ones I reported in Tables 3, 4 and 5 (assuming those numbers are for classification).

My bad, it's fine then. I don't know why I thought the values should have been higher.

Thanks a lot for your replies!