
Unable to train the model

I am getting an error when I run python model.py. May I ask what the problem could be?

Please pull the newest version of the code! We have updated the code!

Where can I find that?

I have fixed that issue,
Now I am getting another issue.
Can you please have a look at it?
Do you use your own videos?
This should be the expected output if you use my data.

===============DefakeHop Prediction===============
===============MultiChannelWiseSaab Transformation===============
Input shape: (2708, 32, 32, 3)
Output shape: (2708, 15, 15, 13)

Your output looks like:

The final result was like:

I used CelebDF-V1 dataset.
I preprocessed the dataset and created .npz files as per the requirements of the model.
But I got this error.
Here is the data for Celeb-DF-v1! I followed the same code to get this data!
And please update model.py. I think there was a small error caused by the change in the structure of this repo!

Please let me know whether you can get the results or not! Thank you!

Well, I have already modified model.py, thanks.
But I got the same error even on your data.
Please have a look at it.

Could you help me check this part of code in model.py?
test_images should be a 4D numpy array!

    for region in model.regions:
        path = 'data/' + region + '.test.npz'
        data = np.load(path)
        test_labels = data['labels']
        test_images = data['images']
        test_names = data['names']
        model.predict_region(region, test_images, test_names)
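One way to confirm this before calling predict_region is a quick shape check on the loaded arrays. The following is only a sketch: the helper name `check_region_data` is hypothetical and not part of the repo, and it assumes nothing beyond the three keys used in the snippet above (`images`, `labels`, `names`).

```python
import numpy as np

def check_region_data(data):
    """Return the images array after basic sanity checks (hypothetical helper)."""
    images = np.asarray(data["images"])
    # Expect (num_samples, height, width, channels), e.g. (2708, 32, 32, 3).
    if images.ndim != 4:
        raise ValueError(f"images should be 4D, got shape {images.shape}")
    if len(data["labels"]) != len(images) or len(data["names"]) != len(images):
        raise ValueError("images, labels and names must have the same length")
    return images

# Stand-in for np.load(path): any mapping with the same keys works.
fake = {
    "images": np.zeros((5, 32, 32, 3), dtype="float32"),
    "labels": np.zeros(5, dtype=int),
    "names": np.array(["v0"] * 5),
}
print(check_region_data(fake).shape)
```

If the check raises, the .npz file was most likely written with a different layout than the one model.py expects.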

Sure, you may have a look at it.
Lines 169 and 170 are wrong.
And please load the testing data with np.load, since we made some changes to the structure of this repo.

Finally solved the issue.

But I still have to run some other datasets, and I am unable to do that because the system does not have enough memory.
Hey, I extracted feature data (celeb-v2) with shape (761334,32,32,3), and it runs out of memory when I run model.py.
So how do you run such a large dataset?

I am having the same issue. I asked one of the authors of the paper, and she said I can divide the data into chunks and train the model. Instead, I am trying to arrange more resources, because I believe chunking would affect the performance of the model, and our results would not be comparable to this work or others under different circumstances.

Hi! Thank you for your questions, this is a really great one! I will update the code for this problem. The idea is to subsample the dataset and use only the subset to train the Saab transform; it has been demonstrated that a subset yields similar kernels when the number of samples is large. For prediction, the whole dataset is still used: we divide it into chunks and predict the chunks one by one. For XGBoost, we still use all the samples for training. I will update the code as soon as possible!
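The two ideas described here (fit on a random subset, predict chunk by chunk) can be sketched roughly as follows. This is not the repo's saab.py: the function names `subsample` and `predict_in_chunks` and the `chunk_size` parameter are hypothetical, and the "model" in the usage lines is just a per-image mean used as a stand-in.

```python
import numpy as np

def subsample(images, max_images=10000, seed=777):
    """Pick at most max_images random samples for fitting the transform."""
    if len(images) <= max_images:
        return images
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=max_images, replace=False)
    return images[np.sort(idx)]

def predict_in_chunks(predict_fn, images, chunk_size=10000):
    """Apply predict_fn to consecutive chunks and concatenate the results."""
    outputs = [predict_fn(images[start:start + chunk_size])
               for start in range(0, len(images), chunk_size)]
    return np.concatenate(outputs, axis=0)

# Toy usage: the "model" here is just a per-image mean (a stand-in).
images = np.random.default_rng(0).random((25, 32, 32, 3), dtype=np.float32)
subset = subsample(images, max_images=10)
scores = predict_in_chunks(lambda x: x.mean(axis=(1, 2, 3)), images, chunk_size=8)
print(subset.shape, scores.shape)
```

Only one chunk of intermediate results is held in memory at a time, so peak usage depends on `chunk_size` rather than on the size of the whole dataset.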

Solved! Please try the new saab.py!

I tried to run it but this time the issue seems different.

Did you update the saab.py? Could you show me the fit in saab.py?

Thank you! The new saab.py works.
But this time the issue is XGBoost: it still needs a lot of memory.
clf = XGBClassifier(max_depth=1, tree_method='gpu_hist', objective='binary:logistic', eval_metric='auc',
Change gpu_hist to hist


Thank you for your response! But now I have the same issue as @wasim004.
I updated saab.py and only modified the file path.
It gets killed at the right eye region, with shape (134069,32,32,3).

def fit(self, images, max_images=10000, max_patches=1000000, seed=777):

Could you check at which line in fit your program gets killed?

I updated saab.py but am still getting the same error.
Hey, I guess the issue is still that it runs out of memory. I tried three input shapes:

  1. (100000,32,32,3): It works!
  2. (134069,32,32,3): Killed.
  3. (761334,32,32,3): MemoryError: Unable to allocate array with shape (761334,32,32,3).
    The error is raised in saab.py at output = np.zeros((N, H, W, n_channels), dtype="float64")

Please change all "float64" to "float32" in saab.py! And change the batch size from 50000 to 10000!

def transform(self, images, n_channels=-1, batch_size=50000):
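For a sense of why the dtype matters, here is a small back-of-the-envelope calculation for the array shape quoted in the MemoryError above. The helper `array_gib` is hypothetical and only does arithmetic; it does not allocate anything.

```python
import numpy as np

def array_gib(shape, dtype):
    """Size in GiB of a dense array with the given shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / 2**30

shape = (761334, 32, 32, 3)  # the shape reported in the MemoryError above
for dtype in ("float64", "float32"):
    print(dtype, round(array_gib(shape, dtype), 2), "GiB")
```

A single array of that shape takes roughly 17.4 GiB as float64 and half of that as float32, before counting any intermediate copies.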

Hi @hongshuochen,

I've finally managed to run and train the model and got the results.
I trained the model with both the original and the modified saab.py on the CelebDF-v2 dataset and got the results below:

The results in your original paper are different from what I got for the CelebDF-v2 dataset.
Do you have any idea what the reason could be?

I also trained the model on CelebDF-v1.
I got the following results:

My Data: Frame(0.7971), Video(0.8453)
Your Data: Frame(0.9138), Video(0.9363)

The reason for this is that you changed "float64" to "float32".
If you run with "float32" you can get the results that I get!

I did not change anything.
I ran your code as it is, because I found a server to run it on, so I did not have to modify anything in the code.

Hi @wasim004, I think you only used 2 regions, right? Please use 3 regions!
I recloned the repo and downloaded the data from https://drive.google.com/drive/u/1/folders/1nEBe5wGPmm2G1NsR46NK8msHiCUE9f8K
I just ran Celeb-DF-v1, and this is the result I get!

Still got the same results.
Can you check this line?
Your features shape should be (6065,540), not (6065,360).

model = Ensemble(regions=['left_eye', 'right_eye', 'mouth'], num_frames=6, verbose=True)
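The two widths mentioned here fit the explanation that one region was missing. A minimal sketch of that arithmetic, assuming every region contributes the same number of feature columns: the value 180 is inferred from this thread (540 / 3), not read from the repo, and `expected_feature_width` is a hypothetical helper.

```python
def expected_feature_width(num_regions, features_per_region=180):
    """Feature columns expected when every region contributes equally.

    features_per_region=180 is inferred from this thread (540 / 3),
    not read from the repo.
    """
    return num_regions * features_per_region

regions = ['left_eye', 'right_eye', 'mouth']
print(expected_feature_width(len(regions)))      # all three regions
print(expected_feature_width(len(regions) - 1))  # one region missing
```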

The actual problem is that my input shape is 5190 and yours is 75242.
Also, the data I get after running data.py is 500 MB in size and yours is 783 MB.

So, I tried to re-extract the landmarks and patches and followed the same steps but again I got the same data size and shape.
My CelebDF-V1 dataset contains Test(32(Real)+159(Fake))+Train(126(Real)+639(Fake)) videos.
My CelebDF-V2 dataset contains Test(118(Real)+1128(Fake))+Train(472(Real)+4511(Fake)) videos.

Can you please confirm your dataset sizes?

After including the mouth region the results on your data are now fine.

