hongshuochen/DefakeHop

Unable to train the model

Opened this issue · 35 comments

Hey,
This is nice work and really appreciated.

I am getting an error when I run python model.py. May I ask what the problem could be?
Thanks!

Screenshot from 2021-12-08 15-44-53

Please pull the newest version of the code! We have updated the code!

Thank you for the response,
Where can I find that?
Thanks!

I have fixed that issue,
Now I am getting another issue.
Can you please have a look at it?
Thanks!
Screenshot from 2021-12-10 15-56-36

Do you use your own videos?
This should be the expected output if you use my data.

==============================left_eye==============================
===============DefakeHop Prediction===============
===============MultiChannelWiseSaab Transformation===============
Hop1
Input shape: (2708, 32, 32, 3)
Output shape: (2708, 15, 15, 13)

Hi,
Thanks for your kind response.
Your output looks like:

Screenshot from 2021-12-13 17-04-01

The final result was like:

Screenshot from 2021-12-13 17-08-34

I used the CelebDF-V1 dataset.
I preprocessed the dataset and created .npz files as per the requirements of the model.
But I got this error.
Screenshot from 2021-12-10 15-56-36

Sorry for the late reply!
Here is the data for Celeb-DF-v1! I followed the same code and got this data!
And please update model.py; I think there was a small error due to the change in the structure of this repo!
https://drive.google.com/drive/folders/1nEBe5wGPmm2G1NsR46NK8msHiCUE9f8K?usp=sharing

Please let me know whether you can get the results or not! Thank you!

Thank you so much for your kind response!

Well, I already modified model.py, thanks.
But I got the same error even on your data.
Please have a look at it.

Thanks!
Screenshot from 2021-12-16 12-56-24

Could you help me check this part of code in model.py?
test_images should be a 4D numpy array!

    for region in model.regions:
        path = 'data/' + region + '.test.npz'
        data = np.load(path)
        test_labels = data['labels']
        test_images = data['images']
        test_names = data['names']
        model.predict_region(region, test_images, test_names)
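A quick way to confirm this before calling predict_region is to check the array's dimensionality right after loading. The following is only a minimal sketch, not code from the repository: `check_region_data` is a hypothetical helper, and the only things taken from this thread are the `images`, `labels`, and `names` keys and the (N, 32, 32, 3) layout seen in the expected output. The tiny in-memory file stands in for a real `data/<region>.test.npz`.

```python
import io

import numpy as np


def check_region_data(data):
    """Return (images, labels, names) after validating their shapes.

    Hypothetical helper, not part of DefakeHop. `data` is anything that
    supports data['images'] etc., such as the object returned by np.load.
    """
    images = data['images']
    labels = data['labels']
    names = data['names']
    if images.ndim != 4:
        raise ValueError(
            f"images must be 4D (N, H, W, C), got shape {images.shape}")
    if not (len(images) == len(labels) == len(names)):
        raise ValueError("images, labels and names must have the same length")
    return images, labels, names


# Round-trip a tiny fake region file through np.savez / np.load in memory.
buffer = io.BytesIO()
np.savez(buffer,
         images=np.zeros((5, 32, 32, 3), dtype=np.uint8),
         labels=np.array([0, 1, 0, 1, 1]),
         names=np.array(['a', 'b', 'c', 'd', 'e']))
buffer.seek(0)
images, labels, names = check_region_data(np.load(buffer))
print(images.shape)
```

If the check raises, the problem is in how the file was written or loaded rather than in the model itself.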

Thank you for the response..!

Sure, you may have a look at it.
Thanks!
Screenshot from 2021-12-16 17-37-59

Lines 169 and 170 are wrong.
And please load the testing data with np.load, since we made some changes to the structure of this repo.

Thank You so much..!!
Finally solved the issue.

But I still have to run some other datasets, and I am unable to do that because my system does not have enough memory.
Anyway, thank you for your time and help.
I will contact you if I face any other issue.

Thanks!

Hey, I extracted feature data (celeb-v2) with shape (761334,32,32,3), and it runs out of memory when I run model.py.
So how do you run the large dataset?
Thanks!

Hey,
I am having the same issue. I asked one of the authors of the paper, and she said I can divide the data into chunks and train the model. But I am trying to arrange more resources instead, as I believe chunking will affect the performance of the model, and our results would not be comparable to this work or others because of the different conditions.
Thanks!

Hi! Thank you for your question, it is a really great one! I will update the code for this problem. The idea is to subsample the dataset and use the subset to train the Saab transform. For prediction, the whole dataset is used: it is divided into chunks and each chunk is predicted one by one. For the Saab transform, a subset is enough to get the kernels, since it has been demonstrated that the kernels come out similar once the number of samples is large. For XGBoost, we still use all the samples to train. I will update the code as soon as possible!
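The two ideas described here (fit on a random subset, predict chunk by chunk) can be sketched roughly as follows. This is an illustration only and not the code that later went into saab.py: `subsample` and `predict_in_chunks` are hypothetical names, the lambda is a toy stand-in for a fitted model, and the seed of 777 is borrowed from the `fit` signature quoted later in this thread.

```python
import numpy as np


def subsample(images, max_images, seed=777):
    """Pick at most max_images rows at random, without replacement."""
    if len(images) <= max_images:
        return images
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=max_images, replace=False)
    return images[np.sort(idx)]


def predict_in_chunks(predict_fn, images, chunk_size):
    """Apply predict_fn to consecutive chunks and concatenate the results."""
    outputs = []
    for start in range(0, len(images), chunk_size):
        outputs.append(predict_fn(images[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)


# Toy stand-ins: random "patches" and a fake model that averages each image.
images = np.random.default_rng(0).random((1000, 32, 32, 3), dtype=np.float32)
fit_subset = subsample(images, max_images=100)
scores = predict_in_chunks(lambda x: x.mean(axis=(1, 2, 3)),
                           images, chunk_size=256)
print(fit_subset.shape, scores.shape)
```

Only one chunk's worth of intermediate results is alive at a time, so peak memory during prediction depends on `chunk_size` rather than on the full dataset size.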

Thank you for the response!
And looking forward to your update~

Solved! Please try the new saab.py!

Hi,
I tried to run it but this time the issue seems different.
Thanks!

Screenshot from 2021-12-30 17-02-59

Did you update saab.py? Could you show me the fit function in your saab.py?

Thank you! The new saab.py works.
But this time the issue is XGBoost: it still needs a lot of memory.

clf = XGBClassifier(max_depth=1, tree_method='gpu_hist', objective='binary:logistic', eval_metric='auc',
)
Change gpu_hist to hist

Thank you for your response! But now I have the same issue as @wasim004.
I updated saab.py and only modified the file path.
It gets killed at the right eye region, with shape (134069,32,32,3).

def fit(self, images, max_images=10000, max_patches=1000000, seed=777):

Could you check at which line in fit your program gets killed?

Hi,
I updated saab.py but I am still getting the same error.
Screenshot from 2022-01-01 16-43-05

Hey, I guess the issue is still running out of memory. I tried three shapes:

  1. (100000,32,32,3): It works!
  2. (134069,32,32,3): Killed.
  3. (761334,32,32,3): MemoryError: Unable to allocate array with shape (761334,32,32,3).
    The error is in saab.py at output = np.zeros((N, H, W, n_channels), dtype="float64")
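To put the third case in perspective, the size of one dense array of that shape can be worked out directly from the element count and the dtype. The snippet below is just that arithmetic; `array_gib` is a hypothetical helper, and the figures cover a single array only, not whatever else the transform allocates alongside it.

```python
import math

import numpy as np


def array_gib(shape, dtype):
    """Size in GiB of one dense array with the given shape and dtype."""
    n_bytes = math.prod(shape) * np.dtype(dtype).itemsize
    return n_bytes / 2**30


shape = (761334, 32, 32, 3)
for dtype in ("float64", "float32"):
    print(dtype, round(array_gib(shape, dtype), 2), "GiB")
```

That is roughly 17.4 GiB for float64 and 8.7 GiB for float32, so a single allocation at this shape can already exceed the RAM of a typical workstation.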

Please change all "float64" to "float32" in saab.py! And change the batch size from 50000 to 10000!

def transform(self, images, n_channels=-1, batch_size=50000):

Hi @hongshuochen,

I've finally managed to run and train the model and got the results.
I trained the model on the CelebDF-v2 dataset with both the original and the modified saab.py, and got the results below:

Screenshot from 2022-01-05 15-43-38
Screenshot from 2022-01-05 15-44-11

The results in your original paper for the CelebDF-v2 dataset are different from what I got.
Do you have any idea what the reason could be?

I also trained the model on CelebDF-v1
and got the following results:

My data: Frame (0.7971), Video (0.8453)
Your data: Frame (0.9138), Video (0.9363)

The reason for this is that you changed from "float64" to "float32".
If you run with "float32" you can get the results that I get!

Hi,
Thanks for the response.
I did not change anything.
I ran your code as it is; I found a server to run it on, so I did not have to modify anything in the code.
Thanks!

Hi @wasim004, I think you only used 2 regions, right? Please use 3 regions!
I recloned the repo and downloaded the data from https://drive.google.com/drive/u/1/folders/1nEBe5wGPmm2G1NsR46NK8msHiCUE9f8K
I just ran Celeb-DF-v1, and this is the result I get!
image

Hi,
Still got the same results.
Screenshot from 2022-01-05 17-47-36

Can you check this line?
Your feature shape should be (6065, 540), with 540 columns instead of 360.

model = Ensemble(regions=['left_eye', 'right_eye', 'mouth'], num_frames=6, verbose=True)
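The 540 versus 360 difference is what you would expect if the per-region features are concatenated column-wise and one region is missing. The sketch below only illustrates that shape arithmetic: the 180 columns per region is an inference from 540 / 3 (and 360 / 2), not a number confirmed in this thread, and the zero arrays are placeholders for real features.

```python
import numpy as np

regions = ['left_eye', 'right_eye', 'mouth']
n_samples = 6065
per_region = 180  # assumption: 540 columns split evenly over 3 regions

# Placeholder per-region features; real ones would come from the model.
features = {r: np.zeros((n_samples, per_region), dtype=np.float32)
            for r in regions}

# Concatenate along the column axis, once with all regions, once without mouth.
three = np.concatenate([features[r] for r in regions], axis=1)
two = np.concatenate([features[r] for r in regions[:2]], axis=1)
print(three.shape, two.shape)
```

Seeing 360 columns therefore points to a missing region in the `regions` list rather than to a problem with the data files.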

Hi,
The actual problem is that my input shape is 5190 and yours is 75242.
Also, the data I get after running data.py is 500 MB in size and yours is 783 MB.

So, I tried to re-extract the landmarks and patches and followed the same steps but again I got the same data size and shape.
My CelebDF-V1 dataset contains Test(32(Real)+159(Fake))+Train(126(Real)+639(Fake)) videos.
My CelebDF-V2 dataset contains Test(118(Real)+1128(Fake))+Train(472(Real)+4511(Fake)) videos.

Can you please confirm your dataset sizes?
Thanks!

Hi,
After including the mouth region the results on your data are now fine.

Screenshot from 2022-01-05 18-51-55

Thanks!