num_eval?
Hiter-Q opened this issue · 13 comments
Hello, while reproducing your article we found that "num_eval" is set to 1 during the distillation process. Could you please explain why? If only one evaluation is performed, how did you obtain the standard deviation (std)? Doesn't your paper report results averaged over five experiments?
In the CFG.py file, "num_eval" is set to 5, but the distillation process uses the configuration "config/xx/xxx.yaml", and all of those YAML files have "num_eval" set to 1. This has caused us significant confusion. We would appreciate a response as soon as possible. Thank you.
To reduce the cost, we evaluate only once during the distillation (it takes a long time when the IPC is high). Then we evaluate the checkpoint with the best performance five times after the distillation.
For example, our script evaluates the distilled dataset once during the distillation and saves it as images_best.pt if it achieves the best performance so far (compared with the distilled datasets evaluated at other epochs). After the distillation, we run distill/evaluation.py to evaluate this saved dataset 5 times to obtain the reported numbers.
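For illustration, a minimal sketch of that save-best logic; maybe_save_best, image_syn, label_syn, syn_lr, and save_dir are placeholder names, not the repository's actual identifiers:

import torch

def maybe_save_best(acc, best_acc, image_syn, label_syn, syn_lr, save_dir):
    """If this single evaluation beats the best accuracy seen so far, overwrite
    the *_best.pt files so only the strongest distilled set is kept."""
    if acc > best_acc:
        torch.save(image_syn.detach().cpu(), f"{save_dir}/images_best.pt")
        torch.save(label_syn.detach().cpu(), f"{save_dir}/labels_best.pt")
        torch.save(syn_lr.detach().cpu(), f"{save_dir}/lr_best.pt")
        return acc
    return best_acc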
OK, thank you very much! I currently have one experiment that performed best. So, to evaluate it, do I just need to take the paths for images_best, lr_best, and labels_best recorded in the logged_files of that experiment and then run evaluation.py?
Yep
When I finish running python DATM.py as described in README.md and then proceed as you instructed, running python evaluation.py --data_dir logged_files/CIFAR10/1/ConvNet/xxxx/Normal/images_best.pt --label_dir logged_files/CIFAR10/1/ConvNet/xxxx/Normal/labels_best.pt --lr_dir logged_files/CIFAR10/1/ConvNet/xxxx/Normal/lr_best.pt, the run completes in just three seconds. The output is as follows:
CUDNN STATUS: True
Files already downloaded and verified
Files already downloaded and verified
Hyper-parameters:
xxxxxxxxxxxxx
Evaluation model pool: ['ConvNet']
Evaluating: ConvNet [00:00<16:21,
: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument.
(Triggered internally at /opt/conda/conda-bld/pytorch_1646756402876/work/aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
100%|xxxxxxx| 1001/1001 [00:07<00:00, 126.34it/s]
[2024-04-09 16:19:49] Evaluate_00: epoch = 1000 train time = 8 s train loss = 0.240700 train acc = 0.1000, test acc = 0.3232
And then it ends. Why is this happening? Moreover, it seems evaluation.py does not calculate the average precision and STD, right? It only shows training loss and train acc.
Emm, you need to run it five times and calculate the average precision (test acc) and its STD yourself.
Are you suggesting that I need to run python evaluation.py five times on the same .pt file, and then manually record each result to calculate the average accuracy and std? Wouldn't that be somewhat ridiculous? Is this really how you conduct the tests? Moreover, why does it end just three seconds after running python evaluation.py? Shouldn't it have started the evaluation?
- You can add a for loop if you prefer (see the sketch below).
- You can check the code if you think something went wrong; any pull request is welcome :)
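As a minimal sketch of that loop (assuming evaluation.py prints a "test acc = ..." line to stdout, as in your log above; the xxxx paths are placeholders for your own logged_files run):

import re
import subprocess
import numpy as np

# Placeholder paths: point these at your own logged_files run.
ARGS = [
    "--data_dir", "logged_files/CIFAR10/1/ConvNet/xxxx/Normal/images_best.pt",
    "--label_dir", "logged_files/CIFAR10/1/ConvNet/xxxx/Normal/labels_best.pt",
    "--lr_dir", "logged_files/CIFAR10/1/ConvNet/xxxx/Normal/lr_best.pt",
]

accs = []
for i in range(5):
    # Run one evaluation and capture its console output.
    out = subprocess.run(
        ["python", "evaluation.py", *ARGS],
        capture_output=True, text=True, check=True,
    ).stdout
    # Grab the last "test acc = ..." value printed by the script.
    acc = float(re.findall(r"test acc = ([0-9.]+)", out)[-1])
    accs.append(acc)
    print(f"run {i}: test acc = {acc:.4f}")

print(f"mean = {np.mean(accs):.4f}, std = {np.std(accs):.4f}")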
Alright, thank you very much. No offense intended, just a suggestion: your README.md could be more detailed; it's quite brief and somewhat hard to follow. I'll go back and check my code again. Thank you.
I'm sorry to bother you again, but I just wanted to confirm: does a single run of python evaluation.py really take only about 5 seconds? After repeatedly checking the code, I found no errors. If the evaluation time really is this short, then we can probably rule out any issue with the code and conclude that the evaluation is simply that fast.
I have some questions regarding running python DATM.py. If num_eval is set to 1, why is there a need to calculate an average? Where is num_eval=5 from CFG.py used? Can I directly set num_eval to 5 in python DATM.py to achieve the same evaluation results as you?
Yes, you can. I set it to 1 simply because it takes too much time to evaluate 5 times during the distillation when IPC is high 😬.
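For context, a rough sketch of how a num_eval-style loop is typically consumed (not the repository's actual code; evaluate_n_times and train_and_test are hypothetical names, with train_and_test standing in for training a freshly initialized ConvNet on the distilled data and measuring real test accuracy):

import numpy as np

def evaluate_n_times(num_eval, image_syn, label_syn, syn_lr, train_and_test):
    """Train a fresh model on the distilled set num_eval times and report the
    mean and std of the resulting test accuracies."""
    accs = [train_and_test(image_syn, label_syn, syn_lr) for _ in range(num_eval)]
    return float(np.mean(accs)), float(np.std(accs))

With num_eval = 1 the std is trivially 0, which is why the in-distillation logs alone cannot give a meaningful deviation.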
I've used our method, and it significantly improved performance when IPC is 1. I took the .pt data from this run, and the evaluation finished in just seven seconds. After several repeated evaluations, why is the output accuracy consistently low, always around 45%-46%?