hpi-xnor/BNext

Help with model size and test speed


Sorry to disturb you again. When I use the provided script "run_distributed_on_disk_a6k5_AdamW_Curicullum_Large_assistant_teacher_num_3_aa.sh" for training, the saved checkpoint is 3715.30 MB, while the pretrained Bnext_large model is 1246.96 MB. Am I doing something wrong? Can you help me? Furthermore, the table in the paper says that BNext-L has 106.1M parameters. How do these numbers fit together?

There is one other problem: how can I test the quantized model's speed on the CPU? Can you give me some advice?
Thank you so much!

Dear Shan Yang,

I will try to answer the second question. You cannot easily test the speed with any existing open-source toolkit, but we are working on it. We plan to support both CPU and GPU hardware, so please stay tuned.
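In the meantime, a plain PyTorch timing loop can at least give a baseline. Keep in mind that this measures the float32 emulation of the binary layers that the released code runs, not real 1-bit kernels, so it will not reflect the speedup a dedicated BNN runtime would achieve. The benchmark_cpu helper below is a minimal sketch, not part of the BNext repository:

import time
import torch

def benchmark_cpu(model, input_shape=(1, 3, 224, 224), warmup=10, iters=50):
    """Return the average forward-pass latency in milliseconds on CPU."""
    model.eval()
    x = torch.randn(input_shape)
    with torch.no_grad():
        for _ in range(warmup):  # warm up allocator and caches
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        return (time.perf_counter() - start) / iters * 1000.0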

Hi Shan Yang,

For your first question: please check this code in the training script:

if not args.multiprocessing_distributed or args.local_rank == 0:
    save_checkpoint({
        'epoch': epoch,
        'train_loss': training_loss,
        'train_top1': training_top1,
        'train_top5': training_top5,
        'test_loss': testing_loss,
        'test_top1': testing_top1,
        'test_top5': testing_top5,
        'state_dict': model_student.state_dict(),
        'best_top1_acc': best_top1_acc,
        'optimizer': optimizer.state_dict(),
        'temp': training_temperature,
        'alpha': alpha,
    }, is_best, args.save + "_" + "{}_optimizer_{}_mixup_{}_cutmix_{}_aug_repeats_{}_KD_{}_assistant_{}_{}_HK_{}_{}_aa_{}__elm_{}_recoup_{}_{}_amp".format(args.model, args.optimizer, args.mixup, args.cutmix, args.aug_repeats, args.teacher_num, args.assistant_teacher_num, args.weak_teacher, args.hard_knowledge, args.hard_knowledge_grains, args.aa, args.elm_attention, args.infor_recoupling, args.gpu, args.epochs), epoch=epoch)
As you can see, we save not only the model state_dict but also the optimizer state_dict and other training-procedure information, which explains why the checkpoint is considerably larger than the model itself.
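You can verify this yourself by stripping the checkpoint down to the model weights and comparing file sizes. Here is a minimal sketch, assuming the checkpoint keys from the save_checkpoint call above and a placeholder file name:

import os
import torch

ckpt_path = "checkpoint.pth.tar"  # placeholder: adjust to your saved checkpoint
ckpt = torch.load(ckpt_path, map_location="cpu")

# Keep only the student model weights; drop optimizer state and training metadata.
torch.save(ckpt["state_dict"], "weights_only.pth")

print(f"full checkpoint: {os.path.getsize(ckpt_path) / 1e6:.2f} MB")
print(f"weights only:    {os.path.getsize('weights_only.pth') / 1e6:.2f} MB")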

For your second question: the existing model is still saved with the torch.save() function, which stores every weight in a 32-bit representation. It is therefore impossible to directly obtain a 106.1M BNext-L using the torch library alone, even though all weights in HardBinaryConv are represented as +1/-1. We plan to release a BNN-specific torch extension toolkit in the near future, so please stay tuned.
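To make the arithmetic concrete, here is a back-of-the-envelope sketch (not repository code): under float32 serialization every parameter costs 4 bytes, while an ideal 1-bit packing of the binary weights would cost 1/8 byte each.

import torch

def report_sizes(model):
    # Count all parameters, as reported in the paper's parameter column.
    n = sum(p.numel() for p in model.parameters())
    print(f"parameters:      {n / 1e6:.1f} M")
    # torch.save() serializes float32 tensors at 4 bytes per weight ...
    print(f"float32 on disk: ~{n * 4 / 1e6:.1f} MB")
    # ... whereas packing binary weights at 1 bit each would shrink those
    # layers by 32x; only the remaining float layers would stay at fp32.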

Thanks for your answers!

Please check the binary layers implemented in bitorch-engine.